About Me
Finding order within noise, and resonance across modalities.
I am Gary, a PhD researcher at USTC focusing on diffusion models and multimodal large models.
My research explores the boundaries of generative AI: turning noise into art and enabling text, images, video, and 3D to flow freely across modalities. I believe AI is an extension of creativity.
Research Directions
In-depth study of the mathematics of the diffusion process, aimed at more efficient sampling algorithms and noise scheduling strategies that cut generation time to milliseconds.
Constructing a unified latent representation space that allows free conversion between modalities, enabling true cross-modal understanding and generation.
Bringing large models to edge devices through compression, quantization, and distillation, making AI more ubiquitous and practical.
Studying the theoretical foundations of machine learning, including model generalization, optimization dynamics, representation learning, and the mathematical principles behind modern generative models.
Papers
Challenges the conventional view that initial noise in diffusion generation is merely random, revealing analyzable semantic patterns in that noise and proposing a training-free Semantic Erasure-Injection process.
Introduces StyleWallfacer, a unified framework for high-quality style transfer that addresses semantic drift, overfitting, and color limitation through triple diffusion and semantic style injection.
Proposes SHSRD, an efficient conditional diffusion framework for hyperspectral image super-resolution that uses spectral information injection and two-stage transfer learning on small HSI datasets.
Studies how a source diffusion model can be reused for domain perception and adapted through fine-tuning, aiming to improve few-shot image generation under limited target-domain data.
Explores dynamic knowledge for kernel modulation in one-shot multi-domain adaptation, targeting image generation when only extremely limited domain-specific supervision is available.
Formulates few-shot domain adaptation with dual objectives, jointly considering image restoration quality and cross-domain alignment to improve transfer under scarce target-domain examples.
Contact Me