Exploring across domains
Diffusion · MLLMs · Efficiency

About Me

Finding order within noise, and resonance across modalities.

6 Featured Papers
4 Research Directions

I am Gary, a PhD researcher at USTC working on diffusion models and multimodal large models.

My research explores the boundaries of generative AI: turning noise into art and letting text, images, video, and 3D content move freely across modalities. I believe AI is an extension of creativity.


Diffusion Models
Multimodal Large Models
Noise Optimization
Text-to-Content Generation
Style Transfer
Hyperspectral Imaging
Efficient Inference
AI Art

Research Directions

Generation · Sampling

Diffusion

In-depth study of the mathematics of the diffusion process, aimed at more efficient sampling algorithms and noise scheduling strategies, with the goal of pushing generation time down toward milliseconds.

Text · Image · Video · 3D

MLLMs

Building a unified representation space in which modalities can be converted freely in latent space, as a basis for genuine cross-modal understanding and generation.

Compression · Distillation

Efficiency

Bringing large models to edge devices through compression, quantization, and distillation, making AI more ubiquitous and practical.

Creativity · Aesthetics

Machine Learning Theory

Studying the theoretical foundations of machine learning, including model generalization, optimization dynamics, representation learning, and the mathematical principles behind modern generative models.

Selected Papers

01 —

Beyond Randomness: Understand the Order of the Noise in Diffusion

2025.11 · Gary, et al. · Under Review

Challenges the conventional view that the initial noise in diffusion generation is merely random, reveals analyzable semantic patterns in that noise, and proposes a training-free Semantic Erasure-Injection process.

Diffusion · Noise Optimization · Training-free
02 —

Break Stylistic Sophon: Are We Really Meant to Confine the Imagination in Style Transfer?

2025.05 · Gary, et al. · Under Review

Introduces StyleWallfacer, a unified framework for high-quality style transfer that addresses semantic drift, overfitting, and color limitation through triple diffusion and semantic style injection.

AI Art · Style Transfer · Diffusion
03 —

SHSRD: Efficient Conditional Diffusion Model for Single Hyperspectral Image Superresolution

2025.03 · Gary, et al. · JSTARS 2025

Proposes SHSRD, an efficient conditional diffusion framework for hyperspectral image superresolution that uses spectral information injection and two-stage transfer learning on small HSI datasets.

Diffusion · HSI · Low-Level Vision
04 —

Reusing Source Diffusion Model for Domain Perception: Towards Few-shot Image Generation via Fine-tuning

2025 · Gary, et al. · Expert Systems with Applications, 130797

Studies how a source diffusion model can be reused for domain perception and adapted through fine-tuning, aiming to improve few-shot image generation under limited target-domain data.

Diffusion · Few-shot Generation · Domain Adaptation
05 —

Using Dynamic Knowledge for Kernel Modulation: Towards Image Generation via One-shot Multi-domain Adaptation

2025 · Gary, et al. · Pattern Recognition, 112489

Explores dynamic knowledge for kernel modulation in one-shot multi-domain adaptation, targeting image generation when only extremely limited domain-specific supervision is available.

Image Generation · One-shot Adaptation · Kernel Modulation
06 —

Dual Objectives in Few-Shot Domain Adaptation: Image Restoration and Cross-Domain Alignment

2025 · Gary, et al. · Expert Systems with Applications, 130759

Formulates few-shot domain adaptation with dual objectives, jointly considering image restoration quality and cross-domain alignment to improve transfer under scarce target-domain examples.

Few-shot Adaptation · Image Restoration · Cross-domain Alignment

Contact Me


Email
Gary_144@mail.ustc.edu.cn
GitHub / Homepage
songyan888.github.io
Google Scholar
Publications profile
RedNote 小红书
Gary