Gary


Exploring across domains

Diffusion · MLLMs · Efficiency

About Me


Finding order within noise, and resonance across modalities.

7 Featured Papers

4 Research Directions

I am Gary, a PhD student at USTC focusing on diffusion models and multimodal large models.

My research explores the boundaries of generative AI: turning noise into art and letting text, images, video, and 3D content flow freely across modalities. I believe AI is an extension of creativity.


Supervisors

Prof. Zhengjun Zha (查正军) / Prof. Min Li (李敏)

Advisor

Wei Zhai (翟伟)

Affiliation

Department of Automation, University of Science and Technology of China

Collaboration

Research collaborations with Li Auto and Tencent

Diffusion Models

Multimodal Large Language Models

Noise Optimization

Text-to-Content Generation

Style Transfer

Hyperspectral Imaging

Efficient Inference

AI Art

Research Directions

Generation · Sampling

Diffusion

Studying the mathematics of the diffusion process to design more efficient sampling algorithms and noise scheduling strategies, with the aim of cutting generation time to milliseconds.

Text · Image · Video · 3D

MLLMs

Constructing a unified representation space that allows free conversion between modalities in latent space, aiming at true cross-modal understanding and generation.

Compression · Distillation

Efficiency

Using compression, quantization, and distillation to bring large models to edge devices, making AI more ubiquitous and practical.

Theory · Optimization

ML Theory

Studying the theoretical foundations of machine learning, including model generalization, optimization dynamics, representation learning, and the mathematical principles behind modern generative models.

Selected Papers


01 —

Beyond Randomness: Understand the Order of the Noise in Diffusion

2025.11 · Gary, et al. · Under Review

Challenges the conventional view that the initial noise in diffusion generation is merely random: the paper reveals analyzable semantic patterns in the noise and proposes a training-free Semantic Erasure-Injection process.

Diffusion · Noise Optimization · Training-free

02 —

Break Stylistic Sophon: Are We Really Meant to Confine the Imagination in Style Transfer?

2025.05 · Gary, et al. · Under Review

Introduces StyleWallfacer, a unified framework for high-quality style transfer that addresses semantic drift, overfitting, and color limitation through triple diffusion and semantic style injection.

AI Art · Style Transfer · Diffusion

03 —

SHSRD: Efficient Conditional Diffusion Model for Single Hyperspectral Image Superresolution

2025.03 · Gary, et al. · JSTARS 2025

Proposes SHSRD, an efficient conditional diffusion framework for hyperspectral image superresolution that uses spectral information injection and two-stage transfer learning on small HSI datasets.

Diffusion · HSI · Low-Level Vision

04 —

Reusing Source Diffusion Model for Domain Perception: Towards Few-shot Image Generation via Fine-tuning

2025 · Gary, et al. · Expert Systems with Applications, 130797

Studies how a source diffusion model can be reused for domain perception and adapted through fine-tuning, aiming to improve few-shot image generation under limited target-domain data.

Diffusion · Few-shot Generation · Domain Adaptation

05 —

Using Dynamic Knowledge for Kernel Modulation: Towards Image Generation via One-shot Multi-domain Adaptation

2025 · Gary, et al. · Pattern Recognition, 112489

Explores dynamic knowledge for kernel modulation in one-shot multi-domain adaptation, targeting image generation when only extremely limited domain-specific supervision is available.

Image Generation · One-shot Adaptation · Kernel Modulation

06 —

Dual Objectives in Few-Shot Domain Adaptation: Image Restoration and Cross-Domain Alignment

2025 · Gary, et al. · Expert Systems with Applications, 130759

Formulates few-shot domain adaptation with dual objectives, jointly considering image restoration quality and cross-domain alignment to improve transfer under scarce target-domain examples.

Few-shot Adaptation · Image Restoration · Cross-domain Alignment

07 —

Breaking the Synthetic-Real Domain Shortcut for Training-Free Generative Replay-based Class Incremental Learning

2026.05 · Gary, et al. · ICML 2026 Regular

Proposes DREAM, a training-free generative replay framework for class-incremental learning. It addresses the synthetic-real domain shortcut by using subspace rectification, orthogonal projection, and real-anchored prototype regularization to improve exemplar-free continual learning.

ICML 2026 · Class-Incremental Learning · Generative Replay

Contact Me

Email

Gary_144@mail.ustc.edu.cn

RedNote (小红书)

Gary
