Your Pre-trained Diffusion Model Secretly Knows Restoration

Johns Hopkins University

Our approach enables pre-trained FLUX and WAN models to achieve strong restoration performance by learning only conditioning prompts.

Image Restoration (FLUX + Ours)

Before/after comparisons: mixed snow and haze (CDD), real rain (LHP), unseen under-display camera (TOLED), and real blur (4KRD), each shown as degraded input vs. restored output.

Video Restoration (WAN + Ours)

Before/after comparisons: real snow, real low-light, real rain, and real haze, each shown as degraded input vs. restored output.

Abstract

Pre-trained diffusion models have enabled significant advancements in All-in-One Restoration (AiOR), offering improved perceptual quality and generalization. However, diffusion-based restoration methods primarily rely on fine-tuning or ControlNet-style modules to leverage the pre-trained diffusion model's priors for AiOR. In this work, we show that these pre-trained diffusion models inherently possess restoration behavior, which can be unlocked by directly learning prompt embeddings at the output of the text encoder. Interestingly, this behavior is largely inaccessible through text prompts and text-token embedding optimization. Furthermore, we observe that naive prompt learning is unstable because the forward noising process using degraded images is misaligned with the reverse sampling trajectory. To resolve this, we train prompts within a diffusion bridge formulation that aligns training and inference dynamics, enforcing a coherent denoising path from noisy degraded states to clean images. Building on these insights, we apply our lightweight learned prompts to the pre-trained WAN video and FLUX image models, converting them into high-performing restoration models. Extensive experiments demonstrate that our approach achieves competitive performance and generalization across diverse degradations, while avoiding fine-tuning and restoration-specific control modules.

Motivation

Motivation figure

Even with optimized token prompts (textual inversion/prompt tuning), the model tends to denoise without removing degradations, whereas embedding-space optimization enables restoration from the same noisy degraded input.
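The core idea of embedding-space optimization can be sketched with a toy model. Below, a frozen linear map stands in for the diffusion backbone, and only a free conditioning vector at the "encoder output" is optimized to map a degraded latent toward a clean one. All names and shapes here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy sketch of embedding-space prompt learning (all names hypothetical):
# the backbone is a frozen linear map; only the prompt embedding is trained.
rng = np.random.default_rng(0)
dim = 8
W = rng.standard_normal((dim, dim))      # frozen "backbone" weights (never updated)
x_degraded = rng.standard_normal(dim)    # stand-in for a degraded latent
x_clean = rng.standard_normal(dim)       # stand-in for the clean target latent

prompt = np.zeros(dim)                   # learnable conditioning embedding
lr = 0.005
for _ in range(2000):
    pred = x_degraded + W @ prompt       # frozen model output, conditioned on prompt
    grad = 2 * W.T @ (pred - x_clean)    # gradient of ||pred - x_clean||^2 w.r.t. prompt
    prompt -= lr * grad                  # update only the prompt embedding

residual = np.linalg.norm(x_degraded + W @ prompt - x_clean)
```

The point of the sketch is that the backbone stays fixed while gradients flow only into the conditioning vector; a token-space analogue would additionally constrain `prompt` to the discrete token-embedding manifold, which is what the figure suggests limits restoration behavior.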

Proposed Approach

Block figure

(a) We freeze the diffusion backbone and optimize only the conditioning: token-space prompts fail, while embedding-space optimization elicits restoration. (b) Naive tuning yields states anchored at the degraded latent, while DDBM is pinned at both endpoints; our desired bridge starts from noisy degraded inputs and denoises toward the clean latent. (c) Naive training and inference see different state families, causing trajectory misalignment. (d) Bridge-based training aligns train/test states; DDBM may under-correct early, while the desired bridge enables stronger correction along an aligned path.
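For reference, a standard Brownian-bridge interpolation of the kind used by DDBM-style methods (stated here as an assumption; the paper's exact schedule may differ) pins the state at both endpoints:

\[
x_t = (1 - t)\, x_0 + t\, x_T + \sigma \sqrt{t(1 - t)}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),
\]

where \(x_0\) is the clean latent and \(x_T\) the degraded latent. The desired bridge in panel (b) differs in that its starting state at \(t = T\) is a noisy degraded latent rather than being pinned exactly at \(x_T\), so training and inference both begin from the same noisy-degraded state family.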

Quantitative Comparisons

Quantitative comparisons of our prompt learning approach on the FLUX model with state-of-the-art AiOR approaches on images with out-of-distribution (OOD), mixed, and unseen degradations.

Table 1

Quantitative comparisons of our prompt learning approach on the WAN model with state-of-the-art image and video restoration approaches for the task of all-in-one video restoration on OOD datasets.

Table 2

Qualitative Results

Qual 1

Qualitative comparisons of the pre-trained FLUX model using our learned prompts with state-of-the-art AiOR approaches. Our approach enables the pre-trained FLUX model to achieve strong restoration performance.

Qual 2

Qualitative comparisons of the pre-trained WAN model using our learned prompts with state-of-the-art AiOR approaches. Our prompts elicit the strong restoration potential of the pre-trained WAN model.

BibTeX


@misc{rajagopalan2026pretraineddiffusionmodelsecretly,
    title={Your Pre-trained Diffusion Model Secretly Knows Restoration},
    author={Sudarshan Rajagopalan and Vishal M. Patel},
    year={2026},
    eprint={2604.04924},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2604.04924},
}
Acknowledgement: The website template is taken from Nerfies