FocusDPO: Dynamic Preference Optimization for Multi-Subject Personalized Image Generation via Adaptive Focus

Qiaoqiao Jin1*, Siming Fu1*†, Dong She1*, Weinan Jia1,3, Hualiang Wang1,2, Mu Liu1, Jidong Jiang1‡
1 ByteDance
2 The Hong Kong University of Science and Technology
3 University of Science and Technology of China

* Equal Contribution     † Project Lead     ‡ Corresponding Author

We introduce FocusDPO, a post-training framework that adaptively identifies focus regions based on dynamic semantic correspondence and supervision image complexity. FocusDPO demonstrates capabilities in single-subject and multi-subject driven generation tasks.

Abstract

Multi-subject personalized image generation aims to synthesize customized images containing multiple specified subjects without requiring test-time optimization. However, achieving fine-grained independent control over multiple subjects remains challenging due to difficulties in preserving subject fidelity and preventing cross-subject attribute leakage. We present FocusDPO, a framework that adaptively identifies focus regions based on dynamic semantic correspondence and supervision image complexity. During training, our method progressively adjusts these focal areas across noise timesteps, implementing a weighted strategy that rewards information-rich patches while penalizing regions with low prediction confidence. The framework dynamically adjusts focus allocation during the DPO process according to the semantic complexity of reference images and establishes robust correspondence mappings between generated and reference subjects. Extensive experiments demonstrate that our method substantially enhances the performance of existing pre-trained personalized generation models, achieving state-of-the-art results on both single-subject and multi-subject personalized image synthesis benchmarks. Our method effectively mitigates attribute leakage while preserving superior subject fidelity across diverse generation scenarios, advancing the frontier of controllable multi-subject image synthesis.

How does FocusDPO work?

FocusDPO introduces a spatially-aware optimization framework (left) that adaptively focuses on critical regions through dynamic semantic guidance. It leverages (a) a Structure-Preserving Attention Field, which establishes robust correspondence mappings between generated and reference subjects, and (b) a Detail-Preserving Complexity Estimator, which estimates the semantic complexity of images.
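To make the idea of region-focused preference optimization concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Diffusion-DPO-style objective in which per-patch squared noise-prediction errors are weighted by a focus map before being compared between the preferred and rejected samples. The function name `focus_weighted_dpo_loss`, the normalization of the focus map, and the `beta` parameter are all illustrative assumptions; how the focus map is actually derived from the attention field and the complexity estimator is not specified on this page, so it is treated here as a given input.

```python
import numpy as np

def focus_weighted_dpo_loss(err_w_model, err_w_ref, err_l_model, err_l_ref,
                            focus, beta=1.0):
    """Hypothetical spatially weighted DPO-style loss (illustrative only).

    err_w_* / err_l_*: per-patch squared noise-prediction errors, shape (H, W),
        for the preferred (w) and rejected (l) samples, under the trained
        model and the frozen reference model.
    focus: non-negative per-patch weights, shape (H, W). Assumed to be
        provided by some upstream focus estimator.
    """
    # Normalize the focus map so the weights sum to (approximately) one.
    focus = focus / (focus.sum() + 1e-8)

    # Focus-weighted improvement of the model over the reference.
    win_gap = np.sum(focus * (err_w_model - err_w_ref))
    lose_gap = np.sum(focus * (err_l_model - err_l_ref))

    # The loss is low when the model improves on the preferred sample
    # more than on the rejected one inside the focused regions.
    logit = -beta * (win_gap - lose_gap)

    # -log(sigmoid(logit)), computed stably as log(1 + exp(-logit)).
    return np.logaddexp(0.0, -logit)
```

In this sketch, patches with zero focus weight do not influence the loss at all, which is one simple way to concentrate the preference signal on subject regions. Other weighting schemes would fit the description equally well.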

More Results

Comparison with State-of-the-Art Methods

Disrupted-Instance Pair Dataset (DIP)

We construct high-quality subject-consistent pairs with controlled subject variation, using a binary prior as guidance to identify the regions that contain subject differences.
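As a rough illustration of what a binary prior over subject differences could look like, the sketch below thresholds the per-pixel difference between the two images of a pair. This is an assumption made for illustration only: the page does not describe how the prior is actually obtained, and the function name `difference_mask` and the `threshold` value are hypothetical.

```python
import numpy as np

def difference_mask(img_a, img_b, threshold=0.1):
    """Hypothetical binary mask of regions where two paired images differ.

    img_a, img_b: float arrays with values in [0, 1], shape (H, W, C).
    Returns a boolean array of shape (H, W) that is True where the mean
    absolute channel difference exceeds `threshold`.
    """
    diff = np.abs(img_a.astype(np.float64) - img_b.astype(np.float64))
    return diff.mean(axis=-1) > threshold
```

A mask of this kind could then be used to restrict supervision to the regions where the pair actually differs.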
