DOI:10.1007/978-3-031-19787-1_6 - Corpus ID: 245006056
@inproceedings{Huang2021MultimodalCI,
  title     = {Multimodal Conditional Image Synthesis with Product-of-Experts GANs},
  author    = {Xun Huang and Arun Mallya and Ting-Chun Wang and Ming-Yu Liu},
  booktitle = {European Conference on Computer Vision},
  year      = {2021},
  url       = {https://api.semanticscholar.org/CorpusID:245006056}
}
- Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu
- Published in European Conference on Computer Vision, 9 December 2021
- Computer Science
The Product-of-Experts Generative Adversarial Networks (PoE-GAN) framework is proposed, which can synthesize images conditioned on multiple input modalities or on any subset of them, even the empty set, advancing the state of the art in multimodal conditional image synthesis.
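As a quick aid to the title, the sketch below spells out the generic product-of-experts rule for fusing per-modality conditionals; PoE-GAN's specific hierarchical latent formulation is defined in the paper itself, so treat this as the textbook identity rather than the exact model.

```latex
% Generic product-of-experts fusion of per-modality posteriors over a latent z.
% (Textbook identity; PoE-GAN's exact hierarchical latent structure is in the paper.)
p(z \mid x_1,\dots,x_K) \;\propto\; p(z)\,\prod_{k=1}^{K} q_k(z \mid x_k)

% If each expert is Gaussian, q_k = \mathcal{N}(\mu_k, \Lambda_k^{-1}) with precision
% \Lambda_k, and the prior is \mathcal{N}(0, I), the product is again Gaussian:
\Lambda \;=\; I + \sum_{k=1}^{K}\Lambda_k,
\qquad
\mu \;=\; \Lambda^{-1}\sum_{k=1}^{K}\Lambda_k\,\mu_k .

% Dropping a modality simply removes its factor, so any subset of inputs
% (including the empty set, which leaves only the prior) remains well defined.
```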
Topics
PoE-GAN · Multimodal Conditional Image Synthesis · Conditional Image Synthesis · Multimodal · User Input · Product Of Experts · State Of The Art · Generative Adversarial Networks
79 Citations
- Marlene Careil, Stéphane Lathuilière, C. Couprie, Jakob Verbeek
- 2022
Computer Science
ECCV Workshops
This work proposes OCO-GAN, for Optionally COnditioned GAN, which addresses semantic image synthesis and unconditional generation in a unified manner, with a shared image synthesis network that can be conditioned either on semantic maps or directly on latents.
- Tariq Berrada, Jakob Verbeek, C. Couprie, Alahari Karteek
- 2023
Computer Science
ArXiv
This work proposes a new class of GAN discriminators for semantic image synthesis that generates highly realistic images by exploiting feature backbone networks pre-trained for tasks such as image classification, and introduces a new generator architecture with better context modeling that uses cross-attention to inject noise into latent variables, leading to more diverse generated images.
- Lei Zhou, Taotao Zhang
- 2023
Computer Science
Journal of Electronic Imaging
The attention mechanism is used to fuse the multi-scale features of the content and style images, and the currently learned style image is distinguished from other style images by contrastive learning modules, thereby enhancing the ability of the network to learn style images.
- 1
- Jinsheng Zheng, Daqing Liu, …, Dacheng Tao
- 2024
Computer Science
International Journal of Computer Vision
A Mixture-of-Modality-Tokens Transformer (MMoT) is introduced that adaptively fuses fine-grained multimodal control signals, together with a multimodality-balanced training loss to stabilize the optimization of each modality and a multimodal sampling guidance that balances the strength of each modality's control signal.
- 1
- Highly Influenced [PDF]
- Nithin Gopalakrishnan Nair, W. G. C. Bandara, Vishal M. Patel
- 2022
Computer Science
ArXiv
This work proposes a solution based on denoising diffusion probabilistic models to synthesize images under multimodal priors that does not require explicit retraining for all modalities and can leverage the outputs of individual modalities to generate realistic images according to different constraints.
- 7
- Highly Influenced [PDF]
- Tengfei Wang, Ting Zhang, …, Fang Wen
- 2022
Computer Science
ArXiv
The proposed pretraining-based image-to-image translation (PITI) is shown to be capable of synthesizing images of unprecedented realism and faithfulness, and an adversarial training scheme is proposed to enhance texture synthesis in the diffusion model training.
- 142 [PDF]
- Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu
- 2023
Computer Science
IEEE Transactions on Pattern Analysis and Machine…
This survey comprehensively contextualizes the recent advances in multimodal image synthesis and editing, formulates taxonomies according to data modalities and model types, and provides insights into the current research challenges and possible directions for future research.
- 23 [PDF]
- Lei Zhou, Taotao Zhang
- 2023
Computer Science
ICCAI
This paper combines multi-scale attention with contrastive learning to propose an efficient model called AttCST, which combines the current features extracted by the pre-trained deep convolutional network with the earlier shallow features to supplement the texture information lost in the deep features.
- Nithin Gopalakrishnan Nair, W. G. C. Bandara, Vishal M. Patel
- 2023
Computer Science
2023 IEEE/CVF Conference on Computer Vision and…
This paper shows that a closed-form solution exists for generating an image given various constraints in the DDPM framework, and introduces a novel reliability parameter that allows different off-the-shelf diffusion models, trained across various datasets, to be used at sampling time alone to guide generation toward the desired outcome satisfying multiple constraints.
- 8
- PDF
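A rough sketch of how multiple off-the-shelf denoisers might be combined at sampling time with per-model reliability weights, in the spirit of the entry above. The simple weighted sum of noise predictions and the diffusers-style scheduler.step() call are illustrative assumptions, not the paper's closed-form expression.

```python
import torch

def fused_eps(models, conds, reliabilities, x_t, t):
    """Illustrative fusion of noise predictions from several off-the-shelf
    diffusion models, each handling one constraint, weighted by a per-model
    reliability parameter (a stand-in for the paper's closed-form rule)."""
    w = torch.tensor(reliabilities, dtype=torch.float32)
    w = w / w.sum()                                   # normalize reliabilities
    eps = torch.zeros_like(x_t)
    for wi, model, cond in zip(w, models, conds):
        eps = eps + wi * model(x_t, t, cond)          # each model sees only its own condition
    return eps

@torch.no_grad()
def sample(models, conds, reliabilities, scheduler, shape):
    """Plain reverse-diffusion loop with the fused prediction plugged in at
    every step; a diffusers-like scheduler API is assumed here."""
    x_t = torch.randn(shape)
    for t in scheduler.timesteps:
        eps = fused_eps(models, conds, reliabilities, x_t, t)
        x_t = scheduler.step(eps, t, x_t).prev_sample
    return x_t
```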
- A. Brown, Cheng-Yang Fu, Omkar M. Parkhi, Tamara L. Berg, A. Vedaldi
- 2022
Computer Science
ECCV
This work proposes a self-supervised approach that simulates edits by augmenting off-the-shelf images in a target domain, learns a conditional probability distribution of the edits end-to-end, and shows that different blending effects can be learned through intuitive control of the augmentation process.
...
...
81 References
- Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, J. Kautz, Bryan Catanzaro
- 2018
Computer Science
2018 IEEE/CVF Conference on Computer Vision and…
A new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs) is presented, which significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.
- 3,585 [PDF]
- Han Zhang, Tao Xu, …, Dimitris N. Metaxas
- 2017
Computer Science
2017 IEEE International Conference on Computer…
This paper proposes Stacked Generative Adversarial Networks (StackGAN) to generate 256×256 photo-realistic images conditioned on text descriptions and introduces a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold.
- 2,554 [PDF]
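The Conditioning Augmentation step mentioned above is simple enough to sketch: the text embedding parameterizes a diagonal Gaussian, the condition is sampled with the reparameterization trick, and a KL term keeps that distribution close to a standard normal. Layer sizes below are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ConditioningAugmentation(nn.Module):
    """Hedged sketch of StackGAN-style Conditioning Augmentation: the text
    embedding parameterizes a diagonal Gaussian, a latent condition is sampled
    with the reparameterization trick, and a KL term regularizes the
    distribution toward N(0, I)."""

    def __init__(self, embed_dim=1024, cond_dim=128):
        super().__init__()
        self.fc = nn.Linear(embed_dim, cond_dim * 2)  # predicts mean and log-variance

    def forward(self, text_embedding):
        mu, logvar = self.fc(text_embedding).chunk(2, dim=-1)
        eps = torch.randn_like(mu)
        c = mu + torch.exp(0.5 * logvar) * eps        # reparameterized sample
        # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch
        kl = 0.5 * torch.mean(torch.sum(mu.pow(2) + logvar.exp() - logvar - 1, dim=-1))
        return c, kl
```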
- V. Sushko, Edgar Schönfeld, Dan Zhang, Juergen Gall, B. Schiele, A. Khoreva
- 2021
Computer Science
ICLR
This work proposes a novel, simplified GAN model, which needs only adversarial supervision to achieve high quality results, and re-designs the discriminator as a semantic segmentation network, directly using the given semantic label maps as the ground truth for training.
- 157 [PDF]
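A minimal sketch of the segmentation-style discriminator objective described above: the discriminator predicts per-pixel logits over N semantic classes plus one extra "fake" class, real pixels are supervised with the label map, and fake pixels with the fake class. The class-balancing weights and LabelMix regularization from the paper are omitted here.

```python
import torch
import torch.nn.functional as F

def segmentation_d_loss(d_logits_real, d_logits_fake, label_map, num_classes):
    """Discriminator loss: per-pixel (num_classes + 1)-way cross-entropy.
    d_logits_*: (B, num_classes + 1, H, W); label_map: (B, H, W) with entries
    in [0, num_classes). Real pixels take their semantic class, fake pixels
    take the extra 'fake' class."""
    fake_class = torch.full_like(label_map, num_classes)
    loss_real = F.cross_entropy(d_logits_real, label_map)
    loss_fake = F.cross_entropy(d_logits_fake, fake_class)
    return loss_real + loss_fake

def segmentation_g_loss(d_logits_fake, label_map):
    """Generator loss: push fake pixels to be classified as their target semantic class."""
    return F.cross_entropy(d_logits_fake, label_map)
```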
- Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang
- 2021
Computer Science
2021 IEEE/CVF Conference on Computer Vision and…
The Cross-Modal Contrastive Generative Adversarial Network (XMC-GAN) addresses the challenge of text-to-image synthesis systems by maximizing the mutual information between image and text via multiple contrastive losses which capture inter-modality and intra-modality correspondences.
- 304
- Highly Influential [PDF]
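One of the contrastive terms referred to above can be sketched as a standard InfoNCE loss between batch-aligned image and sentence embeddings; the temperature and symmetrization are illustrative choices, and the region-word and real-fake variants from the paper are omitted.

```python
import torch
import torch.nn.functional as F

def image_text_infonce(image_emb, text_emb, temperature=0.1):
    """Hedged sketch of an image<->sentence contrastive (InfoNCE) term:
    matched pairs within a batch are positives, every other pairing is a
    negative, which drives up the mutual information between the two views."""
    img = F.normalize(image_emb, dim=-1)
    txt = F.normalize(text_emb, dim=-1)
    logits = img @ txt.t() / temperature                 # (B, B) cosine similarities
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +     # image -> text
                  F.cross_entropy(logits.t(), targets))  # text -> image
```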
- Ming Tao, H. Tang, Songsong Wu, N. Sebe, Fei Wu, Xiaoyuan Jing
- 2020
Computer Science
ArXiv
This work proposes a novel simplified text-to-image backbone able to synthesize high-quality images directly with a single pair of generator and discriminator, a novel regularization method called Matching-Aware zero-centered Gradient Penalty, and a novel fusion module that effectively exploits the semantics of text descriptions and deeply fuses text and image features during the generation process.
- 195
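A hedged sketch of a matching-aware, zero-centered gradient penalty in the spirit of the entry above: the gradient of the discriminator score is penalized around real images paired with their matched sentence embeddings. The coefficient and exponent are hyperparameters; the values below are illustrative rather than the paper's settings.

```python
import torch

def matching_aware_zero_centered_gp(discriminator, real_images, matched_text_emb,
                                    coeff=2.0, power=6):
    """Penalize the gradient norm of the discriminator score with respect to
    BOTH the real image and its matched sentence embedding, pushing it toward
    zero around real (image, text) pairs. The discriminator signature
    discriminator(image, text_emb) -> score is an assumption of this sketch."""
    real_images = real_images.detach().requires_grad_(True)
    matched_text_emb = matched_text_emb.detach().requires_grad_(True)
    scores = discriminator(real_images, matched_text_emb)
    grads = torch.autograd.grad(outputs=scores.sum(),
                                inputs=(real_images, matched_text_emb),
                                create_graph=True)
    grad_norm = torch.cat([g.flatten(1) for g in grads], dim=1).norm(2, dim=1)
    return coeff * grad_norm.pow(power).mean()
```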
- Ming-Yu Liu, Xun Huang, Jiahui Yu, Ting-Chun Wang, Arun Mallya
- 2021
Computer Science
Proceedings of the IEEE
An overview of GANs is provided with a special focus on algorithms and applications for visual synthesis, and several important techniques are covered for stabilizing GAN training, which has a reputation for being notoriously difficult.
- 124
- Highly Influential [PDF]
- Andrew Brock, Jeff Donahue, K. Simonyan
- 2019
Computer Science
ICLR
It is found that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the generator's input.
- 4,751 [PDF]
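The truncation trick itself is easy to illustrate: latent components are drawn from a standard normal and any value beyond a magnitude threshold is resampled, so a smaller threshold trades variety for fidelity. The threshold value and the generator call below are illustrative assumptions.

```python
import torch

def truncated_noise(batch_size, dim, threshold=0.5, generator=None):
    """Hedged sketch of the truncation trick: sample latents from N(0, 1) and
    resample any component whose magnitude exceeds the threshold, yielding a
    truncated-normal latent distribution."""
    z = torch.randn(batch_size, dim, generator=generator)
    mask = z.abs() > threshold
    while mask.any():
        z[mask] = torch.randn(int(mask.sum()), generator=generator)
        mask = z.abs() > threshold
    return z

# usage sketch (the generator interface is assumed):
# z = truncated_noise(16, 128, threshold=0.5)
# images = generator(z, class_labels)
```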
- Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li
- 2019
Computer Science
NeurIPS
This work argues that convolutional kernels in the generator should be aware of the distinct semantic labels at different locations when generating images, and proposes a feature pyramid semantics-embedding discriminator, which is more effective in enhancing fine details and semantic alignments between the generated images and the input semantic layouts.
- 190
- PDF
- Minfeng Zhu, Pingbo Pan, Wei Chen, Yi Yang
- 2019
Computer Science
2019 IEEE/CVF Conference on Computer Vision and…
The proposed DM-GAN model introduces a dynamic memory module to refine fuzzy image contents when the initial images are not well generated, and performs favorably against state-of-the-art approaches.
- 505 [PDF]
- Ruilin Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li
- 2021
Computer Science
2021 IEEE/CVF Conference on Computer Vision and…
This paper proposes a novel DivCo framework to properly constrain both "positive" and "negative" relations between the generated images specified in the latent space, and introduces a novel latent-augmented contrastive loss which encourages images generated from adjacent latent codes to be similar and those generated from distinct latent codes to be dissimilar.
- 73 [PDF]
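The latent-augmented contrastive idea above reduces to an InfoNCE-style loss over generated-image features, with positives coming from nearby latent codes and negatives from distinct ones. The feature extractor, perturbation scale, and temperature are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F

def latent_augmented_contrastive_loss(query_feat, pos_feat, neg_feats, temperature=0.07):
    """Hedged sketch: the feature of an image generated from a latent code
    should be close to features of images from NEARBY latent codes (positives)
    and far from features of images from DISTINCT latent codes (negatives).
    query_feat, pos_feat: (B, D); neg_feats: (B, N, D)."""
    q = F.normalize(query_feat, dim=-1)
    pos = F.normalize(pos_feat, dim=-1)
    neg = F.normalize(neg_feats, dim=-1)
    l_pos = (q * pos).sum(-1, keepdim=True)            # (B, 1) positive similarity
    l_neg = torch.einsum("bd,bnd->bn", q, neg)         # (B, N) negative similarities
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    targets = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive at index 0
    return F.cross_entropy(logits, targets)
```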
...
...