[PDF] Multimodal Conditional Image Synthesis with Product-of-Experts GANs | Semantic Scholar (2024)

@inproceedings{Huang2021MultimodalCI, title={Multimodal Conditional Image Synthesis with Product-of-Experts GANs}, author={Xun Huang and Arun Mallya and Ting-Chun Wang and Ming-Yu Liu}, booktitle={European Conference on Computer Vision}, year={2021}, url={https://api.semanticscholar.org/CorpusID:245006056}}
  • Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu
  • Published in European Conference on… 9 December 2021
  • Computer Science

The Product-of-Experts Generative Adversarial Networks (PoE-GAN) framework is proposed, which can synthesize images conditioned on multiple input modalities or any subset of them, even the empty set, to advance the state of the art in multimodal conditional image synthesis.
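One common way to realize a product of experts over a latent variable is to multiply Gaussian densities, one per available modality plus a prior. The sketch below illustrates that idea under stated assumptions and is not the paper's implementation: the names `product_of_experts`, `Gaussian`, and `PRIOR`, the 1-D latent, and the standard-normal prior are all hypothetical choices made for the example. It shows why any subset of modalities, including the empty set, still yields a valid distribution.

```python
from typing import Iterable, Tuple

# (mean, variance) of a 1-D Gaussian expert; hypothetical type for this sketch.
Gaussian = Tuple[float, float]

# Assumed standard-normal prior expert, so the empty set of modalities is valid.
PRIOR: Gaussian = (0.0, 1.0)


def product_of_experts(experts: Iterable[Gaussian],
                       prior: Gaussian = PRIOR) -> Gaussian:
    """Combine the prior with any subset of Gaussian experts.

    A product of Gaussian densities is Gaussian up to normalization:
    precisions (inverse variances) add, and the resulting mean is the
    precision-weighted average of the individual means.
    """
    precision = 1.0 / prior[1]
    weighted_mean = prior[0] / prior[1]
    for mean, var in experts:
        if var <= 0:
            raise ValueError("variance must be positive")
        precision += 1.0 / var
        weighted_mean += mean / var
    return weighted_mean / precision, 1.0 / precision


if __name__ == "__main__":
    print(product_of_experts([]))                        # prior only
    print(product_of_experts([(2.0, 1.0)]))              # one modality
    print(product_of_experts([(2.0, 1.0), (4.0, 0.5)]))  # two modalities
```

With no experts the result is just the prior; each added modality increases the precision and pulls the mean toward that expert.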

79 Citations

  • Highly Influential Citations: 8
  • Background Citations: 44
  • Methods Citations: 24
  • Results Citations: 2

Topics

PoE-GAN, Multimodal Conditional Image Synthesis, Conditional Image Synthesis, Multimodal, User Input, Product Of Experts, State Of The Art, Generative Adversarial Networks

79 Citations

Unifying conditional and unconditional semantic image synthesis with OCO-GAN
    Marlene Careil, Stéphane Lathuilière, C. Couprie, Jakob Verbeek

    Computer Science

    ECCV Workshops

  • 2022

This work proposes OCO-GAN, for Optionally COnditioned GAN, which addresses both tasks in a unified manner, with a shared image synthesis network that can be conditioned either on semantic maps or directly on latents.

Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
    Tariq Berrada, Jakob Verbeek, C. Couprie, Alahari Karteek

    Computer Science

    ArXiv

  • 2023

This work proposes a new class of GAN discriminators for semantic image synthesis that generates highly realistic images by exploiting feature backbone networks pre-trained for tasks such as image classification and introduces a new generator architecture with better context modeling and using cross-attention to inject noise into latent variables, leading to more diverse generated images.

AttCST: attention improves style transfer via contrastive learning
    Lei Zhou, Taotao Zhang

    Computer Science

    Journal of Electronic Imaging

  • 2023

The attention mechanism is used to fuse the multi-scale features of the content and style images, and the currently learned style image is distinguished from other style images by contrastive learning modules, thereby enhancing the ability of the network to learn style images.

  • 1 citation

MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis
    Jinsheng Zheng, Daqing Liu, Dacheng Tao

    Computer Science

    International Journal of Computer Vision

  • 2024

A Mixture-of-Modality-Tokens Transformer (MMoT) that adaptively fuses fine-grained multimodal control signals, a multi-modality balanced training loss to stabilize the optimization of each modality, and a multimodal sampling guidance to balance the strength of each modality control signal are introduced.

Image Generation with Multimodal Priors using Denoising Diffusion Probabilistic Models
    Nithin Gopalakrishnan Nair, W. G. C. Bandara, Vishal M. Patel

    Computer Science

    ArXiv

  • 2022

This work proposes a solution based on denoising diffusion probabilistic models to synthesise images under multimodal priors that does not require explicit retraining for all modalities and can leverage the outputs of individual modalities to generate realistic images according to different constraints.

Pretraining is All You Need for Image-to-Image Translation
    Tengfei Wang, Ting Zhang, Fang Wen

    Computer Science

    ArXiv

  • 2022

The proposed pretraining-based image-to-image translation (PITI) is shown to be capable of synthesizing images of unprecedented realism and faithfulness, and adversarial training is proposed to enhance texture synthesis in the diffusion model training.

Multimodal Image Synthesis and Editing: The Generative AI Era
    Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Shijian Lu

    Computer Science

    IEEE Transactions on Pattern Analysis and Machine…

  • 2023

This survey comprehensively contextualizes recent advances in multimodal image synthesis and editing, formulates taxonomies according to data modalities and model types, and provides insights about current research challenges and possible directions for future research.

Multi-scale Attention Enhancement for Arbitrary Style Transfer via Contrast Learning
    Lei Zhou, Taotao Zhang

    Computer Science

    ICCAI

  • 2023

This paper combines multi-scale attention with contrastive learning methods to propose an efficient model called AttCST, which combines the current features extracted by the pre-trained deep convolutional network with the previous shallow features to supplement the texture information lost by the in-depth features.

    Nithin Gopalakrishnan Nair, W. G. C. Bandara, Vishal M. Patel

    Computer Science

    2023 IEEE/CVF Conference on Computer Vision and…

  • 2023

This paper shows that there exists a closed-form solution for generating an image given various constraints in the DDPM, and introduces a novel reliability parameter that allows using different off-the-shelf diffusion models trained across various datasets during sampling time alone to guide it to the desired outcome satisfying multiple constraints.

  • 8 citations

End-to-End Visual Editing with a Generatively Pre-Trained Artist
    A. Brown, Cheng-Yang Fu, Omkar M. Parkhi, Tamara L. Berg, A. Vedaldi

    Computer Science

    ECCV

  • 2022

This work proposes a self-supervised approach that simulates edits by augmenting off-the-shelf images in a target domain by learning a conditional probability distribution of the edits, end-to-end, and shows that different blending effects can be learned by an intuitive control of the augmentation process.

...

...

81 References

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
    Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, J. Kautz, Bryan Catanzaro

    Computer Science

    2018 IEEE/CVF Conference on Computer Vision and…

  • 2018

A new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs) is presented, which significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.

StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks
    Han Zhang, Tao Xu, Dimitris N. Metaxas

    Computer Science

    2017 IEEE International Conference on Computer…

  • 2017

This paper proposes Stacked Generative Adversarial Networks (StackGAN) to generate 256×256 photo-realistic images conditioned on text descriptions and introduces a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold.

You Only Need Adversarial Supervision for Semantic Image Synthesis
    V. Sushko, Edgar Schönfeld, Dan Zhang, Juergen Gall, B. Schiele, A. Khoreva

    Computer Science

    ICLR

  • 2021

This work proposes a novel, simplified GAN model, which needs only adversarial supervision to achieve high quality results, and re-designs the discriminator as a semantic segmentation network, directly using the given semantic label maps as the ground truth for training.

Cross-Modal Contrastive Learning for Text-to-Image Generation
    Han Zhang, Jing Yu Koh, Jason Baldridge, Honglak Lee, Yinfei Yang

    Computer Science

    2021 IEEE/CVF Conference on Computer Vision and…

  • 2021

The Cross-Modal Contrastive Generative Adversarial Network (XMC-GAN) addresses the challenge of text-to-image synthesis systems by maximizing the mutual information between image and text via multiple contrastive losses which capture inter-modality and intra-modality correspondences.

  • 304 citations
  • Highly Influential

DF-GAN: Deep Fusion Generative Adversarial Networks for Text-to-Image Synthesis
    Ming Tao, H. Tang, Songsong Wu, N. Sebe, Fei Wu, Xiaoyuan Jing

    Computer Science

    ArXiv

  • 2020

This work proposes a novel simplified text-to-image backbone that can synthesize high-quality images directly with a single generator-discriminator pair, a novel regularization method called Matching-Aware zero-centered Gradient Penalty, and a novel fusion module that exploits the semantics of text descriptions effectively and fuses text and image features deeply during the generation process.

  • 195 citations

Generative Adversarial Networks for Image and Video Synthesis: Algorithms and Applications
    Ming-Yu Liu, Xun Huang, Jiahui Yu, Ting-Chun Wang, Arun Mallya

    Computer Science

    Proceedings of the IEEE

  • 2021

An overview of GANs with a special focus on algorithms and applications for visual synthesis is provided, and several important techniques are covered for stabilizing GAN training, which has a reputation for being notoriously difficult.

  • 124 citations
  • Highly Influential

Large Scale GAN Training for High Fidelity Natural Image Synthesis
    Andrew Brock, Jeff Donahue, K. Simonyan

    Computer Science

    ICLR

  • 2019

It is found that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the Generator's input.
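The truncation idea in this summary can be sketched as sampling the generator's input from a truncated normal: values whose magnitude exceeds a threshold are resampled, which lowers the input variance. The snippet below is a minimal illustration under that assumption, not code from the paper; the function name `truncated_normal` and its parameters are hypothetical.

```python
import random
from typing import List


def truncated_normal(dim: int, threshold: float,
                     rng: random.Random) -> List[float]:
    """Draw `dim` values from N(0, 1), resampling any with |z| > threshold.

    A smaller threshold keeps samples closer to the mode (typically
    trading variety for fidelity); a large threshold approaches the
    untruncated normal.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    samples: List[float] = []
    while len(samples) < dim:
        z = rng.gauss(0.0, 1.0)
        if abs(z) <= threshold:  # keep only values inside the threshold
            samples.append(z)
    return samples


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    z = truncated_normal(dim=8, threshold=0.5, rng=rng)
    print(max(abs(v) for v in z) <= 0.5)
```

The resulting vector would then be fed to a generator in place of an ordinary Gaussian sample.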

Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis
    Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li

    Computer Science

    NeurIPS

  • 2019

This work argues that convolutional kernels in the generator should be aware of the distinct semantic labels at different locations when generating images, and proposes a feature pyramid semantics-embedding discriminator, which is more effective in enhancing fine details and semantic alignments between the generated images and the input semantic layouts.

  • 190 citations

DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-To-Image Synthesis
    Minfeng Zhu, Pingbo Pan, Wei Chen, Yi Yang

    Computer Science

    2019 IEEE/CVF Conference on Computer Vision and…

  • 2019

The proposed DM-GAN model introduces a dynamic memory module to refine fuzzy image contents, when the initial images are not well generated, and performs favorably against the state-of-the-art approaches.

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
    Ruilin Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li

    Computer Science

    2021 IEEE/CVF Conference on Computer Vision and…

  • 2021

This paper proposes a novel DivCo framework to properly constrain both "positive" and "negative" relations between the generated images specified in the latent space and introduces a novel latent-augmented contrastive loss, which encourages images generated from adjacent latent codes to be similar and those generated from distinct latent codes to be dissimilar.

...

...
