OUR-GAN: One-shot Ultra-high-Resolution Generative Adversarial Networks - Demo

Abstract

We propose a one-shot ultra-high-resolution (UHR) image synthesis framework, OUR-GAN, that generates non-repetitive 16K (16,384 x 8,644) images from a single training image and is trainable on a single GPU. OUR-GAN generates an initial image that is visually consistent and varied in shape at low resolution, then gradually increases the resolution by adding detail through super-resolution. Since OUR-GAN learns from a real UHR image, it can synthesize large-scale shapes with fine details while maintaining long-range coherence, which is difficult with conventional generative models that rely on the patch distribution learned from relatively small images. OUR-GAN can synthesize high-quality 4K images with only 8GB of GPU memory and 16K images with 12.5GB, as it synthesizes a UHR image part by part through seamless subregion-wise super-resolution, preventing discontinuity at the subregion boundary. Additionally, OUR-GAN improves visual coherence while maintaining diversity by applying vertical positional convolution. In experiments on the ST4K and RAISE datasets, OUR-GAN exhibited improved fidelity, visual coherency, and diversity compared with the baseline one-shot synthesis models. To the best of our knowledge, OUR-GAN is the first one-shot image synthesizer that generates non-repetitive UHR images on a single GPU.
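The seamless subregion-wise super-resolution described above can be illustrated with a generic overlap-and-blend tiling scheme. This is a minimal sketch, not OUR-GAN's implementation. It assumes overlapping square tiles and linear (feathered) blending weights in the overlaps, and it uses nearest-neighbour upsampling as a hypothetical stand-in for the learned super-resolution network (`upscale_nearest`, `tile_starts`, `blend_weights` and `tiled_super_resolve` are illustrative names). The upsampler only ever sees one tile at a time, so its memory use is bounded by the tile size. Only the output canvas grows with the image.

```python
import numpy as np


def upscale_nearest(tile: np.ndarray, scale: int) -> np.ndarray:
    """Hypothetical stand-in for a learned SR network: nearest-neighbour
    upsampling of an (H, W, C) array along height and width."""
    return tile.repeat(scale, axis=0).repeat(scale, axis=1)


def tile_starts(length: int, tile: int, stride: int) -> list[int]:
    """Start offsets of tiles of size `tile` that together cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush against the border
    return starts


def blend_weights(size: int, overlap: int) -> np.ndarray:
    """1-D weights that ramp up over `overlap` pixels at each end of a tile."""
    overlap = min(overlap, size // 2)
    w = np.ones(size)
    if overlap > 0:
        ramp = np.arange(1, overlap + 1) / (overlap + 1)  # strictly positive
        w[:overlap] = ramp
        w[-overlap:] = ramp[::-1]
    return w


def tiled_super_resolve(img, scale, tile=64, overlap=16, sr=upscale_nearest):
    """Upscale `img` (H, W, C) tile by tile, blending overlaps with weights."""
    h, w, c = img.shape
    out = np.zeros((h * scale, w * scale, c))
    acc = np.zeros((h * scale, w * scale, 1))  # sum of weights per pixel
    stride = tile - overlap
    for y in tile_starts(h, tile, stride):
        for x in tile_starts(w, tile, stride):
            up = sr(img[y:y + tile, x:x + tile], scale)  # one subregion at a time
            ph, pw = up.shape[:2]
            weight = np.outer(blend_weights(ph, overlap * scale),
                              blend_weights(pw, overlap * scale))[..., None]
            ys, xs = y * scale, x * scale
            out[ys:ys + ph, xs:xs + pw] += up * weight
            acc[ys:ys + ph, xs:xs + pw] += weight
    return out / acc  # normalise so overlapping contributions average out
```

Because the stand-in upsampler is deterministic, neighbouring tiles agree exactly in their overlaps, and the blended result matches upsampling the whole image at once. With a generative network, neighbouring tiles can disagree. The ramps then replace a hard seam with a gradual transition. How OUR-GAN itself enforces consistency at subregion boundaries is not specified on this page.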

Notice

Loading UHR images may take time because the files are large.
Therefore, we’ve posted downsampled versions of the images on this page for faster image loading.
Click on the images to access the full-size raw images.

Because of their very high resolution, the images may look distorted in some viewers.
Please download the samples to evaluate their quality.
Download all samples (including all sections) - link

ST4K

To train and evaluate OUR-GAN, we built a new UHR image dataset, Scenery and Texture-4K (ST4K), consisting of high-quality 4K scenery and texture images.
The ST4K dataset includes a total of 50 copyright-free images collected from the Internet with a minimum resolution of 4,096 × 2,160 pixels.
Download ST4K - link

1. 16K (16,384 x 10,912) image synthesized by OUR-GAN trained with a single 4K training image.

OUR-GAN can synthesize UHR images at a higher resolution than that of the training image.
The resolution of this image is 16K, whereas that of the training image is only 4K.
OUR-GAN synthesizes high-fidelity UHR images that preserve even fine details.
Download Sec 1. samples - link


16K (16,384 x 10,912) image synthesized by OUR-GAN


4K (4,096 x 2,728) training image


8K (8,192 x 5,456) image synthesized by OUR-GAN
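As a quick arithmetic check of the resolutions in the captions above (values copied from this section), each output is an integer multiple of the 4K training image along both axes. The 8K output is 2x per side (4x the pixels), and the 16K output is 4x per side (16x the pixels).

```python
# (width, height) values taken from the captions in this section.
training = (4096, 2728)
outputs = {"8K": (8192, 5456), "16K": (16384, 10912)}


def scale_factors(src, dst):
    """Per-axis upscaling factors and the resulting pixel-count factor."""
    sx = dst[0] / src[0]
    sy = dst[1] / src[1]
    area = (dst[0] * dst[1]) / (src[0] * src[1])
    return sx, sy, area


for name, size in outputs.items():
    sx, sy, area = scale_factors(training, size)
    print(f"{name}: {sx:g}x wide, {sy:g}x tall, {area:g}x pixels")
# prints:
# 8K: 2x wide, 2x tall, 4x pixels
# 16K: 4x wide, 4x tall, 16x pixels
```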

2. Improving visual coherence

For one-shot image synthesis, achieving visual coherence while maintaining diversity is challenging.
HP-VAE-GAN [1] synthesizes diverse images but fails to capture global coherence, as shown below.
OUR-GAN, which applies vertical coordinate convolution to HP-VAE-GAN, significantly improves the global coherence of patterns while still generating diverse patterns.
Download Sec 2. samples - link



HP-VAE-GAN | OUR-GAN (proposed) [HP-VAE-GAN + vertical coordinate convolution]
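The idea of vertical coordinate convolution can be sketched in the style of CoordConv. Before a convolution, a channel encoding each pixel's row position is appended to the feature map. The layer can then condition on height in the image, while its behaviour along the horizontal axis stays translation-equivariant. This page does not say where OUR-GAN inserts such layers, what kernel size they use, or how the coordinate is normalized. The [-1, 1] scaling, the 1x1 kernel and the function names below are illustrative assumptions.

```python
import numpy as np


def add_vertical_coord(x: np.ndarray) -> np.ndarray:
    """Append one channel holding the row position, scaled to [-1, 1].

    x has shape (C, H, W); the result has shape (C + 1, H, W).
    """
    c, h, w = x.shape
    ys = np.linspace(-1.0, 1.0, h)                # top row -> -1, bottom row -> +1
    coord = np.broadcast_to(ys[:, None], (h, w))  # same value along each row
    return np.concatenate([x, coord[None]], axis=0)


def conv1x1(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1x1 convolution: x (C_in, H, W), weight (C_out, C_in), bias (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def vertical_coord_conv(x, weight, bias):
    """Convolution over the input plus its vertical coordinate channel."""
    return conv1x1(add_vertical_coord(x), weight, bias)
```

For a spatially constant input, a plain 1x1 convolution produces a constant output. With the coordinate channel, the output varies from top to bottom but stays constant along each row. That extra signal is one plausible way a fully convolutional generator can distinguish, say, sky rows from ground rows without giving up horizontal variety.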

3. Large-scale shape generation

For UHR image synthesis, models that learn from small image patches, such as InfinityGAN [2], struggle to synthesize large-scale shapes.
In contrast, OUR-GAN can synthesize globally coherent large-scale objects such as buildings.
Full-size InfinityGAN samples can be downloaded from the InfinityGAN project page.
Download Sec 3. samples - link


OUR-GAN (proposed)


InfinityGAN

4. 16K (16,384 x 10,912) image synthesized by OUR-GAN trained with a single 16K training image.

We further increased the resolution of OUR-GAN by applying an additional subregion-wise super-resolution step; we refer to this variant as OUR-GAN-16K.
OUR-GAN-16K can synthesize 16K images from a single 16K training image on a single GPU.
OUR-GAN-16K successfully synthesized non-repetitive, high-fidelity 16K images that maintain both visual coherence and fine details.
The largest island, on the right of the image, measures approximately 7,083 × 4,388 pixels.
Download Sec 4. samples - island, forest


16K (16,384 x 9,152) island image synthesized by OUR-GAN


16K (16,329 x 9,185) island training image


16K (16,384 x 10,880) forest image synthesized by OUR-GAN


16K (16,384 x 10,880) forest training image

References

[1] Shir Gur, Sagie Benaim, and Lior Wolf. Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 16761–16772. Curran Associates, Inc., 2020.

[2] Chieh Hubert Lin, Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, and Ming-Hsuan Yang. InfinityGAN: Towards Infinite-Pixel Image Synthesis. In International Conference on Learning Representations, 2022.