Waifu2x-Caffe

Waifu2x-caffe is a 3rd-party software built for Windows machines that utilizes an NVIDIA GPU and is based on Waifu2x. For those wondering what the "Caffe" in Waifu2x-caffe means, Caffe is the deep learning framework it runs on, so Waifu2x-caffe is simply the Caffe version of the original Waifu2x software. What sets Waifu2x-caffe apart from other 3rd-party software is that it utilizes CUDA (available only with NVIDIA GPUs) and cuDNN for faster and higher quality upscaling. With the release of the RTX 3000 series, the number of CUDA cores has increased considerably over the last generation, and it is especially exciting to see how much faster processing speeds will be. Though many of the models can utilize cuDNN to improve upscaling quality, they take a lot longer compared to the other models. Waifu2x-caffe is one of the most popular 3rd-party options, with 5.3k stars on its GitHub repository.

Waifu2x-Caffe's GUI and CUI

waifu2x-caffe_gui

Waifu2x-caffe supports both a graphical user interface (GUI) and a character user interface (CUI). Nagadomi, the developer of Waifu2x, created a post mapping various features of the demo application site to Waifu2x-Caffe's GUI, and I recommend anyone that's familiar with Waifu2x's demo application site to check it out here. More detailed instructions on Waifu2x-Caffe's installation process, as well as general information on using Waifu2x-Caffe, can be found on Waifu2x-Caffe's GitHub repository page.
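For those who prefer the CUI, the sketch below shows one way to drive `waifu2x-caffe-cui.exe` from Python. The flag names (`-i`, `-o`, `-m`, `-s`, `-n`) follow my reading of the repository's README, and the file paths are placeholders, so treat this as an assumption to double-check against the documentation rather than a definitive invocation.

```python
import subprocess

# Hypothetical invocation of waifu2x-caffe's command-line interface.
# Paths and flag values are placeholders; check the README for the
# exact options supported by your version.
subprocess.run(
    [
        r"C:\waifu2x-caffe\waifu2x-caffe-cui.exe",
        "-i", r"C:\images\input.png",   # input file (or folder)
        "-o", r"C:\images\output.png",  # output file
        "-m", "noise_scale",            # denoise and upscale in one pass
        "-s", "2.0",                    # scale ratio
        "-n", "1",                      # denoise level 0-3
    ],
    check=True,
)
```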


Waifu2x-Caffe's GUI Overview

Waifu2x-Caffe accepts a single input, whether that's one image or an entire folder of images. It supports various file extensions as seen in the image, but it does not support video files, even though a video is ultimately just a sequence of images. (Video2x, another 3rd-party software we'll talk about later, does support video files.) If you really want to use Waifu2x-Caffe on videos, I suggest finding some way to extract all of the image frames from the video into one folder. There are many ways to extract image frames from a video, but the best way is with FFmpeg. FFmpeg is a very powerful open-source tool that can handle and process video, audio, and other multimedia files. It is extremely useful to learn, though the downside is that it can be quite difficult to pick up at first.
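If you do go the frame-extraction route, here is a minimal sketch of calling FFmpeg from Python. The file names and the numbered-frame pattern are placeholder assumptions for illustration; the underlying `ffmpeg -i <video> <pattern>` command is the standard way to dump frames.

```python
import subprocess
from pathlib import Path

def extract_frames(video_path: str, out_dir: str) -> None:
    """Dump every frame of a video as numbered PNGs using the ffmpeg CLI."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # %06d produces frame_000001.png, frame_000002.png, ...
    subprocess.run(
        ["ffmpeg", "-i", video_path, f"{out_dir}/frame_%06d.png"],
        check=True,
    )

# Example usage (placeholder paths):
# extract_frames("input.mp4", "frames")
```

The resulting folder can then be handed to Waifu2x-Caffe as a single folder input.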

Conversion Image Quality & Processing Settings

The conversion mode allows users to choose whether they want to upscale the image, denoise it, or both. The most important thing to note here is the denoise level. Denoising should only be used when the image actually has noise or compression artifacts. Level 0 should be used when the image has only gone through one compression; level 1 is for images that have gone through a couple of compressions; level 2 is for those that have gone through many compressions, such as memes that keep getting reposted until the image quality is degraded and god-awful; and level 3 should only be used to get rid of heavy compression artifacts. Denoising can result in smoother lines or even the loss of small details, so it's important to choose the denoising level that best fits your needs. When in doubt, just test each option and compare your results.

Model and TTA Mode

There is a variety of models used by Waifu2x-Caffe, and more information on each model can be found on its GitHub repository. A translation of the various model descriptions follows:

  • 2D illustration (RGB model): Model for 2D illustration that converts all RGB channels of the image
  • Photo / animation (Photo model): Model for photos and animation
  • 2D illustration (UpRGB model): A model that converts faster than the 2D illustration (RGB model) with the same or better image quality. However, it consumes more memory (VRAM) than the RGB model, so adjust the split size if the conversion is forcibly terminated partway through.
  • Photo / animation (UpPhoto model): A model that converts faster than the Photo / animation (Photo model) with the same or better image quality. However, it consumes more memory (VRAM) than the Photo model, so adjust the split size if the conversion is forcibly terminated partway through.
  • 2D illustration (Y model): Model for 2D illustration that converts only the brightness (luminance) of the image
  • 2D illustration (UpResNet10 model): A model that converts with higher image quality than the 2D illustration (UpRGB model). Please note that the output of this model changes if the split size is different.
  • 2D illustration (CUnet model): The model that converts 2D illustrations with the highest image quality among the included models. Please note that the output of this model changes if the split size is different.

Test-time augmentation (TTA) is a technique in which the input is transformed (for example, flipped) multiple times, each transformed copy is run through the model, and the results are averaged for a more accurate prediction.

tta_infographic
TTA Infographic

TTA mode in Waifu2x basically involves flipping and rotating the image into 8 variants, upscaling each one, and finally averaging all 8 results to produce an even higher quality upscaled image. TTA mode can also remove several types of artifacts, as stated by Nagadomi, the creator of Waifu2x and a contributor to Waifu2x-caffe. It should be noted that TTA mode makes the entire upscaling process take 8x as long. In my personal experience I have not been able to find any noticeable difference between an image upscaled in TTA mode and one not in TTA mode.
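To make the idea concrete, here is a minimal NumPy sketch of that flip, upscale, and average loop. The `upscale` argument is a hypothetical placeholder for whatever model you are running (it is not Waifu2x-Caffe's actual API), and the 8 variants are the usual rotations and mirror images.

```python
import numpy as np

def tta_upscale(image: np.ndarray, upscale) -> np.ndarray:
    """Average the upscaler's output over the 8 flips/rotations of the input."""
    outputs = []
    for k in range(4):                     # 0, 90, 180, 270 degree rotations
        for flip in (False, True):         # with and without a horizontal flip
            aug = np.rot90(image, k)
            if flip:
                aug = np.fliplr(aug)
            out = upscale(aug)             # placeholder for the actual model call
            if flip:                       # undo the transforms so outputs align
                out = np.fliplr(out)
            out = np.rot90(out, -k)
            outputs.append(out.astype(np.float64))
    return np.mean(outputs, axis=0)        # average the 8 aligned results
```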

download

Source Image

download1

Level 1 noise, 1x scale, and without TTA mode

download2

Level 1 noise, 1x scale, and with TTA mode

Other Settings

Split size and batch size control how the image is divided into blocks and how many blocks Waifu2x-caffe will process at once. The bigger the split and batch sizes, the faster the upscaling process will be. The problem with larger split and batch sizes is the amount of memory they use; if the processor is set to CUDA, it's important to keep an eye on VRAM, as these settings will use a lot of it.
Lastly, there are the general app settings, the check cuDNN setting, and the UI language setting. The UI language setting offers 12 selectable languages. The app settings cover various miscellaneous options, the most important being the choice between the CPU and CUDA processor. The check cuDNN setting allows users to verify whether cuDNN can be used with the CUDA processor.

Only Upscale Examples

origin
Source Image
2x Upscale
2x Upscale and TTA

Only Denoising Examples

origin
Source Image
Denoise Level 0
Denoise Level 1
Denoise Level 2
Denoise Level 3

Upscale & Denoising Examples

origin
Source Image
2x Upscale and Denoise Level 0
2x Upscale and Denoise Level 1
2x Upscale and Denoise Level 2
2x Upscale and Denoise Level 3