AI Everywhere: How Deep Learning is augmenting the Gaming Experience
Learn how Deep Learning Super Sampling (DLSS), a technology developed by NVIDIA using Deep Learning, is revolutionising the entire gaming industry.
— By Kshitij Kumar, Computer Vision researcher @ Sally Robotics.
Artificial Intelligence has gained a lot of traction in recent years, with major industries using it to enhance their goods and services. Ask a voice assistant like Siri, Alexa, or Google Assistant, and it can tell you the outcome of the football match you weren’t able to watch last night. Recommender systems in Spotify help you expand your music library, and Amazon’s can predict which product you will buy next before you know it yourself! AI has also enabled advancements in robotics, healthcare, finance, and several other fields. In this article, we’ll explore how NVIDIA is using AI to enhance our gaming experiences.
DLSS overview: what is it?
Games these days are becoming more graphically, and hence computationally, intensive. Modern games have recently adopted ray tracing, which simulates real lighting conditions in real time instead of relying on reflections pre-baked by the game developers. This task is extremely computationally expensive, and games are also being played at resolutions higher than the good-ol’ 1080p. Hence, there was a need to speed up the number-crunching while rendering games.
Deep Learning Super Sampling (DLSS) is an NVIDIA RTX technology that uses the power of AI to boost your frame rates in games with graphically-intensive workloads. With DLSS, gamers can use higher resolutions and settings while still maintaining solid framerates.
DLSS specifically performs the task of super-resolution, wherein a lower-resolution image, say 1080p, is upscaled to a higher resolution, say 4K, with minimal loss in image quality. The game is still rendered at 1080p, and only the scaling is handled by the DLSS algorithm, giving you higher framerates than you would get by actually rendering the game in 4K (my laptop would melt if I ever tried to do that 😌).
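To put numbers on that saving, here is a small sketch that counts pixels at common 16:9 resolutions. The resolution table and the helper names (`pixel_count`, `upscale_factor`) are my own for illustration and are not part of any NVIDIA API.

```python
# Pixel dimensions of common 16:9 resolutions (width, height).
RESOLUTIONS = {
    "540p": (960, 540),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}


def pixel_count(name: str) -> int:
    """Number of pixels the GPU has to shade at this resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def upscale_factor(src: str, dst: str) -> float:
    """How many output pixels correspond to each rendered pixel."""
    return pixel_count(dst) / pixel_count(src)


print(pixel_count("1080p"))             # 2073600
print(pixel_count("4K"))                # 8294400
print(upscale_factor("1080p", "4K"))    # 4.0
print(upscale_factor("540p", "1080p"))  # 4.0
```

Rendering at 1080p and upscaling to 4K means shading only a quarter of the pixels that end up on screen, which is where the framerate headroom comes from.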
At its core, DLSS is a neural network trained on NVIDIA supercomputers: the output of the network is compared with a 16K ground-truth image, and the error between the two images is backpropagated through the network. Since the inference speed of the neural network can become a bottleneck, DLSS leverages the Tensor Cores found in the RTX 2000 GPUs (and the upcoming RTX 3000 GPUs) and in the workstation RTX GPUs. These are specialised cores that accelerate tensor operations significantly, providing large boosts in AI training and HPC-related tasks.
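The "compare with ground truth, then push the error back" loop can be sketched in a few lines of NumPy. This is a toy under loud assumptions: a single linear layer stands in for the real network (so backpropagation collapses to one gradient computation), random vectors stand in for the low-res input and the ground-truth image, and the learning rate and step count are arbitrary. It is not how NVIDIA trains DLSS.

```python
import numpy as np

rng = np.random.default_rng(0)
low_res = rng.random(16)        # stand-in for a flattened low-res input
ground_truth = rng.random(64)   # stand-in for the high-res reference image
weights = np.zeros((64, 16))    # one linear layer, purely for illustration


def loss_and_grad(weights):
    """Mean squared error against the ground truth, and its gradient w.r.t. weights."""
    prediction = weights @ low_res
    error = prediction - ground_truth
    loss = np.mean(error ** 2)
    grad = (2.0 / error.size) * np.outer(error, low_res)
    return loss, grad


losses = []
for _ in range(200):            # assumed number of training steps
    loss, grad = loss_and_grad(weights)
    losses.append(loss)
    weights -= 0.5 * grad       # assumed learning rate

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```

A real network stacks many such layers with non-linearities in between, and the same error signal is propagated through each of them in turn.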
Evolution of DLSS: 1.0 vs 2.0
DLSS 1.0 was trained on a game-by-game basis and was extremely time-consuming to train. It also didn’t support 4x upsampling (e.g. 1080p to 4K) and had various flaws in picture quality that made the improvement in framerates not worth it. DLSS 2.0 is a more general algorithm, which removes the per-game training limitation. It supports up to 4x upsampling and leverages the Tensor Cores to cut inference time by a significant factor, to about 1.5 ms at 4K on an RTX 2080 Ti. In some instances, it even provides better results than a natively rendered image! Let’s have a look at some images to compare the results.
DLSS 1.0 upscaled images rendered at 720p to a maximum of 1080p, whereas DLSS 2.0 performs upscaling from 540p to 1080p. As you can see, the 540p image looks like a blurry mess. On the other hand, the DLSS 2.0 result looks even better than the DLSS 1.0 result and even looks slightly better than the natively rendered image. This means that DLSS 2.0 is doing a better job of filling in pixels than DLSS 1.0, even though the former has fewer pixels to upsample from. This reflects how much improvement has been made between these two versions.
Since DLSS 2.0 can upscale from an image rendered at just 540p, this, coupled with the low inference time, is where the performance gains of DLSS come from.
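To see why an inference time of about 1.5 ms counts as low, it helps to compare it with the time available per frame. The sketch below uses the 1.5 ms figure quoted earlier (4K on an RTX 2080 Ti); the target frame rates and the function names are assumptions picked for illustration.

```python
DLSS_COST_MS = 1.5  # figure quoted above for DLSS 2.0 at 4K on an RTX 2080 Ti


def frame_budget_ms(target_fps: float) -> float:
    """Time available to produce one frame at the given frame rate."""
    return 1000.0 / target_fps


def dlss_share(target_fps: float, dlss_cost_ms: float = DLSS_COST_MS) -> float:
    """Fraction of the frame budget spent on the upscaling pass."""
    return dlss_cost_ms / frame_budget_ms(target_fps)


for fps in (30, 60, 120):  # assumed example targets
    print(f"{fps} fps: budget {frame_budget_ms(fps):.2f} ms, "
          f"DLSS share {dlss_share(fps):.1%}")
```

At 60 fps the upscaling pass takes roughly 9% of the frame budget, which is small next to the cost of shading four times as many pixels natively.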
Diving deeper into how DLSS works
While rendering scene geometry in games (say a triangle), the number of pixels you use (roughly speaking, the sampling rate) will decide how the image will look.
By using a 4x4 sampling grid to render a triangle, we can see that the image doesn’t look that great.
Now, on quadrupling the sampling rate by using an 8x8 sampling grid, we can see that the rendered triangle looks much closer to the intended triangle. This is the problem statement of DLSS: take an image rendered at a low sampling rate and reconstruct it at a higher resolution.
In the end, you would expect to get a higher-resolution image at the rendering cost of a lower-quality one.
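The 4x4 versus 8x8 comparison is easy to reproduce. Below is a minimal sketch that tests the centre of each grid cell against a triangle using edge functions; the triangle coordinates and the function names are made up for this example.

```python
import numpy as np

# Hypothetical triangle inside the unit square, as (x, y) vertices.
TRIANGLE = ((0.1, 0.1), (0.9, 0.2), (0.4, 0.9))


def edge(ax, ay, bx, by, px, py):
    """Signed area test: positive when (px, py) lies to the left of edge a -> b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(triangle, n):
    """Sample the triangle once at the centre of each cell of an n x n grid."""
    (ax, ay), (bx, by), (cx, cy) = triangle
    centres = (np.arange(n) + 0.5) / n
    px, py = np.meshgrid(centres, centres)
    w0 = edge(ax, ay, bx, by, px, py)
    w1 = edge(bx, by, cx, cy, px, py)
    w2 = edge(cx, cy, ax, ay, px, py)
    # Accept either winding order: all three tests must agree in sign.
    inside = ((w0 >= 0) & (w1 >= 0) & (w2 >= 0)) | ((w0 <= 0) & (w1 <= 0) & (w2 <= 0))
    return inside.astype(np.uint8)


for n in (4, 8):
    image = rasterize(TRIANGLE, n)
    print(f"{n}x{n} grid, {int(image.sum())} covered samples")
    for row in image[::-1]:  # flip so that y points up when printed
        print("".join("#" if v else "." for v in row))
```

The 8x8 output traces the edges of the triangle far more faithfully, but it also costs four times as many samples. DLSS aims to get the second picture for roughly the price of the first.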
Reconstructing a high-resolution image from low-resolution samples is, in effect, the problem of super-resolution, which is not new. Let’s have a brief overview of the previous work done in super-resolution.
Single Image Super-Resolution
Single image super-resolution aims to create a higher-res image from a single lower-res image using interpolation techniques such as bilinear, bicubic, and Lanczos filters. This can also be done with deep neural networks, but the problem with them is that they hallucinate new pixels based on the priors embedded in the training set. The resulting pictures may seem plausible, but they won’t be close to the natively rendered images, which is what we’re targeting. They also lack detail compared to native high-res images and are temporally unstable, which results in inconsistencies and flickering over time. Hence, this single-frame approach is not suitable.
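For reference, the simplest of these filters, bilinear interpolation, fits in a short NumPy function. This is a sketch for a 2-D grayscale image and an integer scale factor; the pixel-centre alignment used here is one common convention, not the only one.

```python
import numpy as np


def bilinear_upscale(img: np.ndarray, scale: int) -> np.ndarray:
    """Upscale a 2-D float image by an integer factor with bilinear interpolation."""
    h, w = img.shape
    out_h, out_w = h * scale, w * scale
    # Map each output pixel centre back into input pixel coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) / scale - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) / scale - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    fy = (ys - y0)[:, None]  # vertical blend weights
    fx = (xs - x0)[None, :]  # horizontal blend weights
    top = img[y0][:, x0] * (1 - fx) + img[y0][:, x1] * fx
    bottom = img[y1][:, x0] * (1 - fx) + img[y1][:, x1] * fx
    return top * (1 - fy) + bottom * fy


low = np.array([[0.0, 1.0],
                [1.0, 0.0]])
high = bilinear_upscale(low, 2)
print(high.shape)  # (4, 4)
print(high)
```

Every output pixel is a weighted average of at most four input pixels, so the filter can only smooth what is already there. It cannot recover detail that the low-res render never sampled.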
Let’s compare a few results of some single image super-res techniques with what DLSS 2.0 offers.
Clearly, the DLSS 2.0 result is better than both the bicubic interpolation result and that of ESRGAN, a neural network architecture that uses a Generative Adversarial Network (GAN) to perform super-resolution. The foliage in the DLSS 2.0 result looks more detailed than even the natively rendered image.
Multi-frame super-resolution uses multiple low-res images to output a higher-res image, which helps restore optical details better than the single-frame approach. A lot of the previous work on this was designed for videos and burst-mode photography, and it doesn’t leverage rendering-specific information. One example is the use of optical flow to align multiple frames instead of geometric motion vectors, even though the latter are cheaper to compute and give more accurate results. The multi-frame approach is a promising lead, and this brings us to our next technique.
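To make the motion-vector idea concrete, here is a rough sketch of aligning the previous frame to the current one. The motion-vector convention (`motion[y, x] = (dx, dy)`, the displacement in pixels since the previous frame), the nearest-pixel lookup, and the function name are all assumptions for this example; real engines define their own conventions and usually filter the lookup.

```python
import numpy as np


def reproject(prev_frame: np.ndarray, motion: np.ndarray):
    """Warp prev_frame into the current frame using per-pixel motion vectors.

    motion[y, x] = (dx, dy) is how far the surface seen at current pixel (x, y)
    has moved since the previous frame (assumed convention).
    Returns the warped frame and a mask of pixels that had a valid source.
    """
    h, w = prev_frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Where was this pixel's content in the previous frame?
    src_x = np.rint(xs - motion[..., 0]).astype(int)
    src_y = np.rint(ys - motion[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(prev_frame)
    warped[valid] = prev_frame[src_y[valid], src_x[valid]]
    return warped, valid


prev_frame = np.arange(16.0).reshape(4, 4)
motion = np.zeros((4, 4, 2))
motion[..., 0] = 1.0  # everything moved one pixel to the right
warped, valid = reproject(prev_frame, motion)
print(warped)
print(valid)
```

Since the renderer already knows how each surface moved, this lookup is a plain gather. Optical flow has to estimate the same displacement from pixel colours alone, which costs more and is less reliable.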
Spatial-Temporal Super Sampling
This type of sampling uses multiple frames to perform supersampling of images. Let’s say that we have our current frame. We can assume that the previous frame looks somewhat similar to it. By reusing samples from previous frames, we can render each frame at a relatively lower sampling rate while still increasing the total number of samples available to reconstruct the image.
The problem lies in the assumption that the previous frame looks similar to the current one because, in games, things are constantly moving, and that is what makes them interesting. To solve this, Spatial-Temporal Super Sampling techniques use heuristics such as Neighbourhood Clamping to rectify the frame history. These heuristics are handcrafted and trade off blurriness, temporal instability, and moiré patterns against lagging and ghosting.
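Neighbourhood clamping itself is short enough to sketch. The version below works on grayscale frames, uses a 3x3 window, and blends with a fixed weight `alpha`; the window size, the blend weight, and the function names are assumptions for illustration, and production implementations differ in many details.

```python
import numpy as np


def neighbourhood_clamp(history: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Clamp each history pixel to the min/max of its 3x3 neighbourhood in the current frame."""
    h, w = current.shape
    padded = np.pad(current, 1, mode="edge")
    # Stack the nine shifted views that make up every pixel's 3x3 window.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return np.clip(history, windows.min(axis=0), windows.max(axis=0))


def accumulate(history: np.ndarray, current: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Blend the rectified history with the current frame (exponential moving average)."""
    rectified = neighbourhood_clamp(history, current)
    return alpha * current + (1.0 - alpha) * rectified


current = np.zeros((4, 4))
stale_history = np.full((4, 4), 5.0)  # e.g. a bright object that has since moved away
print(accumulate(stale_history, current))
```

The clamp throws away history that disagrees with what the current frame shows nearby, which suppresses ghosting. It also throws away legitimate detail that happens to fall outside that range, which is one source of the blurriness and flicker tradeoff described above.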
DLSS 2.0: Deep Learning-based multi-frame reconstruction
The DLSS neural network is trained on tens of thousands of training images to perform this reconstruction better than the handcrafted heuristics, thus eliminating the tradeoff involved in using them. This results in much higher-quality reconstruction using samples from multiple frames.
“Control” is a game that supports ray tracing and DLSS. As you can see, the RTX 2060 got a performance boost from 8 fps to an average of 36.8 fps, making the game playable. With DLSS on, it beats even the RTX 2080 Ti running without DLSS, which shows how capable this technology is.
Digital Foundry compared the image quality of DLSS 1.9 and 2.0 in a video, which you can check out.
Control with DLSS 1.9 on the left and DLSS 2.0 on the right. Images were rendered on an RTX 2060 at 1080p and were upsampled to 4K.
In the above image, we can see how DLSS 2.0 is able to render the hair strands which were previously absent when using DLSS 1.9.
DLSS is available only for RTX 2000 and Turing-based GPUs (and on the upcoming Ampere-based RTX 3000 GPUs) and supports only a few gaming titles as of now. But with the better-than-native image quality of DLSS 2.0, coupled with the framerate gains it delivers at minimal loss in video quality, the future of DLSS is looking bright. This is truly a remarkable achievement for NVIDIA in the space of gaming.