Jimenez’s MLAA port to OpenSceneGraph

Posted: July 16, 2011 in Computer Graphics

Intro

Full-scene antialiasing has become something of a trending topic in these days of inexpensive big flat displays and powerful GPUs.

Traditional AA algorithms used to rely on some sort of supersampling (i.e., rendering the scene into a buffer n times bigger and then averaging several supersampled pixels into each final pixel).
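As a rough illustration of that averaging step, here is a minimal C++ sketch of a box-filter resolve. It assumes a grayscale image and n samples along each axis (so n x n samples per final pixel); the Image type and the downsampleBox name are made up for this example and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Grayscale image stored row-major; a minimal stand-in for a render target.
struct Image {
    std::size_t width, height;
    std::vector<float> pixels;  // width * height values
    float at(std::size_t x, std::size_t y) const { return pixels[y * width + x]; }
};

// Resolve an image supersampled by n along each axis down to the final
// resolution by averaging every n x n block of samples (a box filter).
Image downsampleBox(const Image& hi, std::size_t n) {
    assert(n > 0 && hi.width % n == 0 && hi.height % n == 0);
    Image lo;
    lo.width = hi.width / n;
    lo.height = hi.height / n;
    lo.pixels.resize(lo.width * lo.height);
    for (std::size_t y = 0; y < lo.height; ++y) {
        for (std::size_t x = 0; x < lo.width; ++x) {
            float sum = 0.0f;
            for (std::size_t sy = 0; sy < n; ++sy)
                for (std::size_t sx = 0; sx < n; ++sx)
                    sum += hi.at(x * n + sx, y * n + sy);
            lo.pixels[y * lo.width + x] = sum / static_cast<float>(n * n);
        }
    }
    return lo;
}
```

For example, resolving a 4x2 buffer whose samples are all 1 except a single 0 yields a 2x1 image in which the pixel covering the 0 sample ends up at 0.75. The cost is what makes this expensive: the scene is shaded n x n times per final pixel.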

Multisample antialiasing (MSAA) is the most widespread technique. 4x to 8x MSAA can yield good results, but it can also be computationally expensive.

Morphological Antialiasing (MLAA) is a fairly recent technique that has grown in popularity in recent years. In 2009, Alexander Reshetov (Intel) proposed an algorithm that detects shape patterns in aliased edges and then blends the pixels belonging to an edge with their 4-neighborhood, based on the sub-pixel area covered by the mathematical edge line.
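To make "blending based on the area covered by the edge line" a bit more concrete, here is a simplified C++ sketch (not Reshetov's code). It assumes the simplest L-shaped pattern, where the reconstructed edge line runs straight from a height h0 (in pixel units, measured from the aliased edge) at one end of an edge run to h1 at the other end; each pixel's blending weight is then the trapezoid area under the line inside that pixel. The function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Height of the reconstructed edge line at horizontal position x, for a line
// running straight from height h0 at x = 0 to height h1 at x = length.
float lineHeight(float h0, float h1, float length, float x) {
    return h0 + (h1 - h0) * (x / length);
}

// Area under the line inside each pixel [i, i + 1] of an edge run that is
// `length` pixels long. Assumes the line stays on one side of the aliased
// edge (h0 and h1 share a sign); patterns where it crosses over would need
// to be split at the crossing point.
std::vector<float> coverageAreas(float h0, float h1, int length) {
    assert(length > 0 && h0 * h1 >= 0.0f);
    const float len = static_cast<float>(length);
    std::vector<float> areas;
    for (int i = 0; i < length; ++i) {
        const float a = lineHeight(h0, h1, len, static_cast<float>(i));
        const float b = lineHeight(h0, h1, len, static_cast<float>(i + 1));
        areas.push_back(0.5f * (a + b));  // trapezoid: mean height times width 1
    }
    return areas;
}
```

For a two-pixel run with the line dropping from 0.5 to 0, coverageAreas(0.5f, 0.0f, 2) gives 0.375 and 0.125, so the pixel nearer the step borrows more of its neighbor's color.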

Reshetov’s implementation wasn’t practical on the GPU since it was tightly coupled to the CPU, but the concept had a lot of potential. His demo takes a .ppm image as input and outputs an antialiased .ppm.

However, there’s been a lot of activity on this topic since then and a few GPU-accelerated techniques have been presented.

Jimenez’s MLAA

Among them, Jorge Jimenez and Diego Gutierrez’s team at the University of Zaragoza (Spain) have developed a symmetrical 3-pass post-processing technique named Jimenez’s MLAA.

According to the tests conducted by the authors, it can achieve visual results between 4x and 8x MSAA with an average speedup of 11.8x (GeForce 9800 GTX+)! On the other hand, it suffers from classic MLAA problems such as the handling of sub-pixel features, but you can tweak some parameters to get really good results with virtually unnoticeable glitches at a fraction of the time and memory that MSAA takes!

The algorithm, in a nutshell, works as follows:

In the first step, a luma-based discontinuity test is performed on the scene rendered to texture (RTT), comparing the current pixel with its 4-neighborhood. The result is encoded in an RGBA edges texture.
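The first pass can be approximated on the CPU as follows. This is a hedged C++ sketch, not a translation of the actual shader: it assumes Rec. 709 luma weights, one flag per neighbor stored in (left, top, right, bottom) order to mirror the four RGBA channels, clamp-to-edge addressing, and row 0 at the top of the image. All names and the threshold value are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct RGB { float r, g, b; };

// Luma with Rec. 709 weights (an assumption for this sketch).
float luma(const RGB& c) { return 0.2126f * c.r + 0.7152f * c.g + 0.0722f * c.b; }

// One texel of the edges texture: a flag per neighbor (left, top, right, bottom).
using EdgeTexel = std::array<float, 4>;

// Luma discontinuity test of every pixel against its 4-neighborhood.
std::vector<EdgeTexel> detectLumaEdges(const std::vector<RGB>& img,
                                       int width, int height, float threshold) {
    assert(static_cast<int>(img.size()) == width * height);
    // Clamp coordinates at the borders, like clamp-to-edge texture addressing.
    auto lumaAt = [&](int x, int y) {
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return luma(img[static_cast<std::size_t>(y * width + x)]);
    };
    std::vector<EdgeTexel> edges(img.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const float l = lumaAt(x, y);
            const float neighbors[4] = {lumaAt(x - 1, y), lumaAt(x, y - 1),
                                        lumaAt(x + 1, y), lumaAt(x, y + 1)};
            EdgeTexel& e = edges[static_cast<std::size_t>(y * width + x)];
            // Same rule as GLSL step(threshold, delta): 1 when delta >= threshold.
            for (std::size_t i = 0; i < 4; ++i)
                e[i] = std::fabs(l - neighbors[i]) >= threshold ? 1.0f : 0.0f;
        }
    }
    return edges;
}
```

On a 2x1 image with a black and a white pixel and a threshold of 0.1, this flags the right neighbor of the first pixel and the left neighbor of the second, and nothing else.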

It is easy to notice that this test produces artifacts in zones that are not necessarily edges. The threshold can be tweaked, but converting RGB to luma has its own issue: two completely different colors can map to similar luma values.
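A quick worked example of that weakness, again assuming Rec. 709 luma weights: pure red (1, 0, 0) has a luma of 0.2126, the same as a uniform gray at 0.2126, so a luma test sees no edge between them at all. The sketch contrasts this with a per-channel color difference, shown only as one possible alternative; the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct RGB { float r, g, b; };

// Rec. 709 luma weights, assumed here (not necessarily the demo's exact ones).
float luma(const RGB& c) { return 0.2126f * c.r + 0.7152f * c.g + 0.0722f * c.b; }

// Luma-based test, as in the first pass: one scalar comparison per neighbor.
bool lumaEdge(const RGB& a, const RGB& b, float threshold) {
    assert(threshold > 0.0f);
    return std::fabs(luma(a) - luma(b)) >= threshold;
}

// Per-channel alternative for comparison: an edge if any channel jumps.
bool colorEdge(const RGB& a, const RGB& b, float threshold) {
    assert(threshold > 0.0f);
    const float d = std::max({std::fabs(a.r - b.r), std::fabs(a.g - b.g),
                              std::fabs(a.b - b.b)});
    return d >= threshold;
}
```

With a threshold of 0.1, lumaEdge(red, gray, 0.1f) is false while colorEdge(red, gray, 0.1f) is true. The per-channel test catches the edge, at the price of three comparisons instead of one.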

The second step takes the edges texture and, with the help of a precomputed area texture, determines for each edgel (a pixel belonging to an edge) the area above and below the mathematical edge line crossing the pixel. These areas are encoded into another RGBA texture and used as blending weights. Here an especially smart use of hardware bilinear filtering is made: by sampling in between two texels, two values are fetched in one single access.
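The bilinear trick comes down to simple interpolation arithmetic. The sketch below models what the texture unit returns when sampling between two texels of a binary edges texture: halfway between them the result is 1.0 only when both are edges, and at a quarter offset all four combinations give distinct values (0, 0.25, 0.75, 1). How exactly the JMLAA shaders exploit this may differ; the 0.9 cut-off and the function names are assumptions. Real hardware interpolates with limited precision, which is why the code uses a threshold and rounding rather than exact comparisons.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Linear filtering between two adjacent texels a and b, sampled at fractional
// offset t (0 = centre of a, 1 = centre of b).
float bilinear1D(float a, float b, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    return a * (1.0f - t) + b * t;
}

// Halfway between two binary texels the result is 1.0 only if both are edges,
// so an edge search can advance two pixels per fetch while it keeps reading 1.
bool bothAreEdges(float a, float b) {
    return bilinear1D(a, b, 0.5f) > 0.9f;
}

// At a quarter offset the result is 0.75 * a + 0.25 * b, so the four binary
// combinations map to 0, 0.25, 0.75 and 1 and both texels can be recovered.
std::pair<bool, bool> decodeQuarterFetch(float v) {
    const int code = static_cast<int>(std::lround(v * 4.0f));  // 0, 1, 3 or 4
    const bool a = (code == 3 || code == 4);  // a contributes 3 quarters
    const bool b = (code == 1 || code == 4);  // b contributes 1 quarter
    return {a, b};
}
```

For instance, decodeQuarterFetch(bilinear1D(1.0f, 0.0f, 0.25f)) returns {true, false}: one texture access told us the state of two texels.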

In the last step the original aliased image and the blending weights texture are used to do the actual blending and generate the final image.
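The final pass can be modelled, in a simplified way, as a convex combination of each pixel and its 4-neighborhood. In this C++ sketch the weight layout (one weight per neighbor, in left, top, right, bottom order) is a hypothetical simplification; the actual JMLAA textures are laid out differently, so treat this only as an illustration of what "using the weights to blend" means.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct RGB { float r, g, b; };

// Hypothetical layout: w[i] is how much of neighbor i's color
// (0 = left, 1 = top, 2 = right, 3 = bottom) is blended into this pixel.
using BlendWeights = std::array<float, 4>;

// Blend one pixel with its 4-neighborhood using the weights from pass two.
RGB blendPixel(const RGB& center, const std::array<RGB, 4>& neighbors,
               const BlendWeights& w) {
    const float total = w[0] + w[1] + w[2] + w[3];
    assert(total >= 0.0f && total <= 1.0f + 1e-6f);  // weights must not exceed 1
    // The pixel keeps whatever share its neighbors did not take.
    RGB out{center.r * (1.0f - total), center.g * (1.0f - total),
            center.b * (1.0f - total)};
    for (std::size_t i = 0; i < 4; ++i) {
        out.r += w[i] * neighbors[i].r;
        out.g += w[i] * neighbors[i].g;
        out.b += w[i] * neighbors[i].b;
    }
    return out;
}
```

A black pixel with a white right neighbor and weights {0, 0, 0.25, 0} comes out as 25% gray, while a pixel with all-zero weights (not on an edge) is left untouched.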

Here’s the original aliased image (taken from NoLimits)

All of the screenshots here are lossless PNGs, so go ahead and zoom in 😀

Translation into GLSL

You can download the source code for the original demo here.

There are DX9 and DX10 versions. The shaders were obviously written in HLSL, all contained in a single .fx file.

To make it work in OpenSceneGraph, I first had to translate it into three GLSL fragment shaders and one vertex shader. It requires at least GLSL 1.3.

Integration into OpenSceneGraph

OSG itself doesn’t have a programmable post-FX pipeline. Instead, there’s a third-party library named OSGPPU that allows you to set up a graph made up of PPUs (Post Processing Units). Each PPU has an associated shader program, one or more input textures (including the output of the previous step), and an output texture that can be plugged into the next step, and so on.

Building the post-FX pipeline for JMLAA was painless; however, there is one detail I still haven’t been able to figure out: correct stencil buffer usage.

An optimization that may yield a big performance boost is using the stencil buffer as a processing mask. When creating the edges texture in the first step, you also write a 1 at the corresponding location of the previously cleared (all-zero) stencil buffer. Pixels that don’t qualify as part of an edge are discarded (with discard;). In the subsequent steps, the stencil values are used as a mask, so pixels that don’t belong to edges are quickly rejected by the stencil test.

But for some reason, OSGPPU either doesn’t clear the stencil properly or updates it prematurely, so I couldn’t get this working and had to process every pixel in all three steps without discarding anything. Even so, I noticed no performance hit when loading fairly complex models. Here’s the thread where I asked for help.

Results

I wrote a little demo app that disables OSG’s default MSAA, loads a 3D model (it supports a few different formats) and displays it in a viewer. You can view the intermediate (edges and weights) textures, as well as the original and the antialiased final images. By default, it uses a depth-based discontinuity test instead of the luma one.
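A depth-based variant of the first pass can be sketched the same way, swapping luma for the depth value. This is an assumption-laden C++ illustration rather than the demo's shader: it compares raw depth values against a hypothetical threshold, whereas a perspective depth buffer is non-linear, so a real implementation may need to linearize or scale depth first. Depth only changes at geometric boundaries, so texture detail doesn't produce false edges, but edges between surfaces at similar depths go undetected.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Edge flags for one pixel against its 4-neighborhood: left, top, right, bottom.
using EdgeTexel = std::array<float, 4>;

// Depth discontinuity test for pixel (x, y): flag a neighbor when the depth
// difference reaches `threshold`. Coordinates are clamped at the borders.
EdgeTexel depthEdges(const std::vector<float>& depth, int width, int height,
                     int x, int y, float threshold) {
    assert(static_cast<int>(depth.size()) == width * height);
    assert(x >= 0 && x < width && y >= 0 && y < height);
    auto d = [&](int px, int py) {
        px = px < 0 ? 0 : (px >= width ? width - 1 : px);
        py = py < 0 ? 0 : (py >= height ? height - 1 : py);
        return depth[static_cast<std::size_t>(py * width + px)];
    };
    const float c = d(x, y);
    const float n[4] = {d(x - 1, y), d(x, y - 1), d(x + 1, y), d(x, y + 1)};
    EdgeTexel e{};
    for (std::size_t i = 0; i < 4; ++i)
        e[i] = std::fabs(c - n[i]) >= threshold ? 1.0f : 0.0f;
    return e;
}
```

For a 3x1 row of depths {0.2, 0.2, 0.9} and a threshold of 0.1, the middle pixel gets an edge flag only towards its right neighbor.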

This is the original aliased image (zoomed in by 16x):

And this one is the filtered final image produced by JMLAA:

You will find more details on JMLAA in the book GPU Pro 2!

Download

You can download here a VS 2008 project along with the source, a default model, the shaders and the precompiled binaries for OSG/OSGPPU. It should compile and run out of the box.
