Archive for the ‘Portfolio’ Category


Today is a great day. The version of Spelunky for the PlayStation®3 and the PlayStation®Vita we’ve been working on at BlitWorks has been released both in America and Europe.

Spelunky is a 2D platformer with randomized levels where endless combinations of crazy stuff can happen everytime you play, thus redefining the word ‘addictive’.

The game came out last year for the XBox360 achieving great success. Earlier this month a slightly more featured version for Steam and GOG was released, but today it makes its debut in Sony’s desktop and (for the first time) handheld consoles.

What sets this port apart from the others is:

– Cross-buy & Cross-play: Buy the PS3 version and get the Vita one for free (and vice-versa). Play in one console and your progress will be automatically sync’d in the cloud letting you continue in another console where you left.

– Wireless co-op mode: Up to 3 Vitas can be hooked up via Wi-Fi with a PS3 to play in co-op (either adventure or deathmatch). Or use your Vita to play against other vitas on the go via Ad-Hoc.

– Every Vita owns its own camera, so there’s no need for everyone in the game to be constrained to the same frame like in other versions.

– Touch features and accelerometer 3D effects on menus! (Vita only)

– Controller vibration feature in PS3

There’s a free demo awaiting for you to try at PlayStation Store.

The full version is priced at $14.99 in America and 14.99 € in Europe. If you’re a PS Plus subscriber you’ll get a 20% discount.

The port is getting fairly good reviews so far 🙂



A few years ago I was playing rythm guitar in a small band along with three of my friends. One day, our lead guitarist spotted how useful a looper pedal could be to us, but he was unsure if the purchase was worth the price.

For those of you that are not familiar with this kind of gear, a looper pedal is a device that is daisy-chained between an electric guitar and the amp, recorder, or the rest of your gear. It’s able to record what you play and at some point play it back in the background while you play new stuff. It’s extremely useful where more guitars than available are needed. Usually it can be controlled by means of some footswitch, hence the ‘pedal’ denomination.

I remember that I was shocked of seeing looper pedals starting at $130. Soon, my engineering-obsessive mind started to think of ways of building a cheaper device without compromising quality.


I made a wishlist for a decent looper pedal features:

  • It should be able to start recording when I tap a switch.
  • It should be able to stop recording and play the loop back when I want.
  • It should allow me to add up more “layers” of sound on top of the existing loop.
  • I should be able to stop it whenever I want.
  • It must feature a decent sound quality.
  • I must stay under $50

Given this list of requirements, I decided to work on a desing, and build a prototype.


Okay, two things were clear… I was going to need quite a big deal of RAM to store the audio, and a decent CODEC to DAC/ADC it.

Real-time audio streaming can become a relatively bandwidth-demanding task, so the RAM needed to be both fast and big enough.

In order to control de pedal I was going to need at least two buttons, but that was the smaller of my concerns.

Tools of trade

For this prototype I was lucky to have an Altera DE1 training board I bought to play with FPGAs.


The board features an Altera Cyclone 2 FPGA, an 8-MByte SDRAM Chip and a WM8731 high-quality, 24-bit, sigma-delta audio encoder/decoder with minijack inputs/outputs.

The RAM is 16-bit wide, and the CODEC is 24-bit, so to achieve CD-Quality (44,100 Hz) the formula would be like:

44,100 Hz x 3 bytes = 132300 bytes per second, or 129 KBps

Then, is the RAM capable of handling that?

With SDRAMs you can R/W in a mode named Burst where you give the starting address and the number of words to read/write and then the words are read/written sequentially with no more overhead. Let’s do some calculations:

What is the minimum frequency needed to stream 24-bit audio@44,1KHz to/from the SDRAM?

A naive calculation would be:

  • 1 word = 16 bits = 2 bytes
  • We need to stream in 24-bit units, that would be 1.5 words, so let’s round it up to 2 words
  • That makes 176.4 Kbps or 88.2 Kwords per second
  • Let’s double it because the looper might be recording new data at the same time it’s playing the old one.
  • Ideally, a theoretical bandwidth of 176.4KHz would suffice, but in practice there is an overhead due to CAS/RAS strobe, the memory switching a page, etc…

This particular chip can run up to 133 Mhz, which yields far more than needed bandwidth. One problem solved 🙂

At this point it looked like everything boiled down to hooking up the SDRAM, the CODEC and some sort of footswitch.


I came up with the following design:

The board has 2 XTALs, one running at 50MHz and another one running at 27 MHz.

– The SDRAM needs 100MHz, so one of the built-in PLLs in the FPGA was used to double the frequency of the 50MHz XTAL up to 100MHz.

– The audio CODEC streams data at 18.4MHz, so another PLL was used to reduce the 27MHz clock frequency down to 18.4.

  • The rest of the cores run at 50MHz

Let’s examine the rest of the modules I wrote in VHDL:

Reset delay

This module makes sure that every other module has been properly reset and that the PLLs are locked before the whole system starts operating.

Its inputs are just a signal from one of the on-board buttons that acts as a RESET button, the 50MHz clock and the ‘locked’ signals from the PLLs.

When the external reset is just deasserted, or the system is just powered on AND the PLLs are locked, it uses a register to count 3 million cycles of the 50MHz clock and then deasserts the internal reset signal used by the rest of the modules.

CODEC controller

It’s responsible for streaming data to and from the CODEC in I2S format. Its interface with the other modules are a couple of 16-bit busses for input/output streaming data, reset and a 18.4 Mhz clock from the Audio PLL.

I2C controller

The CODEC has a lot of configuration parameters (like gain, volume, input selection, audio format…) that you must configure using I2C@20KHz. This controller operates pretty much on its own and its only task is to send a sequence of predefined commands to configure the CODEC just after RESET. It’s written in Verilog and I grabbed it from the Altera demo cores that came with the board.

MFC (Memory Flow Controller)

This is one of the most complex modules of my looper.

It interfaces with the SDRAM through two FIFOs and is responsible of feeding the looper core with streaming data stored in the SDRAM and to write to the SDRAM data streamed from the looper core. Its interface with the other modules are a couple of 16-bit busses for I/O and a control bus for controlling and signaling these conditions:

  • Input fifo full (tristate)
  • Output fifo empty (tristate)
  • EOL (End Of Loop) (Core → MFC)
  • Busy (MFC → Core)
  • Start writer (MFC → Core)
  • Write enable (MFC → Core)
  • Read enable (MFC → Core)
  • Fifo clock (data in the busses is sync’d to this clock)
  • Reset and clock as usual

The core uses 3 23-bit registers to point to relevant memory addresses inside of the SDRAM:

  • Read address
  • Write address
  • End of loop pointer

The behavior is modeled by three VHDL state machine processes: ‘reader’, ‘writer’ and ‘EOL_marker’.

Reader: When streaming is activated it instructs the SDRAM controller to burst-read data starting at the Read address pointer. When the data is available it’s enqueued in the input FIFO and the Read pointer is incremented (since it’s 23-bit (addressing 8 Mb) it will naturally overflow to 0 when you run out of memory). If the input FIFO becomes full the SDRAM controller stops reading, if it becomes almost empty it starts reading again thus ensuring continuous uninterrupted streaming of audio. The looper core can read data from the input FIFO transparently through the input bus.

Writer: Operates exactly in reverse: It takes data from the output FIFO (enqueued by the looper core) and stores it in the SDRAM via the SDRAM controller starting at the address pointed to by the write pointer. When the FIFO becomes empty, the controller stops storing data. If it becomes almost full then it starts storing data again.

EOL_marker: When the EOL signal is asserted, it first flushes the Output FIFO and then sets the EOL pointer to the address being currently written (i.e: the Write pointer).

It also features a debug sine-wave writer to test the MFC and audio output.

The SDRAM controller is a very cool one I pulled off It’s a port of a Xilinx memory controller for a Spartan FPGA to the Altera Cyclone II (specifically for the memory in the DE1 board!). Its greatest features are:

  • Quite parametrisable
  • Features time slot-based dual port (I use one port for reading and the other one for writing).
    • It runs at 100MHz (130MHz was originally supported, but the port doesn’t work at that freq).– Interfaces with your core mostly like a SRAM (address bus, data bus and simple control bus).

Testing the MFC and CODEC

Once I had the MFC and CODEC up and running, I uploaded to the SDRAM a .wav file and then wrote a simple core to stream it through the MFC out to the CODEC to see if it plays back.

I plugged the board to a cheap guitar amp to hear it.

Keyboard controller

Since I needed fairly big keys to be able to control the core while I was playing guitar I opted by having a PS/2 keyboard on the floor and pressing the bigger and most accessible keys (Spacebar and Ctrl’s) with my foot. The DE1 board has a PS/2 port so I chose to use the PS/2 controller that comes with the board (written in Verilog).

With the keyboard you can assert a few commands:

  • Start recording
  • End of loop (the looper starts playing back what you played, but it doesn’t record what you play now).
  • End of loop, but start recording a new ‘layer’ of sound (This allows to overlay what you play now to what is being replayed by the looper).
  • Pause/resume layer recording
  • Pause/resume regular (no-layer) recording

I used a Logitech wireless keyboard connected to the PS/2 port.

Main core

Here’s where everything is glued together. The main core uses FSMs to stream data between the MFC and the CODEC in both directions simultaneously. It’s fairy simple, since most of the complexity is carried by the MFC. It just responds to commands from the keyboard, controls the MFC and CODEC and connects the busses.

It also flashes some of the on-board LEDs as debug indicators.


Does it work?. Yes!, absolutely. Below you can hear me playing random tunes with it. I hadn’t enough cables/connectors to daisy-chain it with an effects processor, so everything is clean (straight from the guitar pickups).

Test 1: (In this test I first record the rithm and then I play a few arrangements over it)

Test 2: The eye of the tiger! (same as above)

Test 3: “The hell song” by Sum41. Here I demonstrate the ability to pause/unpause the loop playback. (Sounds way better with distortion!)

Test 4: Different tune. Here I first play and record the rithm, then I play on top of it, and at some point I pause the playback.

Test 5: Some song by Sum41. Nothing special

Test 6: Sound Layering test. Here I first play the rithm and I tap the ‘end of loop’ button. Then I layer an arrangement on top of it. And then I play a third voice on top of both layers.

Test 6: Sound Layering test 2. This time I stack up to 4 sound layers.


  • Not bad as a proof of concept.
  • Implementing it with an FPGA can be quite oversized and a bit expensive. There are a few microcontrollers featuring high pin count and internal SDRAM controller that can run the SDRAM as fast as 66MHz.
  • Good sound quality
  • Can record up to 47 seconds of high-quality audio
  • Solved our problem 🙂
  • The day I brought it to our rehearsal place I only asked for one thing: “Please somebody bring a jack to minijack adaptor so we can plug it to an amp”. Everybody forgot, so we had to hear it in “clean” (no distortion or effects) by using headphones. Shit happens 😦

As always, Thanks for reading!!

I have just made a new video of the devil’s mine using the relatively recent feature of YouTube, the 3D player.

The coolest thing about it is that the video is uploaded in side by side (only horitzontal I guess), and the player lets you choose your favorite viewing mode (anaglyph with 3 pairs of colors, interlaced (best for LG 3D Cinema TV’s), or the ubiquitious side-by-side).

Click here to view the video in the YT 3D player

The technical term for the so-called 3D is in fact stereoscopy. (i.e: when two images, one for each eye are produced, transmitted and rendered instead of one). And contrary to the popular belief is pretty simple to implement and it’s not rocket science. In fact the first stereoscopic movies (anaglyph) were born in the early 50s! (do you remember that guy with anaglyph glasses in Back to the Future? XD).

However it can be very tricky to get it right



Full scene antialiasing is being kind of a trending topic these days of inexpensive big flat displays and powerful GPUs.

Traditional AA algorithms used to rely on some sort of supersampling (i.e: rendering the scene to an n times bigger buffer and then mapping and averaging more than one supersampled pixel to a single final pixel).

Multisampling AA is the most widespread technique. 4x-8x MSAA can yield good results but can also be computationally expensive.

Morphological Antialiasing is a fairly recent technique which has grown in popularity in the recent years. In 2009, Alexander Reshethov (Intel) proposed an algorithm to detect shape patterns in aliased edges and then blending the pixels pertaining to an edge with their 4 neigborhood based on the sub-pixel area covered by the mathematical edge line.

Reshethov’s implementation wasn’t practical on GPU since it was tightly coupled to the CPU, but the concept had a lot of potential. His demo takes a .ppm image as an input and then outputs an antialiased .ppm as an output.

However, there’s been a lot of activity on this topic since then and a few GPU-accelerated techniques have been presented.

Jimenez’s MLAA

Among them, Jorge Jimenez and Diego Gutierrez’s team at the University of Zaragoza (Spain) have developed a symmetrical 3-pass post-processing technique named Jimenez’s MLAA.

According to the tests conducted by the authors, It can achieve visual results between MSAA 4x and 8x with an average speedup of 11.8x ! (GeForce9800GTX+). On the counterpart it suffers from classic MLAA problems such as handling of sub-pixel features but you can tweak some parameters to get really good results with virtually non noticeable glitches at a fraction of the time and memory that MSAA takes!

The algorithm, in a nutshell, works as follows:

In the first step a luma-based discontinuity test is performed on the RTT’ed scene for the current pixel and its 4-neighborhood. The result is encoded in an RGBA edges texture.

One can easily notice that it produces artifacts in zones that are not necessarily edges. The threshold can be tweaked, but converting RGB to the luma space has its issues when two completely different colors map to similar luma values.

The second step takes the edges texture and with the help of a precomputed area texture determines for each edgel (pixel belonging to an edge) the area above and below the mathematical edge crossing the pixel. This areas are encoded into another RGBA texture and used as blending weights. Here a specially smart use of the hardware bilinear filtering is made by sampling inbetween two texels to fetch two values in one single access.

In the last step the original aliased image and the blending weights texture are used to do the actual blending and generate the final image.

Here’s the original aliased image (taken from NoLimits)

All of the screenshots here are lossless PNGs, so go ahead and zoom in 😀

Translation into GLSL

You can download the source code for the original demo here.

There’s a DX9 and a DX10 version. The shaders were obviously written in HLSL. Everything contained into a single .fx file.

So in order to make it work in OpenSceneGraph I had to first translate it into 3 GLSL fragment shaders and 1 vertex shader. It needs GLSL 1.3 at least to work.

Integration into OpenSceneGraph

OSG doesn’t have a programmable post-fx pipeline itself. Instead, there’s a third party library named OSGPPU which allows you to set up a graph made up of PPUs (Post Processing Units). Each one of which have an associated shader program, one or more input textures (inherently the one from the previous step), and an output texture which can be plugged to the next step and so on.

The construction of the postFX pipeline for JMLAA was painless, however there is a detail that I haven’t still been able to figure out: correct stencil buffer usage.

An optimization which may yield a big performance boost is the usage of the stencil buffer as a processing mask. When creating the edges texture in the first step you also write an 1 to the previously fully zeroed stencil buffer in its corresponding location. The pixels that don’t satisfy the condition of being part of an edge are (discard;)ed. In the subsequent steps the values of the stencil are used as a mask, so pixels not belonging to edges are quickly discarded in the graphics pipeline.

But for some reason, OSGPPU either doesn’t clear the stencil properly or updates it prematurely, so I couldn’t get this working and had to process every pixel in all three steps without discarding everything. But even though so, I noticed no performance hit when loading fairly complex models. Here’s the thread where I asked for help.


I wrote a little demo app which disables the default OSG’s MSAA, loads up a 3D model (it supports a few different formats) and displays it on a viewer. You can view the intermediate (edges and weights) textures, as well as the original and antialiased final images. By default it uses a depth-based discontinuity test instead of the luma one.

This is the original aliased image (zoomed in by 16x):

And this one is the filtered final image produced by JMLAA:

You will find more details on JMLAA in the book GPU Pro 2 !


You can download here  a VS 2008 project along with the source, a default model, the shaders and the precompiled binaries for OSG/OSGPPU. It should compile and run out-of-the-box.

Yet another Sonic clone

Posted: July 15, 2011 in Games


I’ve been a Sonic series enthusiast since I got my Sega Genesis as a child. To me, it’s the perfect match between platforms and speed, two genres I love. Even though time flies, it never gets old.

First off, I must say that I’m not associated with Sega or Sonic team in any way, and what I’m gonna show you is just a simple fan game I made for fun.

This was one of my first amateur side scrollers that I made like 5 or 6 years ago or so, I had made a couple of very simple scrollers in Flash in the past, but I wanted to work with old school tiles.

MFC stands for the Microsoft Foundation Classes which I used as a rendering API. Since I wanted to use it as a project for a college subject the use of MFC, though not the most efficient, was mandatory.

It features a couple of badniks, rings, goal billboards, moving platforms, springs, a final boss and a level editor.

I found the tiles, backgrounds and sprites on some website so I only needed to focus on programming.

Got ring?

As you may know, in the old times the home consoles and arcade machines were mostly tile-based engines. In a nutshell, everything you saw on the screen was made up of fixed-size *usually* square tiles which were laid out in a particular fashion according to some table in memory.

The video signal generator just checked that table and the tiles in video memory to generate the video output.

Of course there’s a lot of nuts and bolts to it (scrolling playfields, mirroring, transparency, overlapping…) but that’s another story.

This app allows you to set up a scrolling playfield made of tiles and specify its absolute position on screen. It will determine the tiles visible in the viewport, their offset and how they should be displayed. Then the backend MFC renderer does the rendering job.

As for the animated sprites, there’s a base AnimatedSprite class which implements an interface for rendering and for specifying the status of the animation (playing, stopped,…) as well as its speed and extents.

The physics are simple enough to make the game look almost like the original (of course, it’s substantially less feature-complete). Collisions with the scenery are handled in a per-tile basis where each tile has its own collision properties.

The game logic and automatas glue everything else up.

Regarding sound, the Windows MM API is used and there’s a folder with .wav sound cues and BGM on it. But I have recently discovered that it freezes on Windows 7 when it tries to play more than one sound simultaneously. But there’s a workaround.

The final boss is a funny inter-company match where Super Mario throws items from his game 😀

The whole game was created from scratch in about three weeks working about 4 hours per day in the evenings.

Unfortunately and due to a hard disk failure, I lost the source code, but I still have the binaries.

And I keep it as a bit of history.

Dead And Angry

Posted: July 14, 2011 in Computer Graphics, Games

That is the spooky title we gave to a demo game we’ve made at the URJC.

This game has been made by a team of 4 people (which includes me) with no prior knowledge of Unreal Development Kit.

It features custom models, AI, sceneries, UI, cinematics, gameplay and a mutiplayer mode.

In my opinion, UDK can be an extremely sweet tool for artists and designers but it can also be a pain to programmers (let’s recall that I’m talking about the free UDK version) due to the lack of consistent and well-organized documentation and examples. But in the other hand, the community is very active and responsive.

All I have to say is that we’ve learned a lot about the whole process of making a game, especially about how important the preproduction is. And how many features you have to prematurely trash due to deadlines.


This was a project for my Masters in Computer Graphics, Games and Virtual Reality at URJC.

We were asked to develop some sort of mine train real-time animation from scratch. It had to feature dinamically-generated tunnel bumps with Perlin noise, on-board camera view and three rendering modes: Polygon fill, Wireframe and Points. We chose OpenSceneGraph for the job.

Design tools

As a big fan of rollercoasters I had spent hours on the NoLimits Rollercoaster Simulator which has a quite mature coaster editor. There’s plenty of coasters made with NoLimits around the net, most of them are reconstructions of real ones.

I thought it could be a good idea to be able to load coaster models in NoLimits (.nltrack) format as it would allow us to design the track and the scene in a visual way using the NL Editor.

The .nltrack format is binary and not documented. It contains the shape of the track as control points of cubic Bezier curves. It also contains info about the colors, supports, external .3DS objects and info about the general appearance of the rollercoaster.

Using Hexplorer and the NL editor itself I was able to figure out the control points and the location/scaling/rotation of the external 3D models. Later I discovered that there’s a library called libnltrack, which helped a lot.

My pal Danny modeled a couple of rooms, an outdoor scene and a lot of mine props (barrels, shovels, …). Then he imported them into the editor and laid out a coaster track passing trough all of the scene.

Coaster geometry

Correct generation of the rails and the crossbeams for the track was a bit of a challenge, and it needed to be efficient!.

I came up with a solution based on the concept of a “slider”, a virtual train which can be placed at any place around the track (just specifying how many kilometers away from the origin (the station) it would be), and it returns three orthonormal vectors forming a base which was then used to transform vertices to the train’s POV.

By using two sliders, one ahead of the other one can set vertices back and forth to form triangle strips in order to generate perfectly stitched cylinders. I ran into a couple of problems when the track was almost vertical but I finally managed to solve them.

Upon startup, the geometry for the whole coaster is generated. The engine generates about 15 meters of track per geode, this way OpenSceneGraph is able to cull the out-of-sight track segments efficiently. Besides, two levels of detail are generated based on the distance to the camera.

As for the crossbeams, it’s just a .3ds model which is repeatedly placed along the track.


The program generates a 256×256 grayscale perlin noise texture which is then used as a displacement mapping for a cylinder mesh generated around the track on load time.

The editor is able to mark segments as ‘tunnel’ easily turning tunnels on or of in a per-segment basis.

The meshes are also segmented for better culling and stitched together. They have a diffuse rock and floor texture applied.


The train is a .3DS model by Danny which has a slider assigned to it and its animated following an extremely simple phyisics scheme based on the potential energy of the train. It has a spotlight on the front so the track, rooms and tunnels are illuminated as the train goes trhough. Moreover the illumination of the train mesh is switched from the sunlight to the spotlight based on wether it’s in a tunnel or not.


A skydome, lens flare (props to Tomás), and OSG’s impementation of shadow mapping were added in.

Audio and others

In the last minute before the deadline, supports for the track were generated as regularly-placed cylinders, but unfortunately that wasn’t there yet at the time the screenshots and the videos were taken.

A white noise audio file is played with a pitch and volume proportional to the train speed.

To be done

Due to the tight timing constraints we were subject to I was forced to leave a lot of things to be done, among them:

– Per-pixel lighting.

– Post-processing effects (vignette and HDR)