Today is a great day. The version of Spelunky for the PlayStation®3 and the PlayStation®Vita we’ve been working on at BlitWorks has been released both in America and Europe.

Spelunky is a 2D platformer with randomized levels where endless combinations of crazy stuff can happen every time you play, thus redefining the word ‘addictive’.

The game came out last year for the Xbox 360 to great success. Earlier this month a slightly more feature-rich version was released for Steam and GOG, but today it makes its debut on Sony’s home and (for the first time) handheld consoles.

What sets this port apart from the others is:

– Cross-buy & Cross-play: Buy the PS3 version and get the Vita one for free (and vice versa). Play on one console and your progress is automatically synced to the cloud, letting you continue on the other console where you left off.

– Wireless co-op mode: Up to 3 Vitas can be hooked up to a PS3 via Wi-Fi to play in co-op (either adventure or deathmatch). Or use your Vita to play against other Vitas on the go via Ad-Hoc.

– Every Vita has its own camera, so players don’t have to be constrained to the same frame as in other versions.

– Touch features and accelerometer 3D effects on menus! (Vita only)

– Controller vibration feature in PS3

There’s a free demo waiting for you to try on the PlayStation Store.

The full version is priced at $14.99 in America and 14.99 € in Europe. If you’re a PS Plus subscriber you’ll get a 20% discount.

The port is getting fairly good reviews so far 🙂




Time to move on! Earlier this year I switched to a new job at BlitWorks. BlitWorks is a game porting specialist. We work with systems as classic as the SEGA Dreamcast and SEGA Genesis/Mega Drive, or as new as the PlayStation®4 system.

I work as part of a highly talented and experienced team on projects as high-profile as the insanely wild indie platformer Spelunky or as well-known and loved as SEGA’s Jet Set Radio or Sonic CD!

Challenging, exciting, hard yet rewarding work. I’m frankly delighted to be working with such an array of classics of video game history and to be part of such an amazing team!

BlitWorks is a spin-off of Blit Software.

Now enjoy a few videos of these all-time classics in their BlitWorked versions 🙂 :

(Note: The first video is the promo trailer of Spelunky, not actual in-game footage of the version we worked on)

A few years ago I was playing rhythm guitar in a small band along with three of my friends. One day, our lead guitarist realized how useful a looper pedal could be to us, but he was unsure if the purchase was worth the price.

For those of you who are not familiar with this kind of gear, a looper pedal is a device that is daisy-chained between an electric guitar and the amp, recorder, or the rest of your gear. It’s able to record what you play and, at some point, play it back in the background while you play new stuff. It’s extremely useful when a song needs more guitar parts than there are guitarists available. It is usually controlled by means of a footswitch, hence the name ‘pedal’.

I remember being shocked to see looper pedals starting at $130. Soon, my engineering-obsessed mind started to think of ways to build a cheaper device without compromising quality.


I made a wishlist of features for a decent looper pedal:

  • It should be able to start recording when I tap a switch.
  • It should be able to stop recording and play the loop back when I want.
  • It should allow me to add more “layers” of sound on top of the existing loop.
  • I should be able to stop it whenever I want.
  • It must feature a decent sound quality.
  • It must cost under $50.

Given this list of requirements, I decided to work on a design and build a prototype.


Okay, two things were clear… I was going to need a good deal of RAM to store the audio, and a decent CODEC to handle the A/D and D/A conversion.

Real-time audio streaming can become a relatively bandwidth-demanding task, so the RAM needed to be both fast and big enough.

In order to control the pedal I was going to need at least two buttons, but that was the least of my concerns.

Tools of the trade

For this prototype I was lucky to have an Altera DE1 training board I bought to play with FPGAs.


The board features an Altera Cyclone II FPGA, an 8-MByte SDRAM chip and a WM8731 high-quality, 24-bit, sigma-delta audio encoder/decoder with minijack inputs/outputs.

The RAM is 16 bits wide and the CODEC is 24-bit, so at a CD-quality sample rate (44,100 Hz) the raw data rate would be:

44,100 Hz x 3 bytes = 132,300 bytes per second, or about 129 KB/s

Then, is the RAM capable of handling that?

With SDRAMs you can read and write in a mode named Burst, where you give the starting address and the number of words to read/write, and then the words are read/written sequentially with no further overhead. Let’s do some calculations:

What is the minimum frequency needed to stream 24-bit audio at 44.1 kHz to/from the SDRAM?

A naive calculation would be:

  • 1 word = 16 bits = 2 bytes
  • We need to stream in 24-bit units. That would be 1.5 words, so let’s round it up to 2 words
  • That makes 88.2 Kwords per second, or 176.4 KB/s
  • Let’s double it, because the looper might be recording new data at the same time it’s playing back the old data.
  • Ideally, a clock of 176.4 kHz (one word per cycle) would suffice, but in practice there is overhead due to the CAS/RAS strobes, the memory switching pages, etc.

This particular chip can run at up to 133 MHz, which yields far more bandwidth than needed. One problem solved 🙂
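The back-of-the-envelope numbers above can be double-checked with a few lines of Python. This is only a sketch: the names are made up for illustration, it assumes a single (mono) stream with each 24-bit sample padded to two 16-bit words, and it assumes an ideal burst that moves one word per clock cycle with no CAS/RAS overhead.

```python
SAMPLE_RATE_HZ = 44_100       # CD-quality sample rate
WORD_BYTES = 2                # the SDRAM is 16 bits wide
WORDS_PER_SAMPLE = 2          # 24 bits = 1.5 words, rounded up to 2
SDRAM_MAX_CLOCK_HZ = 133_000_000


def words_per_second(simultaneous_streams=1):
    """Words the SDRAM must move per second for the given number of streams."""
    return SAMPLE_RATE_HZ * WORDS_PER_SAMPLE * simultaneous_streams


playback_only = words_per_second(1)        # 88,200 words/s
record_and_play = words_per_second(2)      # 176,400 words/s
bytes_per_second = playback_only * WORD_BYTES

# Idealised headroom: one word per clock cycle, no refresh or strobe overhead.
headroom = SDRAM_MAX_CLOCK_HZ / record_and_play

print(f"playback only:   {playback_only} words/s ({bytes_per_second} bytes/s)")
print(f"record and play: {record_and_play} words/s")
print(f"ideal headroom at 133 MHz: about {headroom:.0f}x")
```

Even if the real overhead ate most of that margin, the memory would still be hundreds of times faster than the audio stream requires.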

At this point it looked like everything boiled down to hooking up the SDRAM, the CODEC and some sort of footswitch.


I came up with the following design:

The board has 2 XTALs, one running at 50MHz and another one running at 27 MHz.

– The SDRAM needs 100MHz, so one of the built-in PLLs in the FPGA was used to double the frequency of the 50MHz XTAL up to 100MHz.

– The audio CODEC streams data at 18.4MHz, so another PLL was used to reduce the 27MHz clock frequency down to 18.4.

– The rest of the cores run at 50MHz

Let’s examine the rest of the modules I wrote in VHDL:

Reset delay

This module makes sure that every other module has been properly reset and that the PLLs are locked before the whole system starts operating.

Its inputs are just a signal from one of the on-board buttons that acts as a RESET button, the 50MHz clock and the ‘locked’ signals from the PLLs.

Once the external reset has been deasserted (or the system has just been powered on) AND the PLLs are locked, it uses a register to count 3 million cycles of the 50MHz clock (about 60 ms) and then deasserts the internal reset signal used by the rest of the modules.
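The counting behavior can be modeled in software as a rough sketch. This is a Python model rather than the VHDL itself: the class and signal names are hypothetical, the signals are assumed to be active-high, and restarting the count whenever a PLL loses lock is an assumption.

```python
class ResetDelay:
    """Software model of a reset-delay block clocked at 50 MHz."""

    def __init__(self, delay_cycles=3_000_000):
        self.delay_cycles = delay_cycles
        self.count = 0
        self.internal_reset = True   # asserted at power-on

    def tick(self, external_reset, plls_locked):
        """Advance one clock cycle and return the internal reset signal."""
        if external_reset or not plls_locked:
            # Hold everything in reset and restart the count.
            self.count = 0
            self.internal_reset = True
        elif self.count < self.delay_cycles:
            self.count += 1
            if self.count == self.delay_cycles:
                self.internal_reset = False
        return self.internal_reset


# 3 million cycles of a 50 MHz clock last 0.06 s.
RESET_DELAY_SECONDS = 3_000_000 / 50_000_000
```

With a short delay such as `ResetDelay(delay_cycles=5)`, the internal reset stays asserted for four ticks and is released on the fifth.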

CODEC controller

It’s responsible for streaming data to and from the CODEC in I2S format. Its interface with the other modules consists of a couple of 16-bit busses for input/output streaming data, reset and an 18.4 MHz clock from the audio PLL.

I2C controller

The CODEC has a lot of configuration parameters (like gain, volume, input selection, audio format…) that you must configure using I2C at 20 kHz. This controller operates pretty much on its own and its only task is to send a sequence of predefined commands to configure the CODEC just after RESET. It’s written in Verilog and I grabbed it from the Altera demo cores that came with the board.

MFC (Memory Flow Controller)

This is one of the most complex modules of my looper.

It interfaces with the SDRAM through two FIFOs. It is responsible for feeding the looper core with streaming data stored in the SDRAM and for writing the data streamed from the looper core to the SDRAM. Its interface with the other modules consists of a couple of 16-bit busses for I/O and a control bus for controlling and signaling these conditions:

  • Input fifo full (tristate)
  • Output fifo empty (tristate)
  • EOL (End Of Loop) (Core → MFC)
  • Busy (MFC → Core)
  • Start writer (MFC → Core)
  • Write enable (MFC → Core)
  • Read enable (MFC → Core)
  • Fifo clock (data in the busses is sync’d to this clock)
  • Reset and clock as usual

The MFC uses three 23-bit registers to point to relevant memory addresses inside the SDRAM:

  • Read address
  • Write address
  • End of loop pointer

The behavior is modeled by three VHDL state machine processes: ‘reader’, ‘writer’ and ‘EOL_marker’.

Reader: When streaming is activated it instructs the SDRAM controller to burst-read data starting at the Read address pointer. When the data is available it’s enqueued in the input FIFO and the Read pointer is incremented (since it’s 23 bits wide, addressing 8 MB, it will naturally overflow to 0 when you run out of memory). If the input FIFO becomes full, the SDRAM controller stops reading; if it becomes almost empty, it starts reading again, thus ensuring continuous, uninterrupted streaming of audio. The looper core can read data from the input FIFO transparently through the input bus.

Writer: Operates exactly in reverse: It takes data from the output FIFO (enqueued by the looper core) and stores it in the SDRAM via the SDRAM controller starting at the address pointed to by the write pointer. When the FIFO becomes empty, the controller stops storing data. If it becomes almost full then it starts storing data again.

EOL_marker: When the EOL signal is asserted, it first flushes the output FIFO and then sets the EOL pointer to the address currently being written (i.e., the Write pointer).

It also features a debug sine-wave writer to test the MFC and audio output.
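The reader’s two key behaviors described above, the wrapping 23-bit pointer and the full/almost-empty hysteresis on the input FIFO, can be sketched in Python. This is a simplified software model, not the VHDL: the class name, the FIFO depth, the almost-empty threshold and the dict standing in for the SDRAM are all assumptions, and it fetches one word per step instead of a burst.

```python
from collections import deque

ADDR_BITS = 23
ADDR_MASK = (1 << ADDR_BITS) - 1    # a 23-bit pointer wraps to 0 on overflow


class Reader:
    """Toy model of the MFC 'reader' process."""

    def __init__(self, memory, fifo_depth=8, almost_empty=2):
        self.memory = memory            # dict: address -> word (SDRAM stand-in)
        self.fifo = deque()             # input FIFO read by the looper core
        self.fifo_depth = fifo_depth
        self.almost_empty = almost_empty
        self.read_addr = 0
        self.reading = True

    def step(self):
        """One SDRAM-side step: update the hysteresis, then fetch a word."""
        if len(self.fifo) >= self.fifo_depth:
            self.reading = False        # FIFO full: stop reading
        elif len(self.fifo) <= self.almost_empty:
            self.reading = True         # FIFO almost empty: resume reading
        if self.reading:
            self.fifo.append(self.memory.get(self.read_addr, 0))
            self.read_addr = (self.read_addr + 1) & ADDR_MASK

    def pop(self):
        """Looper-core side: take the next word, or None if the FIFO is empty."""
        return self.fifo.popleft() if self.fifo else None
```

The gap between the two thresholds keeps the reader from toggling on every word. The writer would mirror this logic with an output FIFO that is drained instead of filled.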

The SDRAM controller is a very cool third-party one. It’s a port of a Xilinx memory controller for a Spartan FPGA to the Altera Cyclone II (specifically for the memory in the DE1 board!). Its greatest features are:

  • Quite parametrisable
  • Features time slot-based dual port (I use one port for reading and the other one for writing).
  • It runs at 100MHz (130MHz was originally supported, but the port doesn’t work at that frequency).
  • Interfaces with your core mostly like an SRAM (address bus, data bus and simple control bus).

Testing the MFC and CODEC

Once I had the MFC and CODEC up and running, I uploaded a .wav file to the SDRAM and then wrote a simple core to stream it through the MFC out to the CODEC to see if it played back.

I plugged the board into a cheap guitar amp to hear it.

Keyboard controller

Since I needed fairly big keys to be able to control the core while I was playing guitar, I opted for a PS/2 keyboard on the floor, pressing the biggest and most accessible keys (Spacebar and the Ctrl keys) with my foot. The DE1 board has a PS/2 port, so I chose to use the PS/2 controller that comes with the board (written in Verilog).

With the keyboard you can assert a few commands:

  • Start recording
  • End of loop (the looper starts playing back what you played, but it doesn’t record what you play now).
  • End of loop, but start recording a new ‘layer’ of sound (this lets you overlay what you play now on top of what the looper is replaying).
  • Pause/resume layer recording
  • Pause/resume regular (no-layer) recording

I used a Logitech wireless keyboard connected to the PS/2 port.

Main core

Here’s where everything is glued together. The main core uses FSMs to stream data between the MFC and the CODEC in both directions simultaneously. It’s fairly simple, since most of the complexity is carried by the MFC. It just responds to commands from the keyboard, controls the MFC and CODEC and connects the busses.

It also flashes some of the on-board LEDs as debug indicators.


Does it work? Yes, absolutely! Below you can hear me playing random tunes with it. I didn’t have enough cables/connectors to daisy-chain it with an effects processor, so everything is clean (straight from the guitar pickups).

Test 1: (In this test I first record the rhythm and then I play a few arrangements over it)

Test 2: The eye of the tiger! (same as above)

Test 3: “The Hell Song” by Sum 41. Here I demonstrate the ability to pause/unpause the loop playback. (Sounds way better with distortion!)

Test 4: Different tune. Here I first play and record the rhythm, then I play on top of it, and at some point I pause the playback.

Test 5: Some song by Sum 41. Nothing special.

Test 6: Sound layering test. Here I first play the rhythm and I tap the ‘end of loop’ button. Then I layer an arrangement on top of it. And then I play a third voice on top of both layers.

Test 7: Sound layering test 2. This time I stack up to 4 sound layers.


  • Not bad as a proof of concept.
  • Implementing it with an FPGA is overkill and a bit expensive. There are a few microcontrollers with a high pin count and an internal SDRAM controller that can run the SDRAM as fast as 66MHz.
  • Good sound quality
  • Can record up to 47 seconds of high-quality audio
  • Solved our problem 🙂
  • The day I brought it to our rehearsal place I only asked for one thing: “Please, somebody bring a jack to minijack adaptor so we can plug it into an amp”. Everybody forgot, so we had to listen to it “clean” (no distortion or effects) through headphones. Shit happens 😦
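The 47-second figure in the list above is consistent with the size of the SDRAM, as this small sketch shows. It assumes a single mono stream in which each 24-bit sample is padded to two 16-bit words (4 bytes), and that the whole 8-MByte chip is available for audio. The names are illustrative only.

```python
SDRAM_BYTES = 8 * 1024 * 1024     # 8-MByte SDRAM chip on the DE1
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 4              # assumption: 24 bits padded to two 16-bit words


def max_loop_seconds(memory_bytes=SDRAM_BYTES):
    """Longest loop that fits in memory at the given sample format."""
    return memory_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


print(f"Maximum loop length: {max_loop_seconds():.2f} s")
```

Packing the samples into 3 bytes instead of 4 would stretch that to roughly 63 seconds, at the cost of more complicated addressing.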

As always, thanks for reading!

I have just made a new video of the devil’s mine using the relatively recent feature of YouTube, the 3D player.

The coolest thing about it is that the video is uploaded in side-by-side format (only horizontal, I guess), and the player lets you choose your favorite viewing mode: anaglyph with 3 pairs of colors, interlaced (best for LG 3D Cinema TVs), or the ubiquitous side-by-side.

Click here to view the video in the YT 3D player

The technical term for the so-called 3D is in fact stereoscopy (i.e., two images, one for each eye, are produced, transmitted and rendered instead of one). Contrary to popular belief, it is pretty simple to implement; it’s not rocket science. In fact, the first stereoscopic movies (anaglyph) were born in the early 50s! (Do you remember that guy with anaglyph glasses in Back to the Future? XD)

However, it can be very tricky to get it right.


New job!

Posted: August 23, 2011 in News

I’ve recently been recruited by Gameloft’s Madrid studio.

Gameloft is one of the major game producers for mobile phones and handheld devices such as the Nintendo DS. With 20 development studios around the world, they have sold more than 20 million games through Apple’s App Store alone.

One thing I especially like about Gameloft is their licenses (Assassin’s Creed, Spider-Man, NFL, Sonic Unleashed…).

The Madrid studio is located in a privileged spot in the center of the city and is 1 year old. It looks like they’ve been hiring people for the recently opened studio ever since, and they keep growing!

The Barcelona studio is older and has produced games based on remarkably important licenses like the handheld version of the game based on James Cameron’s Avatar.

Needless to say, I’m pretty excited about this and looking forward to making games!

There was one thing that prevented me from accepting any job offers: I still have to complete my Master’s final project. But I have a plan :-D, so stay tuned!



Full-scene antialiasing is something of a trending topic in these days of inexpensive big flat displays and powerful GPUs.

Traditional AA algorithms used to rely on some sort of supersampling (i.e., rendering the scene to a buffer n times bigger and then averaging several supersampled pixels into each final pixel).
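The averaging step can be sketched in a few lines of Python. This is a minimal illustration, not how a GPU does it: it assumes a grayscale image stored as a list of rows, a buffer that is n times bigger along each axis, and a plain box filter. The function name is made up.

```python
def downsample(image, n):
    """Average each n x n block of a supersampled image into one final pixel.

    `image` is a list of rows of grayscale values whose width and height
    are assumed to be multiples of n.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(0, height, n):
        row = []
        for x in range(0, width, n):
            block = [image[y + dy][x + dx] for dy in range(n) for dx in range(n)]
            row.append(sum(block) / (n * n))
        result.append(row)
    return result


# A hard black/white diagonal edge rendered at 2x becomes a mid-gray pixel.
supersampled = [[0, 255],
                [255, 0]]
print(downsample(supersampled, 2))   # [[127.5]]
```

The cost is clear from the loop: every final pixel requires n x n rendered samples, which is what makes the approach expensive.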

Multisampling AA is the most widespread technique. 4x-8x MSAA can yield good results but can also be computationally expensive.

Morphological Antialiasing is a fairly recent technique which has grown in popularity in recent years. In 2009, Alexander Reshetov (Intel) proposed an algorithm that detects shape patterns in aliased edges and then blends the pixels belonging to an edge with their 4-neighborhood, based on the sub-pixel area covered by the mathematical edge line.

Reshetov’s implementation wasn’t practical on the GPU since it was tightly coupled to the CPU, but the concept had a lot of potential. His demo takes a .ppm image as input and outputs an antialiased .ppm.

However, there’s been a lot of activity on this topic since then and a few GPU-accelerated techniques have been presented.

Jimenez’s MLAA

Among them, Jorge Jimenez and Diego Gutierrez’s team at the University of Zaragoza (Spain) have developed a symmetrical 3-pass post-processing technique named Jimenez’s MLAA.

According to the tests conducted by the authors, it can achieve visual results between MSAA 4x and 8x with an average speedup of 11.8x (GeForce 9800 GTX+)! On the downside, it suffers from classic MLAA problems such as the handling of sub-pixel features, but you can tweak some parameters to get really good results with virtually unnoticeable glitches at a fraction of the time and memory that MSAA takes!

The algorithm, in a nutshell, works as follows:

In the first step a luma-based discontinuity test is performed on the scene (previously rendered to a texture) for the current pixel and its 4-neighborhood. The result is encoded in an RGBA edges texture.

One can easily notice that it produces artifacts in zones that are not necessarily edges. The threshold can be tweaked, but converting RGB to the luma space has its issues when two completely different colors map to similar luma values.
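To make the first step and its luma pitfall concrete, here is a CPU-side sketch in Python. It is not the shader from the demo: the Rec. 709 luma weights, the 0.1 threshold, the clamping at the image borders and the order of the four flags are all assumptions chosen for illustration.

```python
LUMA_WEIGHTS = (0.2126, 0.7152, 0.0722)   # assumed Rec. 709 weights


def luma(rgb):
    """Perceived brightness of an (r, g, b) color with components in [0, 1]."""
    return sum(w * c for w, c in zip(LUMA_WEIGHTS, rgb))


def detect_edges(image, threshold=0.1):
    """Return, per pixel, four flags (left, top, right, bottom).

    A flag is True when the luma difference with that neighbor exceeds the
    threshold. Neighbors outside the image are clamped to the border pixel.
    """
    height, width = len(image), len(image[0])
    lum = [[luma(p) for p in row] for row in image]
    offsets = ((-1, 0), (0, -1), (1, 0), (0, 1))   # (dx, dy) per flag
    edges = []
    for y in range(height):
        row = []
        for x in range(width):
            flags = []
            for dx, dy in offsets:
                nx = min(max(x + dx, 0), width - 1)
                ny = min(max(y + dy, 0), height - 1)
                flags.append(abs(lum[y][x] - lum[ny][nx]) > threshold)
            row.append(tuple(flags))
        edges.append(row)
    return edges


black_white = [[(0, 0, 0), (1, 1, 1)]]
red_green = [[(1, 0, 0), (0, 0.3, 0)]]    # very different hues, similar luma

print(detect_edges(black_white)[0][0])    # edge found on the right side
print(detect_edges(red_green)[0][0])      # no edge found: the luma test misses it
```

The second case is the issue mentioned above: pure red and a dark green are clearly different colors, yet their luma values differ by only about 0.002, so no edge is reported.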

The second step takes the edges texture and, with the help of a precomputed area texture, determines for each edgel (pixel belonging to an edge) the area above and below the mathematical edge crossing the pixel. These areas are encoded into another RGBA texture and used as blending weights. Here an especially smart use of the hardware bilinear filtering is made: by sampling in between two texels, two values are fetched in a single access.

In the last step the original aliased image and the blending weights texture are used to do the actual blending and generate the final image.

Here’s the original aliased image (taken from NoLimits)

All of the screenshots here are lossless PNGs, so go ahead and zoom in 😀

Translation into GLSL

You can download the source code for the original demo here.

There’s a DX9 and a DX10 version. The shaders were obviously written in HLSL, and everything is contained in a single .fx file.

So in order to make it work in OpenSceneGraph I first had to translate it into 3 GLSL fragment shaders and 1 vertex shader. It needs at least GLSL 1.3 to work.

Integration into OpenSceneGraph

OSG doesn’t have a programmable post-fx pipeline itself. Instead, there’s a third-party library named OSGPPU which allows you to set up a graph made up of PPUs (Post Processing Units). Each of them has an associated shader program, one or more input textures (inherently, the one from the previous step), and an output texture which can be plugged into the next step, and so on.

The construction of the postFX pipeline for JMLAA was painless. However, there is a detail that I still haven’t been able to figure out: correct stencil buffer usage.

An optimization which may yield a big performance boost is using the stencil buffer as a processing mask. When creating the edges texture in the first step, you also write a 1 to the corresponding location of the previously zeroed stencil buffer. The pixels that don’t satisfy the condition of being part of an edge are discarded (discard;). In the subsequent steps the stencil values are used as a mask, so pixels not belonging to edges are quickly rejected by the graphics pipeline.

But for some reason, OSGPPU either doesn’t clear the stencil properly or updates it prematurely, so I couldn’t get this working and had to process every pixel in all three steps without discarding anything. Even so, I noticed no performance hit when loading fairly complex models. Here’s the thread where I asked for help.


I wrote a little demo app which disables OSG’s default MSAA, loads up a 3D model (it supports a few different formats) and displays it in a viewer. You can view the intermediate (edges and weights) textures, as well as the original and the final antialiased images. By default it uses a depth-based discontinuity test instead of the luma one.

This is the original aliased image (zoomed in by 16x):

And this one is the filtered final image produced by JMLAA:

You will find more details on JMLAA in the book GPU Pro 2!


You can download here a VS 2008 project along with the source, a default model, the shaders and the precompiled binaries for OSG/OSGPPU. It should compile and run out of the box.

Yet another Sonic clone

Posted: July 15, 2011 in Games


I’ve been a Sonic series enthusiast since I got my Sega Genesis as a child. To me, it’s the perfect match between platforming and speed, two genres I love. Even though time flies, it never gets old.

First off, I must say that I’m not associated with Sega or Sonic Team in any way, and what I’m going to show you is just a simple fan game I made for fun.

This was one of my first amateur side scrollers, which I made 5 or 6 years ago. I had made a couple of very simple scrollers in Flash in the past, but I wanted to work with old-school tiles.

MFC stands for Microsoft Foundation Classes, which I used as a rendering API. Since I wanted to use the game as a project for a college subject, the use of MFC, though not the most efficient choice, was mandatory.

It features a couple of badniks, rings, goal billboards, moving platforms, springs, a final boss and a level editor.

I found the tiles, backgrounds and sprites on some website so I only needed to focus on programming.

Got ring?

As you may know, in the old times the home consoles and arcade machines were mostly tile-based engines. In a nutshell, everything you saw on the screen was made up of fixed-size *usually* square tiles which were laid out in a particular fashion according to some table in memory.

The video signal generator just checked that table and the tiles in video memory to generate the video output.

Of course there are a lot of nuts and bolts to it (scrolling playfields, mirroring, transparency, overlapping…) but that’s another story.

This app allows you to set up a scrolling playfield made of tiles and specify its absolute position on screen. It will determine the tiles visible in the viewport, their offset and how they should be displayed. Then the backend MFC renderer does the rendering job.
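The visibility calculation described above might look roughly like this. It is a Python sketch, not the original MFC code, which is not shown in this post. The function name, the 16-pixel tile size and the non-negative scroll position are assumptions.

```python
def visible_tiles(scroll_x, scroll_y, view_w, view_h, tile_size=16):
    """Work out which tiles of a scrolling playfield fall inside the viewport.

    Returns (first_col, first_row, cols, rows, offset_x, offset_y), where the
    offsets are the on-screen position of the first visible tile (zero or
    negative, because that tile may be partly scrolled off the screen).
    Assumes scroll_x and scroll_y are non-negative pixel positions.
    """
    first_col = scroll_x // tile_size
    first_row = scroll_y // tile_size
    last_col = (scroll_x + view_w - 1) // tile_size
    last_row = (scroll_y + view_h - 1) // tile_size
    offset_x = -(scroll_x % tile_size)
    offset_y = -(scroll_y % tile_size)
    return (first_col, first_row,
            last_col - first_col + 1, last_row - first_row + 1,
            offset_x, offset_y)


# A 320x224 viewport aligned to the tile grid shows exactly 20x14 tiles.
print(visible_tiles(0, 0, 320, 224))     # (0, 0, 20, 14, 0, 0)
# Scrolled by (24, 8): one extra column and row become partially visible.
print(visible_tiles(24, 8, 320, 224))    # (1, 0, 21, 15, -8, -8)
```

A renderer would then loop over `cols` x `rows` tiles, drawing tile (first_col + i, first_row + j) at screen position (offset_x + i * tile_size, offset_y + j * tile_size).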

As for the animated sprites, there’s a base AnimatedSprite class which implements an interface for rendering and for specifying the status of the animation (playing, stopped,…) as well as its speed and extents.

The physics are simple, yet good enough to make the game look almost like the original (of course, it’s substantially less feature-complete). Collisions with the scenery are handled on a per-tile basis, where each tile has its own collision properties.
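A per-tile collision lookup can be as simple as the following Python sketch. The tile ids, the property names and the 16-pixel tile size are hypothetical, since the original source code is not shown here.

```python
TILE_SIZE = 16   # assumed tile size in pixels

# Hypothetical collision table: tile id -> collision property.
COLLISION = {0: "empty", 1: "solid", 2: "spikes"}


def collision_at(tilemap, x, y):
    """Return the collision property of the tile under pixel (x, y).

    `tilemap` is a list of rows of tile ids. Positions outside the map
    are treated as empty.
    """
    col, row = x // TILE_SIZE, y // TILE_SIZE
    if row < 0 or col < 0 or row >= len(tilemap) or col >= len(tilemap[0]):
        return "empty"
    return COLLISION[tilemap[row][col]]


level = [[0, 0],
         [1, 2]]
print(collision_at(level, 5, 20))    # solid
print(collision_at(level, 20, 20))   # spikes
```

The physics code only needs to query the few tiles around a sprite’s bounding box each frame, which keeps collision checks cheap regardless of the level size.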

The game logic and automata glue everything else together.

Regarding sound, the Windows MM API is used and there’s a folder with .wav sound cues and BGM in it. I have recently discovered that it freezes on Windows 7 when it tries to play more than one sound simultaneously, but there’s a workaround.

The final boss is a funny inter-company match where Super Mario throws items from his game 😀

The whole game was created from scratch in about three weeks working about 4 hours per day in the evenings.

Unfortunately, due to a hard disk failure, I lost the source code, but I still have the binaries.

And I keep it as a bit of history.