Putting things in perspective

February 18, 2012

People tend to perceive things in a logarithmic scale. This includes our perception of light brightness and sound frequency (was it also volume?). Maybe we also perceive time this way. That would make sense because recent events usually affect you more than older ones. Take a look at the following picture:

 Picture 1: logarithmic scale

The events spread out so nicely that you can actually relate between too recent and too old. Comparing with the linear timeline below I admit the first picture better reflects my view of these events:

Picture 2: linear scale

Since openoffice does not seem to do timelines, I hacked together a little perl script to make the images. You can get it here.


How to study objective C under windows

October 6, 2010
Screenshot of Cygwin

Image via Wikipedia

Objective C seems to gain popularity nowadays. My only exposure to this language is through random snippets I saw here and there, mostly written by friends of mine, chasing the dream of becoming the next iPhone millionaire. And I have to say: I find it ugly. I mean really really ugly!

People say though, that once you get used to it, it seems natural and it has advantages (read: late binding) that make it a powerful tool for creating applications.

Well, after all its just another programming language. So I thought I’d give it a try.  Preferably without owning a mac.

Here is how I did it:

  • Download cygwin – install it with the default settings.
  • That’s about it! Cygwin comes with an objective C compiler, so you are good to go once you install it.

How to write a first objective c test program:

#import <stdio.h>
int main(int argc, char *argv[]) {
    printf("Hello, from objC!\n"); 
}

Create a file named “hello.m” in your cygwin home directory containing the above code. Then open a cygwin prompt and write the command: gcc -o hello hello.m

And there you have it, a free development environment to try out the language. Your next step (pun intended) might be a good tutorial or a book describing the language. I suggest something that covers the language only, not the common APIs (GNUstep, Cocoa, whatever). If you think you can stand the language itself, you can then try to learn the APIs and create that iPhone app you were dreaming about!


How to save global variables using the lua C API

June 26, 2010
Lua (programming language)

Image via Wikipedia

This function will save all the global variables in a lua state. Only string, number and boolean variables are supported, but this code can be easily extended to support tables as well.

Enjoy :)

void LuaState::DumpGlobals()
{
	// push the first key (nil = beginning of table)
	lua_pushnil(L);

	// lua_next will:
	// 1 - pop the key
	// 2 - push the next key
	// 3 - push the value at that key
	// ... so the key will be at index -2 and the value at index -1
	while (lua_next(L, LUA_GLOBALSINDEX) != 0) {
		// get type of key and value
		int key_type = lua_type(L, -2);
		int value_type = lua_type(L, -1);

		// support only string keys
		// globals aren't likely to have a non-string key, but just to be certain ...
		if (key_type != LUA_TSTRING) {
			lua_pop(L, 1); // pop the value so that the top contains the key for the next iteration
			continue;
		}

		// support only number, boolean and string values
		if (value_type != LUA_TNUMBER &&
			value_type != LUA_TBOOLEAN &&
			value_type != LUA_TSTRING) {
			lua_pop(L, 1); // again, pop the value before going to the next loop iteration
			continue;
		}

		// get the key as a string
		string key_string = lua_tostring(L, -2); // no copy required - we already know this is a string

		// do not support variables that start with '_'
		// lua has some predefined values like _VERSION. They all start with underscore

		if (!key_string.size()) { // this again is highly unlikely, but still ...
			lua_pop(L, 1);
			continue;
		}
		if (key_string[0] == '_') {
			lua_pop(L, 1);
			continue;
		}

		string value_string;

		// convert the value to a string. This depends on its type
		switch (value_type) {
		case LUA_TSTRING:
		case LUA_TNUMBER:
			// numbers can be converted to strings

			// get the value as a string (this requires a copy because traversing tables
			// uses the top of the stack as an index. If conversion from a number to string
			// happens, the top of the stack will be altered and the table index will become invalid)
			lua_pushvalue(L, -1);
			value_string = lua_tostring(L, -1);
			lua_pop(L, 1);
			break;
		case LUA_TBOOLEAN:
			value_string = lua_toboolean(L, -1) == 0 ? "false" : "true";
			break;
		}

		// enclose the value in "" if it is a string
		if (value_type == LUA_TSTRING) {
			value_string = "\"" + value_string + "\"";
		}

		// resulting line. Somehow save this and when you need to restore it, just
		// call luaL_dostring with that line.
		SaveLine(key_string + " = " + value_string);		// Pop the value so the index remains on top of the stack for the next iteration
		lua_pop(L, 1);
	}
}

Gamedev tip #2 : Divide and conquer

April 29, 2010
Rubik's Cube

Image via Wikipedia

Even today, I am overwhelmed by the difficulty in creating a video game. Since the first years I learned how to program it seemed like a relatively easy task. Ok. I could draw a teapot on the screen. I could play sounds and music. It was ridiculously easy to see what keys was the user pressing, to read a file from disk and what not.

It’s not about the lines of code

So why is making a game more difficult than the sum of all these? This is something I found out the hard way! The relationship between the size of a project and its difficulty is not linear. Games today are usually large projects with a magnitude of tens to hundreds thousands lines of code. Thus, creating an average game today presents a challenge that requires taking a very different approach than someone is used to when dealing with small scale projects.

When a project grows, its difficulty grows not with the lines of code, but with the number of interactions that exist between the various parts of the code. These interactions must be taken into account all the time. Let us define a fully unstructured project as one in which every part of the code is able to interact with every other part. In that case these interactions are not proportional to the project’s size, but to its square.

Someone with background on algorithms can easily see where this is going… When a project is not structured, it becomes increasingly difficult to manage, until it eventually reaches a point where any addition or change becomes prohibitively troublesome.

Fortunately there exists a solution, but it requires a somehow different mindset than the one we are used to when working with small projects.

High level design

Apologizing in advance to all the people who have devoted their lives to software development and have read and written dozens of books on the subject, I am just going to say here what in my opinion is most important with respect to our context:

High level design is all about separating a large project into logical parts which are mostly independent from each other. Ideally this way we can treat every part (or module) as a small program. The keyword here is “independent”, and represents the change in mindset I was talking about earlier.

When working on a small project, we are most concerned about writing the least amount of code possible  in order to achieve our goal. As a result we focus more on reusability and we try to reuse pieces of code regardless of where in the code they exist. In essence there is no distinction between parts of code. Everything belongs to a single entity, our program, which we can manage as long as it stays small.

On the other hand, when we design something bigger, our first concern is different. We must focus on making sure that the project ever gets completed. And to achieve this we have to find a way to fight its tendency to become harder as it grows. We saw that this tendency springs from the interactions between code rather than the size of the code itself.

Preparing for a large project

So how do we deal with these interactions? Its simple: Forbid them! This is what high level design is all about. Forbiding interactions (or dependencies) between modules. The first step is to divide our code to logical parts our program will consist of. For example: graphics, sound, math, input, game logic.

We then document which modules will be allowed to depend on each other. For instance it must be obvious that graphics must not depend on sound. But there are more subtle cases. Does the sound module need to depend on math? Modern games feature 3d sound with various sound sources placed in different positions in the game world. Setting their position and velocity using a vector class might seem like an attractive idea:

sound_source->SetPosition(Vector3 pos);

How elegant is that. However is that enough to support the addition of one more dependency? The answer is no. Every dependency between modules must justify its existence beyond any doubt. We could have easily avoided the dependency like this:

sound_source->SetPosition(float x, float y, float z);

Maybe not as appealing as the former, but its not so much more troublesome either, and its enough to get rid of a dependency which is our very first priority here.

There will certainly be situations where its difficult to decide if a dependency is justified or not. I have a rule of thumb for these: “When in doubt, forbid!“, and here is why:

  • Dependencies are evil! Believe a man who has suffered!
  • If you are in doubt, you are probably already thinking about a way around the dependency. If that way was not easily implementable, you would not have been in doubt in the first place.
  • Even if you took a wrong decision, adding a new dependency is always easier than removing an already existing one.

So to summarize:

In order to give a large project a hope of ever being completed, we must divide it into small entities and forbid as many dependencies between them as possible.

I intended to add more here about my own frightening experiences, in order to show how troublesome can underestimating dependencies be, but this post already exceeded the size of a “tip”. So I will be postponing the terror for my next tip about unit testing which is somewhat related anyway.

Set your modules free!


Gamedev Tip #1 : Fix the lix

March 5, 2010
American Leak Detection

Image via Wikipedia

This is the first of a series of short articles I plan to post here. The posts will come in the form of tips (or reminders) intended for game developers. These tips will hopefully help the reader avoid common frustrating situations when developing a game.

Let’s start with one that haunts me for years!

Graphics context state leaking will make a project very hard to maintain. So find a way to eliminate the leaks at the beginning, when you initially design your engine.

At first this might seem like overengineering, since your first few lines of rendering code will go smoothly without leak management. But during the project’s lifetime there will be lots of things to add and remove here and there. Every time you try to add a new feature in graphics, you are going to break something else. And then the nightmare starts … or the detective story if you will :) Try to solve the mystery of a lost state in a couple of thousands lines of code

Possible solutions:

  • Create a function SetDefaultGraphicsState() and call it whenever you enter a new function that draws something and/or
  • Create a couple of Save/LoadGraphicsState() functions

These should be enough to guarantee a consistent state in your rendering context. You could also design these functions to be smart and check if it’s needed or not to do an API call. For example if they need to disable blending, they could first check if blending was already disabled and if it was, avoid the function call.

But I suspect a good driver might also do this check before communicating with the graphics hardware anyway.


Using custom I/O callbacks with ffmpeg

September 9, 2009
FFmpeg

Image via Wikipedia

Video playback in Conspiracies 2 is based on ffmpeg, which is a very fast and easy to use library. I got better results than using DirectShow with less effort and not to mention my video playback code is going to compile everywhere!

Following this tutorial got me started immediately. After an afternoon of coding I had a decent video player running. This tutorial shows you how to play videos from a file in the filesystem. What I needed was a little bit different though. All the assets of the game (including videos) exist in a “virtual file system” in archives. So I need to provide my own I/O functions for reading them. ffmpeg provides this functionality but it is a little bit tricky to figure out, so I thought I’d write this guide here for future reference. Thanks to the kind people at libav-user mailing list who helped me sort this out!

Implementing the I/O routines.

There are 3 routines you need to impement:

int ReadFunc(void *opaque, uint8_t *buf, int buf_size);

This function must read “buf_size” number of bytes from your file handle (which is the “opaque” parameter) and store the data into “buf”. The return value is the number of bytes actually read from your file handle. If the function fails you must return <0.

int WriteFunc(void *opaque, uint8_t *buf, int buf_size);

This is optional and behaves like the ReadFunc.

int64_t SeekFunc(void *opaque, int64_t offset, int whence) ;

This function is like the fseek() C stdio function. “whence” can be either one of the standard C values (SEEK_SET, SEEK_CUR, SEEK_END) or one more value: AVSEEK_SIZE.

When “whence” has this value, your seek function must not seek but return the size of your file handle in bytes. This is also optional and if you don’t implement it you must return <0.

Otherwise you must return the current position of your stream  in bytes (that is, after the seeking is performed). If the seek has failed you must return <0.

Opening your custom stream.

So when you need to open a custom stream for reading you must do the following:

  • Allocate a buffer for I/O operations with your custom stream. The buffer’s size must be (n + FF_INPUT_BUFFER_PADDING_SIZE), where n is the actual useful buffer size.
  • Allocate a new ByteIOContext object and initialize it by calling init_put_byte():
    int init_put_byte ( ByteIOContext * s,
    unsigned char * buffer,
    int buffer_size,
    int write_flag,
    void * opaque,
    int(*)(void *opaque, uint8_t *buf, int buf_size) read_packet,
    int(*)(void *opaque, uint8_t *buf, int buf_size) write_packet,
    int64_t(*)(void *opaque, int64_t offset, int whence) seek
    )

    s is a pointer to your ByteIOContext object.
    buffer is a pointer to your allocated buffer
    buffer_size is “n” (the useful portion of your allocated buffer)
    write_flag must be zero if your stream does not support writing
    opaque is a pointer to your custom file stream. This is going to be passed to your custom routines.
    read_packet, write_packet and seek are pointers to your custom routines. write_packet is optional (it can be NULL)

  • Use av_open_input_stream() instead of av_open_input_file() to open your stream:
    int av_open_input_stream ( AVFormatContext ** ic_ptr,
    ByteIOContext * pb,
    const char * filename,
    AVInputFormat * fmt,
    AVFormatParameters * ap
    )

    ic_ptr, filename, fmt and ap are the same parameters you use with av_open_input_file().
    pb is your initialized ByteIOContext object.

Closing your custom stream.

  • When you are done with the stream call av_close_input_stream() instead of av_close input_file().
  • Deallocate your ByteIOContext object.
  • Deallocate your buffer.

That’s it! Everything else is the same as with av_open_input_file().

Edit: Actually there is more to it.  av_open_input_file automatically probes the input format. With av_open_input_stream you must do this yourself and pass av_open_input_stream a valid AVInputFormat pointer.  Probing the input format is easy: You just fill a buffer with some bytes from the beginning of your stream and you pass it to av_probe_input_format() which hopefully recognizes the input format and returns the pointer needed. Here is my probing code:

AVProbeData probe_data;
probe_data.filename = filename;
probe_data.buf_size = 4096; // av_open_input_file tries this many times with progressively larger buffers each time, but this must be enough
// allocate memory to read the first bytes
probe_data.buf = (unsigned char *) malloc(probe_data.buf_size);
// read first 4096 bytes into buffer
// …
// probe
AVInputFormat *ret = av_probe_input_format(&probe_data, 1);
// cleanup
free(probe_data.buf);
probe_data.buf = NULL;
return ret;

Changed vcache license to MIT

July 29, 2009

I thought that GPL v3 was too restricting for such a small piece of code, so I changed the license to MIT hoping that more people are going to use it.


Vertex Cache Optimisation Library

July 28, 2009
Embroidery Sample

Image by kotog via Flickr

While working hard to make sure Conspiracies 2 is ready in time, I could not resist the temptation to tweak and optimize rendering a little more. This time I experimented with vertex cache optimizations and came up with a small piece of code that I thought it would be nice to share with the coders out there, since it is self-contained and easy to use.

The library consists of a single header file that -at least in theory- can be used with any C++ compiler, though it was only tested with visual studio (edit: gcc works as well, of course). It implements the algorithm descibed at:

http://home.comcast.net/~tom_forsyth/papers/fast_vert_cache_opt.html

Its usage is extremely simple: just initialize a VertexCacheOptimizer object and call the member function “Optimize”, passing your index buffer and triangle count.

Download the code from:

http://code.google.com/p/vcacne/

The downloadable archive contains documentation and a small test-benchmark program.

My results for optimizing generated plane meshes are very consistent (my CPU is a Athlon 64 x2 4400+ running at 2.22 GHz):

  • 143 nanoseconds per triangle
  • resulting ACMR of ~0.63 (0.63 cache misses per triangle)

The algorithm indeed runs in linear time. I always got 143 nper triangle no matter how large or small meshes I used.

Have some fun with it. Any kind of feedback is welcome. I am particularly interested in your benhmarking results and/or comparisons with other vertex cache optimizers.

Edit (18/10/2010) : Also tested with gcc. Not only it compiles, but its a lot faster. On the same system I get 22 nsec per triangle.


Multipass Rendering #2

July 15, 2009

After thinking about it for a while, there is also another aspect about multipass lighting I always feared, and its really trivial to handle, just like fog.

Transparent objects are rendered the same way as fogged objects: the final shaded pixel is linearly interpolated with a color, just like fog, but this time the color comes from the destination pixel. If you do the math,or if you take a look at my previews post, it becomes clear that transparent objects can easily be multiopass lit. Both techniques can also be combined as well.

In the case of both transparent and fogged objects the final color is calculated like this:

final_color = lerp(dest_color, fogged_shaded_color, alpha)
<==>
final_color = dest_color * (1-alpha) + fogged_shaded_color * alpha
(1)

Let A = dest_color * (1-alpha), (1) becomes:

final_color = A + fogged_shaded_color * alpha (2)

where:

fogged_shaded_color = lerp(fog_color, P1 + … + Pn, fog_factor) (P1 = first pass shaded color, Pn = nth pass)
<==>
fogged_shaded_color = fog_color * (1 – fog_factor) + ( P1 + … + Pn) * fog_factor
(3)

Let F = fog_color * (1 – fog_factor), (3) becomes:

fogged_shaded_color = F + ( P1 + … + Pn) * fog_factor (4)

Combine (2) with (4):

final_color = A + (F + ( P1 + … + Pn) * fog_factor) * alpha
<==>
final_color = A + F * alpha + ( P1 + … + Pn) * fog_factor * alpha
<==>
final_color = A + F * alpha +  P1*fog_factor*alpha + … + Pn*fog_factor*alpha

Substitute A and F again:

final_color = dest_color * (1-alpha) + ( fog_color * (1 – fog_factor) +  P1*fog_factor) * alpha + … + Pn*fog_factor*alpha
<==>
final_color = lerp(dest_color, lerp(fog_color, P1, fog_factor), alpha) + … + Pn*fog_factor*alpha

“lerp(dest_color, lerp(fog_color, P1, fog_factor), alpha)” is a normal pass with fog and dest_color terms intact (not zeroed). So the first pass is performed normally.

Now let’s see what happens with the subsequent passes. They are of the form:

additive_pass_n = Pn * fog_factor * alpha

which is equivalent to:

additive_pass_n = lerp(ZERO, Pn, fog_factor) * alpha

Which is a little different pass, with fog_color = black as we already saw, the dst blend factor set to GL_ONE, as usually is the case with additive passes, but the src blend factor set to GL_SRC_ALPHA instead of the typical GL_ONE, because we want our shaded result multiplied by the alpha value.

So in short:

  1. Multipass fogging is as easy as setting the fog color to black after the first normal pass.
  2. Multipass alpha blending is done in essentially the same way, just do the first pass normally, and then do the subsequent passes with src blend factor set to GL_SRC_ALPHA and dst blend factor set to GL_ONE.
  3. Both techniques can be combined together.

So after all multipass lighting is not so difficult to program! Actually it is easier than trying to add all the lights affecting an object to the same shader. The latter aproach may be more optimized but it’s very troublesome to manage (or generate?) the combinational explosion of shaders having 1, 2 ,3 or more lights where each of the lights can be of any type.

PS: You can use these techniques even with what I call “semi-multipass” lighting, when 2 or more lights are rendered at the same pass. If you don’t believe me, just do the math :)


Handling fog with multipass lighting

July 15, 2009

The thought of fog problems with multipass lighting has prevented me from using multipass in my renderers for too long. Today I decided to stop and think about it for a minute, and the solution is very simple!

Fog is implemented as a linear interpolation between the shaded color of the pixel and the fog color, using a fog factor. In my case the fog factor is calculated like this:

float fog_factor = exp2(-abs(view_dist * fog density));

and here is the cg code for calculating the final color:

final_color = lerp(fog_color, final_color, fog_factor);

which is equivalent to:

final_color = fog_color * (1.0 – fog_factor) + final_color * fog_factor;

when multipass lighting is used, fog should be applied to the final color (the sum of all passes):

final_color = fog_color * (1.0 – fog_factor) + (pass1_color + … + passn_color) * fog_factor;

or:

final_color = fog_color * (1.0 – fog_factor) + pass1_color * fog_factor + … + passn_color * fog_factor;

or:

final_color = lerp(fog_color, pass1_color, fog_factor) + pass2_color * fog_factor + … + passn_color * fog_factor;

So the first pass is rendered normally, as in single pass rendering, but the next passes are drawn without adding “fog_color * (1.0 – fog_factor)” each time. We have two options to implement this:

  • Write separate shaders for the first and for the subsequent passes, or
  • Make the term “fog_color * (1.0 – fog_factor)” be equal to zero in all passes except the first.

The first option might seem a little more oprimized at first, because it eliminates one or two instructions, but in fact it isn’t. The cg manual states that it’s better to use standard library functions, like “lerp”, than coding them on your own, because they are more optimized.

On top of being more optimized, the second option is also more convenient, because you don’t need to maintain separate shaders with no fog term. You just zero the fog term :)

So how can we zero the fog term? If you are reading this I think I don’t need to tell you (in fact I shouldn’t have even argued about why the second approach is better …).

Ok, ok ….

Short answer: Do the first pass normally, and the rest of the passes with fog enabled and set the fog color to black.


Follow

Get every new post delivered to your Inbox.