Why is your library so slow? You suck!

You thought having Direct3D power your 2D graphics would automatically make your game/app/whatever run like lightning. So why is your application running like a tortoise with hemorrhoids?

In the beginning

In the olden days, when we were young and fancy free, we used to write our 2D application code like this:

SpriteImage _spriteImage1;
SpriteImage _spriteImage2;
SpriteImage _spriteImage3;

Sprite hero;
Sprite bullet;
Sprite enemy;

void Main()
{
   _spriteImage1 = LoadImage("Hero.png");
   _spriteImage2 = LoadImage("Bullet.png");
   _spriteImage3 = LoadImage("Enemy.png");

   hero = new Sprite(_spriteImage1, 0, 0, 64, 64);
   bullet = new Sprite(_spriteImage2, 0, 0, 8, 8);
   enemy = new Sprite(_spriteImage3, 0, 0, 128, 128);
}

void Draw()
{
   // Do stuff...
   hero.Draw();
   bullet.Draw();
   enemy.Draw();
}

Or we could have:

void Draw()
{
   // Assume that smoothing is set to Both for all sprites.
   hero.Smoothing = Smoothing.None;
   hero.Draw();

   // Assume that AlphaBlending is true for all sprites.
   bullet.AlphaBlending = false;
   bullet.Draw();

   enemy.Draw();
}

This code is pretty simple, and in reality it wouldn’t slow your machine to a crawl, but imagine thousands of sprites instead of three. In the software-based (or DirectDraw) days of 2D graphics this sort of code was par for the course. State changes didn’t really cost anything (in most cases), and using a separate image for each sprite was pretty common (I recall doing this a lot with Allegro back in the day).

Now, if you write code like this with Gorgon (again, imagine thousands of sprites), you’ll find things aren’t running as quickly as you assumed… Why?

Because 3D acceleration gets a lot of speed from batching

Since Gorgon uses Direct3D to deliver 2D graphics, we get a lot of tricks like alpha blending and smoothing for free. However, there’s a price to be paid: to get good performance out of the hardware, we need some form of batching.

What is batching?

Batching is a method of grouping your objects together for rendering. For example, you would batch together the sprites that all use the same smoothing state and draw them together, then draw another set of sprites without smoothing. This matters because 3D hardware is pretty sensitive to state changes such as switching textures or changing blending modes, and some state changes are much more expensive than others. Microsoft’s article on accurately profiling Direct3D API calls has an appendix that lists the approximate cost of various state changes in cycles.
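To make this concrete, here is a small Java sketch (the snippets on this page are C#-style pseudocode; Java is used only because its syntax is close). It models a hypothetical render-state cache that records a change only when the next sprite needs a different texture, smoothing or blending setting than the one currently set. SpriteState and StateCache are made-up names, not part of Gorgon or Direct3D, and the model treats every change as equally expensive even though real costs differ.

```java
import java.util.List;
import java.util.Objects;

class BatchingDemo {
    // Hypothetical per-sprite render state; not part of Gorgon's API.
    record SpriteState(String texture, boolean smoothing, boolean alphaBlending) {}

    // Remembers the state that is currently "set" and counts only real changes.
    static class StateCache {
        private String texture;        // null until the first sprite is drawn
        private Boolean smoothing;     // null until the first sprite is drawn
        private Boolean alphaBlending; // null until the first sprite is drawn
        private int changes = 0;

        void apply(SpriteState s) {
            if (!Objects.equals(texture, s.texture())) { texture = s.texture(); changes++; }
            if (!Objects.equals(smoothing, s.smoothing())) { smoothing = s.smoothing(); changes++; }
            if (!Objects.equals(alphaBlending, s.alphaBlending())) { alphaBlending = s.alphaBlending(); changes++; }
        }

        int changes() { return changes; }
    }

    // Runs a draw order through a fresh cache and returns the number of state changes.
    static int countChanges(List<SpriteState> drawOrder) {
        StateCache cache = new StateCache();
        for (SpriteState s : drawOrder) cache.apply(s);
        return cache.changes();
    }

    public static void main(String[] args) {
        SpriteState sheet = new SpriteState("SpritePage1.png", true, true);
        SpriteState bullet = new SpriteState("Bullet.png", false, false);

        List<SpriteState> interleaved = List.of(sheet, bullet, sheet, bullet);
        List<SpriteState> grouped = List.of(sheet, sheet, bullet, bullet);

        System.out.println("interleaved: " + countChanges(interleaved));
        System.out.println("grouped:     " + countChanges(grouped));
    }
}
```

Drawing the same four sprites in grouped order needs 6 state changes instead of 12, which is the whole argument for batching in miniature.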

So, as you can see, we need to rethink how we draw things. One of the most common optimizations is to replace this code:

void Main()
{
   _spriteImage1 = LoadImage("Hero.png");
   _spriteImage2 = LoadImage("Bullet.png");
   _spriteImage3 = LoadImage("Enemy.png");

   hero = new Sprite(_spriteImage1, 0, 0, 64, 64);
   bullet = new Sprite(_spriteImage2, 0, 0, 8, 8);
   enemy = new Sprite(_spriteImage3, 0, 0, 128, 128);
}

with this:

void Main()
{
   // All three sprites live on one packed image.
   _spriteImages = LoadImage("SpritePage1.png");

   hero = new Sprite(_spriteImages, 0, 0, 64, 64);
   bullet = new Sprite(_spriteImages, 65, 0, 8, 8);
   enemy = new Sprite(_spriteImages, 0, 65, 128, 128);
}

Notice how we’re taking all our sprites from the same image. In the first Main() function we would be switching between images for every sprite we draw (i.e. calling SetTexture in Direct3D), and that costs us. So what we need to do is use only one image, as in the second Main() function. Obviously, there will be situations where we can’t avoid using multiple images. For three sprites the performance hit would be negligible, but for thousands it starts to add up (the speed of your 3D chipset will make a difference too).
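In the packed version each sprite is just a rectangle inside the one shared image, so the texture can stay set while only the texture coordinates change from sprite to sprite. Here is a hedged Java sketch of that mapping. It assumes SpritePage1.png is 256×256 (the page does not give its size), and the Region and UvRect types are made up for illustration rather than taken from Gorgon.

```java
class SpriteSheetUv {
    // Pixel rectangle of one sprite inside the shared sheet (hypothetical type).
    record Region(String name, int x, int y, int width, int height) {}

    // Normalized texture coordinates: (u0, v0) is top-left, (u1, v1) is bottom-right.
    record UvRect(float u0, float v0, float u1, float v1) {}

    // Converts a pixel rectangle to 0..1 texture coordinates for a sheet of the given size.
    static UvRect toUv(Region r, int sheetWidth, int sheetHeight) {
        return new UvRect(
                (float) r.x() / sheetWidth,
                (float) r.y() / sheetHeight,
                (float) (r.x() + r.width()) / sheetWidth,
                (float) (r.y() + r.height()) / sheetHeight);
    }

    public static void main(String[] args) {
        int sheetWidth = 256, sheetHeight = 256; // assumed size of SpritePage1.png
        Region[] regions = {
                new Region("hero", 0, 0, 64, 64),
                new Region("bullet", 65, 0, 8, 8),
                new Region("enemy", 0, 65, 128, 128),
        };
        for (Region r : regions) {
            System.out.println(r.name() + " -> " + toUv(r, sheetWidth, sheetHeight));
        }
    }
}
```

Only the coordinates differ between the three sprites; the texture itself never has to be switched.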

Now, that’s fine if you only have one image, but what if you MUST have more than one image? Then there’s a further optimization you’ll need to make in your drawing code:

void Draw()
{
   // These share the same image.
   hero.Draw();
   enemy.Draw();

   // This uses a different image.
   bullet.Draw();
}

Notice how we’ve grouped the hero and enemy sprites? This avoids a texture switch until we absolutely need one. If you were to draw the hero, then the bullet, then the enemy, you’d have to switch from the hero/enemy texture to the bullet texture and back again. That costs 2500 – 3100 cycles for the first switch and another 2500 – 3100 cycles for the second, for a total of 5000 – 6200 cycles, whereas the function above takes only the one 2500 – 3100 cycle hit.
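The arithmetic above can be checked with a short Java sketch. It counts how often the texture changes between consecutive draws in a single pass (the first bind is not counted, as in the text) and multiplies by the 2500 – 3100 cycle range quoted above. The class name and the "sheet"/"bullet" labels are hypothetical; the cycle figures are the ones this page uses, and real costs depend on the driver and hardware.

```java
import java.util.List;

class TextureSwitchCost {
    // Per-switch cost range used in the text for a texture change, in cycles.
    static final long MIN_CYCLES = 2500;
    static final long MAX_CYCLES = 3100;

    // Counts texture changes between consecutive draws in one pass.
    // The initial bind is not counted, matching the arithmetic in the text.
    static int countSwitches(List<String> texturePerDraw) {
        int switches = 0;
        for (int i = 1; i < texturePerDraw.size(); i++) {
            if (!texturePerDraw.get(i).equals(texturePerDraw.get(i - 1))) {
                switches++;
            }
        }
        return switches;
    }

    public static void main(String[] args) {
        // "sheet" = the image shared by hero and enemy, "bullet" = the bullet's own image.
        List<String> heroBulletEnemy = List.of("sheet", "bullet", "sheet");
        List<String> heroEnemyBullet = List.of("sheet", "sheet", "bullet");

        for (List<String> order : List.of(heroBulletEnemy, heroEnemyBullet)) {
            int n = countSwitches(order);
            System.out.println(order + ": " + n + " switch(es), "
                    + n * MIN_CYCLES + " - " + n * MAX_CYCLES + " cycles");
        }
    }
}
```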

One other thing

If you use the above optimization, you’ll need to pack your sprite images into a single image (often called a sprite sheet or texture atlas). For example:

Unpacked and packed images.
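Packing can be done by hand in a paint program, but it is also easy to automate. The Java sketch below uses simple "shelf" packing, one common technique among many: images are sorted tallest first and placed left to right in rows, and a new row starts below when the next image would not fit. The ShelfPacker, Size and Placement names are hypothetical, and the final height is rounded up to a power of two.

```java
import java.util.ArrayList;
import java.util.List;

class ShelfPacker {
    record Size(String name, int width, int height) {}
    record Placement(String name, int x, int y, int width, int height) {}

    // Places images left to right in rows ("shelves"), tallest first,
    // starting a new shelf when the next image would not fit in the row.
    static List<Placement> pack(List<Size> images, int sheetWidth, int padding) {
        List<Size> sorted = new ArrayList<>(images);
        sorted.sort((a, b) -> Integer.compare(b.height(), a.height())); // tallest first
        List<Placement> placements = new ArrayList<>();
        int x = 0, y = 0, shelfHeight = 0;
        for (Size s : sorted) {
            if (s.width() > sheetWidth) {
                throw new IllegalArgumentException(s.name() + " is wider than the sheet");
            }
            if (x + s.width() > sheetWidth) { // row is full: open a new shelf below
                x = 0;
                y += shelfHeight + padding;
                shelfHeight = 0;
            }
            placements.add(new Placement(s.name(), x, y, s.width(), s.height()));
            x += s.width() + padding;
            shelfHeight = Math.max(shelfHeight, s.height());
        }
        return placements;
    }

    // Height actually used by the packed images.
    static int usedHeight(List<Placement> placements) {
        int h = 0;
        for (Placement p : placements) h = Math.max(h, p.y() + p.height());
        return h;
    }

    // Smallest power of two that is >= n (for n >= 1).
    static int nextPowerOfTwo(int n) {
        int p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    public static void main(String[] args) {
        List<Size> images = List.of(
                new Size("Hero.png", 64, 64),
                new Size("Bullet.png", 8, 8),
                new Size("Enemy.png", 128, 128));
        List<Placement> placements = pack(images, 256, 1); // 1px gap between sprites
        placements.forEach(System.out::println);
        System.out.println("sheet: 256 x " + nextPowerOfTwo(usedHeight(placements)));
    }
}
```

With a 256-pixel-wide sheet all three images fit on one shelf, and the used height of 128 is already a power of two. The 1-pixel padding mirrors the gap in the packed example earlier; a small gap like this helps keep smoothing from bleeding pixels of one sprite into its neighbor.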

Other considerations

There are a few other things you can do to speed things up:

  • Turn off alpha blending – Alpha-blended sprites can be much slower. Consider using the alpha masking functionality instead to clear away the portions of the image you don’t need. Sure, it may look a little less pretty, but it’ll speed things up.
  • Turn off smoothing – This can be expensive; if you don’t need it, don’t use it.
  • Don’t overdo the shaders – Don’t use a shader if you don’t need one.
  • Batch, batch, batch – As said before, batch as much as you can. If you have 100 sprites that need alpha blending and 100 that don’t, draw all the blended sprites before (or after) the unblended ones instead of mixing them. This can’t be stated enough.
  • Don’t draw what you don’t need – If your sprite is outside of the bounds of your current viewport, then don’t draw it. The fastest sprite you can get is the one you don’t draw.
  • Don’t use a depth buffer if you don’t need it – This is general advice, since Gorgon doesn’t really use the depth buffer. However, it WILL use the stencil portion of the buffer, and clearing that costs extra cycles. (Future versions of Gorgon may use the depth buffer.)
  • Use as small an image as you can – This one is from personal experience: I’ve gotten better performance from smaller images. For example, if you can pack all your sprites into a 64×64 image instead of a 128×128 image, do so.
  • Try to make your image sizes a power of 2 – While most cards nowadays support textures of any size, this can come at a small cost. Also, older cards could not handle non-power-of-two images at all, and even some modern cards support them only under certain restrictions.
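Two of the tips above, skipping what you don’t need to draw and batching, can be combined when building a draw list each frame. The Java sketch below is hedged: the Sprite record and the buildDrawList method are hypothetical, the viewport is assumed to start at (0, 0), and it sorts opaque sprites before alpha-blended ones, then by texture. Reordering changes which sprite ends up on top where sprites overlap, so in a real game you would only sort within a layer where draw order does not matter visually.

```java
import java.util.ArrayList;
import java.util.List;

class DrawListBuilder {
    // Hypothetical sprite description; positions and sizes are in viewport pixels.
    record Sprite(String name, String texture, boolean alphaBlending,
                  float x, float y, float width, float height) {}

    // True if any part of the sprite's rectangle lies inside a viewport at (0, 0).
    static boolean isVisible(Sprite s, float viewportWidth, float viewportHeight) {
        return s.x() < viewportWidth && s.x() + s.width() > 0
                && s.y() < viewportHeight && s.y() + s.height() > 0;
    }

    // Drops off-screen sprites, then groups the rest: opaque before blended,
    // and by texture within each group. List.sort is stable, so sprites with
    // identical state keep their original relative order.
    static List<Sprite> buildDrawList(List<Sprite> sprites, float viewportWidth, float viewportHeight) {
        List<Sprite> visible = new ArrayList<>();
        for (Sprite s : sprites) {
            if (isVisible(s, viewportWidth, viewportHeight)) visible.add(s);
        }
        visible.sort((a, b) -> {
            int byBlending = Boolean.compare(a.alphaBlending(), b.alphaBlending());
            return byBlending != 0 ? byBlending : a.texture().compareTo(b.texture());
        });
        return visible;
    }

    public static void main(String[] args) {
        List<Sprite> sprites = List.of(
                new Sprite("enemy1", "SpritePage1.png", false, 100, 100, 128, 128),
                new Sprite("bullet1", "Bullet.png", true, 200, 50, 8, 8),
                new Sprite("hero", "SpritePage1.png", false, 10, 10, 64, 64),
                new Sprite("offscreen", "SpritePage1.png", false, 900, 100, 128, 128),
                new Sprite("bullet2", "Bullet.png", true, -20, 300, 8, 8));
        for (Sprite s : buildDrawList(sprites, 800, 600)) {
            System.out.println(s.name());
        }
    }
}
```

Here the two off-screen sprites are never drawn, and the remaining three are ordered so the shared sheet is used for both opaque sprites before the blended bullet.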

I’m sure there are a million more tips; you can go to GameDev.net to learn about general optimizations for 3D-accelerated graphics. If you have more tips or feel that the information here needs some touching up, by all means edit this page and add your tips or make your changes.
