Performance Optimization

While Starling mimics the classic display list of Flash, what it does behind the scenes is quite different. To achieve the best possible performance, you have to understand some key concepts of its architecture. Here is a list of best practices you can follow to have your game run as fast as possible.

General AS3 Tips

Always make a Release Build

The most important rule right at the beginning: always create a release build when you test performance. Unlike conventional Flash projects, a release build makes a huge difference when you use a Stage3D framework. The speed difference is immense; depending on the platform you're working on, you can easily get a multiple of the framerate of a debug build.

  • In Flash Builder, release builds are created by clicking on “Project - Export Release Build”.
  • In FlashDevelop, choose the “Release” configuration and build the project, then choose the “ipa-ad-hoc” or “ipa-app-store” option when you execute the “PackageApp.bat” script.
  • In IntelliJ IDEA, select “Build - Package AIR Application”; choose “release” for Android and “ad hoc distribution” for iOS. For non-AIR projects, deselect “Generate debuggable SWF” in the module's compiler options.
  • If you build your Starling project from command line, make sure -optimize is true and -debug is false; these are also their default values.

Check your Hardware

Be sure that Starling is indeed using the GPU for rendering. That's easy to check: if Starling.current.context.driverInfo contains the string “Software”, then Stage3D is in software fallback mode, otherwise it's using the GPU.
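
A minimal sketch of that check, assuming Starling has already been started and its rendering context exists. The helper name isSoftwareRendering is hypothetical, and the comparison is made case-insensitive as a precaution:

```actionscript
import starling.core.Starling;

// Hypothetical helper: returns true if Stage3D fell back to software rendering.
function isSoftwareRendering():Boolean
{
    // driverInfo is only available once the Context3D has been created.
    var driverInfo:String = Starling.current.context.driverInfo;
    return driverInfo.toLowerCase().indexOf("software") != -1;
}

if (isSoftwareRendering())
    trace("Warning: Stage3D is running in software fallback mode");
```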

Furthermore, some mobile devices can be run in a “Battery Saving Mode”. Be sure to turn that off when making performance tests.

Set the Framerate

Is your framerate somehow stuck at 24 frames per second, no matter how much you optimize? Then you probably never set your desired framerate, and you'll see the Flash Player's default setting.

To change that, either use the appropriate metadata on your startup class, or manually set the framerate at the Flash stage.

[SWF(frameRate="60", backgroundColor="#000000")]
public class Startup extends Sprite
{ /* ... */ }
// or somewhere in the source:
Starling.current.nativeStage.frameRate = 60;    

Use Adobe Scout

Adobe published an amazing performance profiling tool called “Scout”. It's very lightweight and easy to use: just compile your game with the “-advanced-telemetry” flag, then open up Scout before launching your game. It works even on mobile devices!

Decode Loaded Images Asynchronously

By default, if you use a Loader to load a PNG or JPEG image, the image data is decoded when it's used to create a Texture. This happens on the main thread and can cause your app to block when creating textures from loaded images. Try setting the image decoding policy flag to ON_LOAD, which will decode the image in the Loader's background thread. Starling's AssetManager already uses this ON_LOAD technique for images it loads.

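// classes used: flash.system.LoaderContext, flash.system.ImageDecodingPolicy
var loaderContext:LoaderContext = new LoaderContext();
loaderContext.imageDecodingPolicy = ImageDecodingPolicy.ON_LOAD;
loader.load(url, loaderContext); // 'url' has to be a URLRequest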

Find out more about this topic in the LoaderContext documentation.

Syntax Tips

ActionScript 3 contains a few 'gotcha's that degrade performance and are best avoided. Here are the most important ones you should know about:


When working with loops that are repeated very often or are deeply nested, it's better to avoid “for each”; the classic “for i” loop performs better. Furthermore, beware that the loop condition is evaluated on every iteration, so it's faster to store the length in an extra variable.

// slowish:
for each (var item:Object in array) { ... }
// better:
for (var i:int=0; i<array.length; ++i) { ... }
// fastest:
for (var i:int=0, l:int=array.length; i<l; ++i) { ... }

Avoid Object Creation

Avoid creating a lot of temporary objects. They take up memory and need to be cleaned up by the Garbage Collector, which might cause small hiccups when it's running. The “Pool” class introduced with Starling 2.0 might help with that!

// bad:
for (var i:int=0; i<10; ++i)
    var point:Point = new Point(i, 2*i);
// better:
var point:Point = new Point();
for (var i:int=0; i<10; ++i)
    point.setTo(i, 2*i);
// best:
var point:Point = Pool.getPoint();
for (var i:int=0; i<10; ++i)
    point.setTo(i, 2*i);
Pool.putPoint(point); // don't forget this!

Starling Specific Tips

Minimize State Changes

As you know, Starling uses Stage3D to render all visible objects. This means that all drawing is done by the GPU.

Now, Starling could send one quad after the other to the GPU, drawing one by one. In fact, this is how the very first Starling release worked. However, for optimal performance, GPUs prefer to get a huge pile of data and draw all of it at once.

That's why newer Starling versions batch as many quads together as possible before sending them to the GPU. However, it can only batch quads that have similar properties. Whenever a quad with a different “state” is encountered, a “state change” occurs, and the previously batched quads are drawn.

I use “Quad” and “Image” synonymously in this article. Remember, Image is just a subclass of Quad that adds a few methods. Besides, Quad extends “Mesh”, and what you read below is true for meshes, as well.

These are the crucial properties that make up a state:

  • The texture (different textures from the same atlas are fine, though)
  • The blendMode of display objects
  • The textureSmoothing value of meshes/quads/images
  • The textureRepeat mode of meshes/quads/images

If you set up your scene in a way that creates as few state changes as possible, your rendering performance will profit immensely.
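
To make the state properties listed above more tangible, here is a small sketch. It assumes an already created TextureAtlas; the function name and the texture names ("hero", "enemy", "glow") are made up for illustration:

```actionscript
import starling.display.BlendMode;
import starling.display.Image;
import starling.display.Sprite;
import starling.textures.TextureAtlas;

// Hypothetical helper; 'atlas' is assumed to contain the three named textures.
function createScene(atlas:TextureAtlas):Sprite
{
    var scene:Sprite = new Sprite();

    // Same atlas, same blend mode, same smoothing: these two can share a batch.
    var hero:Image = new Image(atlas.getTexture("hero"));
    var enemy:Image = new Image(atlas.getTexture("enemy"));
    scene.addChild(hero);
    scene.addChild(enemy);

    // A different blend mode is a different state, so this image
    // interrupts the batch and costs an additional draw call.
    var glow:Image = new Image(atlas.getTexture("glow"));
    glow.blendMode = BlendMode.ADD;
    scene.addChild(glow);

    return scene;
}
```

If the glow image used the default blend mode as well, all three images could be drawn together.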

The Statistics Display

Starling's statistics display, which you can activate by enabling starling.showStats, shows you the number of draw calls that are executed per frame (third line: DRW). The more state changes you have, the higher this number will be. Your target should always be to keep it as low as possible. The following tips will show you how.

Note: Starling explicitly decrements the draw count displayed to take into account the stats display being used.
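
Enabling the statistics display could look like the following sketch of a startup class. “Game” is a hypothetical root class (a subclass of starling.display.Sprite) that is not part of this article:

```actionscript
package
{
    import flash.display.Sprite;
    import starling.core.Starling;

    public class Startup extends Sprite
    {
        private var _starling:Starling;

        public function Startup()
        {
            // 'Game' is a placeholder for your own root class.
            _starling = new Starling(Game, stage);
            _starling.showStats = true; // activates the statistics display
            _starling.start();
        }
    }
}
```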

The Painter's Algorithm

To know how to minimize state changes, you need to know the order in which Starling processes your objects.

Like Flash, Starling uses the Painter's algorithm to process the display list. This means that it draws your scene like a painter would do it: starting at the object at the bottom layer (e.g. the background image) and moving upwards, drawing new objects on top of previous ones.

If you'd set up such a scene in Starling, you could create three sprites: one containing the mountain range in the distance, one with the ground, and one with the vegetation. The mountain range would be at the bottom (index 0), the vegetation at the top (index 2). Each sprite would contain images that contain the actual objects.

On rendering, Starling would start at the left with “Mountain 1” and continue towards the right, until it reaches “Tree 2”. Now, if all those objects have a different state, it would have to make 6 draw calls. If you load each object's texture from a separate Bitmap, this is what will happen.
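
The scene described above can be sketched roughly as follows. The function name and parameters are placeholders, and the sketch assumes two objects per layer, each with its own texture created elsewhere:

```actionscript
import starling.display.Image;
import starling.display.Sprite;
import starling.textures.Texture;

// Hypothetical helper that builds the three layers of the landscape.
function createLandscape(mountain1:Texture, mountain2:Texture,
                         ground1:Texture, ground2:Texture,
                         tree1:Texture, tree2:Texture):Sprite
{
    var mountains:Sprite = new Sprite();
    mountains.addChild(new Image(mountain1));
    mountains.addChild(new Image(mountain2));

    var ground:Sprite = new Sprite();
    ground.addChild(new Image(ground1));
    ground.addChild(new Image(ground2));

    var vegetation:Sprite = new Sprite();
    vegetation.addChild(new Image(tree1));
    vegetation.addChild(new Image(tree2));

    var landscape:Sprite = new Sprite();
    landscape.addChild(mountains);  // index 0: drawn first (bottom layer)
    landscape.addChild(ground);     // index 1
    landscape.addChild(vegetation); // index 2: drawn last (top layer)
    return landscape;
}
```

With six separate textures, the objects are processed in the order in which they were added, and each of them ends up in its own draw call.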

Another tool at your disposal is the DisplayObjectContainer::sortChildren() method which can be used to sort layers, within a Sprite object for example, based on properties such as x, y, alpha etc. The method accepts a compare function which means you can sort objects based on any criteria you wish! :-D
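
As an illustration, sorting the children of a layer by their y-coordinate might look like this. The function names are hypothetical; the compare function follows the usual convention of returning a negative number, zero, or a positive number:

```actionscript
import starling.display.DisplayObject;
import starling.display.Sprite;

// Orders objects so that those with a smaller y-coordinate come first
// and are therefore drawn first (further back).
function compareByY(a:DisplayObject, b:DisplayObject):int
{
    if (a.y < b.y) return -1;
    if (a.y > b.y) return 1;
    return 0;
}

// Hypothetical helper: re-sorts all children of the given layer.
function sortLayerByY(layer:Sprite):void
{
    layer.sortChildren(compareByY);
}
```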

Texture Atlas

That's one of the reasons why texture atlases are so important. If you load all those textures from one single atlas, Starling will be able to draw all objects at once! (At least if the other properties listed above do not change.)

Here, each image uses the same atlas (depicted by all nodes having the same color). The consequence of this is that you should always use an atlas for your textures.

Sometimes, though, not all of your textures will fit into a single atlas. An atlas should not be bigger than 2048×2048 pixels (this is the maximum texture size on some mobile hardware), so you'll run out of space sooner or later. But this is no problem — as long as you arrange your textures in a smart way.

Both those examples use two atlases (again, one color per atlas). But while the display list on the left will force a state change for each object, the version on the right will be able to draw all objects in just two batches.
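
One possible way to arrive at such a batch-friendly arrangement is to give each atlas its own container. This is only a sketch; the function names are made up, and it simply creates one image per texture in each atlas:

```actionscript
import starling.display.Image;
import starling.display.Sprite;
import starling.textures.Texture;
import starling.textures.TextureAtlas;

// Hypothetical helper: adds one image per atlas texture to the target.
function addImages(target:Sprite, atlas:TextureAtlas):void
{
    var textures:Vector.<Texture> = atlas.getTextures();
    for (var i:int=0, l:int=textures.length; i<l; ++i)
        target.addChild(new Image(textures[i]));
}

// Hypothetical helper: groups the objects by atlas instead of interleaving them.
function createGroupedScene(atlasA:TextureAtlas, atlasB:TextureAtlas):Sprite
{
    var layerA:Sprite = new Sprite();
    var layerB:Sprite = new Sprite();
    addImages(layerA, atlasA);
    addImages(layerB, atlasB);

    var scene:Sprite = new Sprite();
    scene.addChild(layerA); // everything from atlas A is processed first
    scene.addChild(layerB); // then everything from atlas B
    return scene;
}
```

Because objects from the two atlases never alternate, the texture changes only once while the scene is processed.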

The MeshBatch class

The fastest way to draw a huge number of quads or other meshes at once is to use the MeshBatch class directly — that's the same class that is used internally by Starling for all rendering, so it's heavily optimized. (If you're still using Starling 1.x, look for QuadBatch instead.) It works like this:

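var meshBatch:MeshBatch = new MeshBatch();
var image:Image = new Image(texture);
for (var i:int=0; i<100; ++i)
{
    meshBatch.addMesh(image); // copies the image's current state into the batch
    image.x += 10;
}
addChild(meshBatch); // the batch is displayed like any other object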

Did you notice? You can add the same image as often as you want! Furthermore, adding it won't cause any event dispatching, as is the case when you add an object to a container.

As expected, this has some downsides, though:

  • All the objects you add must have the same state (i.e. use textures from the same atlas). The first image you add to the MeshBatch will decide on its state. You can't change the state later, except by resetting it completely.
  • You can only add instances of the Mesh class or its subclasses (that includes Quad, Image, and even MeshBatch).
  • Object removal is quite tricky: you can only remove meshes by trimming the number of vertices and indices of the batch. However, you can overwrite meshes at a certain index.

For these reasons, it's only suitable for very specific use-cases (the bitmap font class, for example, uses a mesh batch internally). In those cases, it's definitely the fastest option, though. You won't find a more efficient way to render objects in Starling.

Use Bitmap Fonts

TextFields support two different kinds of fonts: TrueType fonts and Bitmap Fonts.

TrueType fonts are easiest to use: simply embed the “ttf” file and you're done. For static text fields that contain hundreds of characters, they are a good and fast option. Starling will render the text into a bitmap and display the text just like a texture. For short texts that change repeatedly (e.g. a score display), this is too slow, though.

If your game needs to display texts with many non-ASCII characters (e.g. Chinese or Arabic), TrueType fonts may be your only option. Bitmap Fonts are simply limited by their texture size.

TextFields that use a Bitmap Font can be created and updated very fast. Another advantage is that they don't take up any additional texture memory except what's needed for the original texture. That makes them the preferred way of displaying text in Starling. My recommendation is to use them whenever possible.

Batch your TextFields

One more thing needs to be mentioned: by default, a TextField will require one draw call, even if your glyph texture is part of your main texture atlas. That's because long texts require a lot of CPU time to batch, so spending one additional draw call on them is the cheaper option.

However, if your text field contains only a few letters (rule of thumb: below 16), you can enable the batchable property on the TextField. With that enabled, your texts will be batched just like other display objects.
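
A short sketch of how that might look for a score display. The function name, the dimensions, and the text are placeholders:

```actionscript
import starling.text.TextField;

// Hypothetical helper: creates a small text field that takes part in batching.
function createScoreField():TextField
{
    var scoreField:TextField = new TextField(100, 30, "4200");
    // Only worthwhile for short texts (rule of thumb: below 16 letters).
    scoreField.batchable = true;
    return scoreField;
}
```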

Find out if you need Mipmaps


Mipmaps are downsampled versions of your textures, intended to increase rendering speed and reduce aliasing effects.

Since version 2.0, Starling doesn't create any mipmaps by default. That turned out to be the preferable default, because that way:

  • Textures load faster.
  • They require less texture memory (just the original texture, no mipmaps).
  • Blurry images are avoided (mipmaps sometimes become fuzzy).

On the other hand, activating them will yield a slightly faster rendering speed when the object is scaled down significantly, and you avoid aliasing effects (i.e. the effect contrary to blurring). To enable mipmapping, use the corresponding parameter in the Texture.from… methods.
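
For example, with Texture.fromBitmap the second argument controls mipmap generation. The function name below is hypothetical, and the bitmap is assumed to have been loaded elsewhere:

```actionscript
import flash.display.Bitmap;
import starling.textures.Texture;

// Hypothetical helper: creates a texture including its mipmaps.
function createMipmappedTexture(bitmap:Bitmap):Texture
{
    // second argument: generateMipMaps
    return Texture.fromBitmap(bitmap, true);
}
```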

Load textures from files/URLs

The easiest way to include a texture in a game is to use the classic “[Embed]” syntax. Unfortunately, this approach wastes a lot of memory!

That's because the texture will be in memory twice: once as the embedded class that the runtime created for you, and once as the actual Starling texture.

To avoid this, do not embed your textures in the source, but instead load them from a URL or a file. Starling's AssetManager class makes this very easy. This is especially important on mobile devices, where memory is always a limiting factor.
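
A rough sketch of loading a texture through the AssetManager. It assumes the starling.utils.AssetManager class as shipped with Starling 2.0 (newer releases may differ); the URL and the function name are placeholders:

```actionscript
import starling.display.Image;
import starling.display.Sprite;
import starling.utils.AssetManager;

// Hypothetical helper: loads a background image and adds it to 'root'.
function loadBackground(root:Sprite):void
{
    var assets:AssetManager = new AssetManager();
    assets.enqueue("http://example.com/textures/background.png"); // placeholder URL

    // The callback receives the loading progress as a ratio between 0 and 1.
    assets.loadQueue(function(ratio:Number):void
    {
        if (ratio == 1.0)
        {
            // Textures are accessed by their file name without extension.
            root.addChild(new Image(assets.getTexture("background")));
        }
    });
}
```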

Use BlendMode.NONE

If you've got totally opaque, rectangular textures, help the GPU by disabling blending for those textures. This is especially useful for large background images.

backgroundImage.blendMode = BlendMode.NONE;

Naturally, this will also mean an additional state change, so don't overuse this technique. For small images, it's probably not worth the effort (except if they'd cause a state change, anyway, for some other reason).

Use stage.color

If the actual stage color will never be seen in your game (which is the case in almost all games), always set it to clear black (0x0) or white (0xffffff). There's apparently a fast hardware optimization path for a context clear on some mobile hardware when “context.clear” is called with either all 1's or all 0's.

Some developers reported a full millisecond of spared rendering time per frame, which is a very nice gain for such a simple change!

[SWF(backgroundColor="#000000")]
public class Startup extends Sprite
{ /* ... */ }

On the other hand, if the background of your game is a flat color, you can make use of that, too: just set the stage color to that value instead of displaying an image or a colored quad. Starling has to clear the stage once per frame, anyway — thus, if you change the stage color, that operation won't cost anything.

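// "#336699" is only an example value; use your game's background color
[SWF(backgroundColor="#336699")]
public class Startup extends Sprite
{ /* ... */ }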

Avoid repeated calls to width and height

The width and height properties are more expensive than one would guess intuitively, especially on sprites (a matrix has to be calculated, then each vertex of each child will be multiplied with that matrix).

For that reason, avoid accessing them repeatedly, e.g. in a loop. In some cases, it might even make sense to use a constant value instead.

// bad:
for (var i:int=0; i<numChildren; ++i)
{
    var child:DisplayObject = getChildAt(i);
    if (child.x > wall.width)
        child.visible = false; // placeholder reaction
}
// better:
var wallWidth:Number = wall.width; // calculated only once
for (var i:int=0; i<numChildren; ++i)
{
    var child:DisplayObject = getChildAt(i);
    if (child.x > wallWidth)
        child.visible = false; // placeholder reaction
}

Make containers non-touchable

When you move the mouse/finger over the screen, Starling has to find out which object is hit. This can be an expensive operation, because it has to iterate over all of your display objects and call their hitTest method.

Thus, it helps to make objects “untouchable” if you don't care about them being touched, anyway. It's best to disable touches on complete containers: that way, Starling won't even have to iterate over their children.

// good:
for (var i:int=0; i<container.numChildren; ++i)
    container.getChildAt(i).touchable = false;
// even better:
container.touchable = false;

Hide objects that are outside the Stage bounds

Starling will send to the GPU any object that is part of the display list. This is true even for objects that are outside the stage bounds!

Now, why doesn't Starling simply ignore those invisible objects? The reason is that checking the visibility in a universal way is quite expensive. So expensive, in fact, that it's faster to send the object to the GPU and let it do the clipping. The GPU is actually very efficient with that, and will abort the whole rendering pipeline very early if the object is outside the screen bounds.

However, it still takes time to upload that data, and you can avoid that. Within the high level game logic, it's often easier to make visibility checks: you can, for example, just check if the x/y coordinates are within the stage bounds (plus a small extra belt). If you've got lots of objects that are outside those bounds, it's worth the effort. Remove those elements from the stage or set their visible property to false.
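
Such a check could be sketched like this. The function name and the default margin are made up, and the sketch assumes that the container sits unscaled at the stage origin, so that child coordinates equal stage coordinates:

```actionscript
import starling.display.DisplayObject;
import starling.display.DisplayObjectContainer;

// Hypothetical helper: hides all children whose position lies outside
// the stage bounds, extended by a small extra belt ('margin').
function hideOffscreenChildren(container:DisplayObjectContainer,
                               stageWidth:Number, stageHeight:Number,
                               margin:Number=50):void
{
    for (var i:int=0, l:int=container.numChildren; i<l; ++i)
    {
        var child:DisplayObject = container.getChildAt(i);
        child.visible = child.x >= -margin && child.x <= stageWidth + margin &&
                        child.y >= -margin && child.y <= stageHeight + margin;
    }
}
```

The margin should be at least as large as your biggest object, so that nothing disappears while it is still partly on screen.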

Use the new Event Model

Beginning with Starling 1.2, there are new methods for event dispatching:

// classic way:
object.dispatchEvent(new Event("type", bubbles));
// new way:
object.dispatchEventWith("type", bubbles);

The new approach will dispatch an event object just like the first one, but behind the scenes, it will pool event objects for you. That means that you will save the Garbage Collector some work if you use the second technique. So it's less code to write and is faster — thus, it's the preferred way to dispatch events now. (Except if you've got a custom subclass of Event; those cannot be dispatched with that method.)

  manual/performance_optimization.txt · Last modified: 2016/07/11 12:36 by daniel