Performance Optimization

While Starling mimics the classic display list of Flash, what it does behind the scenes is quite different. To achieve the best possible performance, you have to understand some key concepts of its architecture. Here is a list of best practices you can follow to have your game run as fast as possible.

General AS3 Tips

Always make a Release Build

The most important rule right at the beginning: always create a release build when you test performance. Unlike conventional Flash projects, a release build makes a huge difference when you use a Stage3D framework. The speed difference is immense; depending on the platform you're working on, you can easily get a multiple of the framerate of a debug build.

  • In Flash Builder, release builds are created by clicking on “Project - Export Release Build”.
  • In FlashDevelop, choose the “Release” configuration and build the project, then choose the “ipa-ad-hoc” or “ipa-app-store” option when you execute the “PackageApp.bat” script.

Check your Hardware

Be sure that Starling is indeed using the GPU for rendering. That's easy to check: if Starling.current.context.driverInfo contains the string “Software”, then Stage3D is in software fallback mode, otherwise it's using the GPU.
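
A minimal sketch of that check could look like this. It assumes the Stage3D context has already been created (for example, that the code runs after Starling has dispatched its ROOT_CREATED event); the trace messages are purely illustrative.

import starling.core.Starling;

// run this once the context exists, e.g. in a ROOT_CREATED handler
var driverInfo:String = Starling.current.context.driverInfo;

// compare case-insensitively, to be on the safe side
if (driverInfo.toLowerCase().indexOf("software") != -1)
    trace("Software fallback in use: " + driverInfo);
else
    trace("GPU rendering in use: " + driverInfo);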

Furthermore, some mobile devices can be run in a “Battery Saving Mode”. Be sure to turn that off when making performance tests.

Use Adobe Scout

Adobe published an amazing performance profiling tool called “Scout”. It's very lightweight and easy to use: just compile your game with the “-advanced-telemetry” flag, then open up Scout before launching your game. It works even on mobile devices!

Decode Loaded Images Asynchronously

By default, if you use a Loader to load a PNG or JPEG image, the image data is decoded when it's used to create a Texture. This happens on the main thread and can cause your app to block when creating textures from loaded images. Try setting the image decoding policy flag to ON_LOAD, which will decode the image in the Loader's background thread. Starling's AssetManager already uses this ON_LOAD technique for images it loads.

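var loaderContext:LoaderContext = new LoaderContext();
loaderContext.imageDecodingPolicy = ImageDecodingPolicy.ON_LOAD;
var loader:Loader = new Loader();
loader.load(new URLRequest(url), loaderContext); // 'url' is a String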

Find out more about this topic in the LoaderContext documentation.
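
For context, here is a fuller sketch of the same technique that also turns the loaded image into a Starling texture. The helper name “loadTexture” and its “onComplete” callback are hypothetical, and error handling (e.g. for IOErrorEvent) is left out for brevity.

import flash.display.Bitmap;
import flash.display.Loader;
import flash.events.Event;
import flash.net.URLRequest;
import flash.system.ImageDecodingPolicy;
import flash.system.LoaderContext;
import starling.textures.Texture;

// hypothetical helper: loads an image and passes the texture to 'onComplete'
function loadTexture(url:String, onComplete:Function):void
{
    var loader:Loader = new Loader();

    function onLoaded(event:Event):void
    {
        loader.contentLoaderInfo.removeEventListener(Event.COMPLETE, onLoaded);

        // with ON_LOAD, the image data has already been decoded at this point
        var bitmap:Bitmap = loader.content as Bitmap;
        onComplete(Texture.fromBitmap(bitmap));
    }

    var context:LoaderContext = new LoaderContext();
    context.imageDecodingPolicy = ImageDecodingPolicy.ON_LOAD;

    loader.contentLoaderInfo.addEventListener(Event.COMPLETE, onLoaded);
    loader.load(new URLRequest(url), context);
}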

Syntax Tips

ActionScript 3 contains a few “gotchas” that degrade performance and are best avoided. Here are the most important ones you should know about:

Loops

Avoid “for each”. It's faster to use the classic “for i” loop. Furthermore, beware that the loop condition is evaluated once per iteration, so it's faster to store the array length in an extra variable.

// slow:
for each (var item:Object in array) { ... }
 
// better:
for (var i:int=0; i<array.length; ++i) { ... }
 
// fastest:
for (var i:int=0, l:int=array.length; i<l; ++i) { ... }

Avoid Object Creation

Avoid creating a lot of temporary objects. They take up memory and need to be cleaned up by the Garbage Collector, which might cause small hiccups when it's running.

// bad:
for (var i:int=0; i<10; ++i)
{
    var point:Point = new Point(i, 2*i);
    doSomethingWith(point);
}
 
// better:
var point:Point = new Point();
for (var i:int=0; i<10; ++i)
{
    point.setTo(i, 2*i);
    doSomethingWith(point);
}

Accessing Array or Vector Elements

When you retrieve an element of a vector or array, be careful: when the element index is the result of a calculation, always cast it to int. For some reason, AS3 can access the element faster that way.

// bad:
var element:Object = array[10*x];
 
// better: (even though 'x' is an integer!)
var element:Object = array[int(10*x)];

Starling Specific Tips

Minimize State Changes

As you know, Starling uses Stage3D to render all visible objects. This means that all drawing is done by the GPU.

Now, Starling could send one quad after the other to the GPU, drawing one by one. In fact, this is how the very first Starling release worked. However, for optimal performance, GPUs prefer to get a huge pile of data and draw all of it at once.

That's why newer Starling versions batch as many quads together as possible before sending them to the GPU. However, it can only batch quads that have similar properties. Whenever a quad with a different “state” is encountered, a “state change” occurs, and the previously batched quads are drawn.

I use “Quad” and “Image” synonymously in this article. Remember, Image is just a subclass of Quad that adds a texture.

These are the crucial properties that make up a state:

  • The texture (different textures from the same atlas are fine, though)
  • The blendMode of display objects
  • The smoothing value of images
  • The repeat mode of textures
  • The tinted property of quads (see below)

If you set up your scene in a way that creates as few state changes as possible, your rendering performance will profit immensely.
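
To illustrate, here is a small sketch of a batch being split by a single property change. It assumes the code runs inside a display object container and that “atlasTexture” and “atlasXml” are already available; the texture names are made up.

import starling.display.BlendMode;
import starling.display.Image;
import starling.textures.TextureAtlas;

var atlas:TextureAtlas = new TextureAtlas(atlasTexture, atlasXml);

var tree:Image = new Image(atlas.getTexture("tree"));
var rock:Image = new Image(atlas.getTexture("rock"));
var bush:Image = new Image(atlas.getTexture("bush"));

addChild(tree);
addChild(rock);
addChild(bush);
// same atlas, same blend mode, same smoothing: all three fit into one batch

rock.blendMode = BlendMode.ADD;
// 'rock' sits between 'tree' and 'bush' and now has a different state,
// so the three images can no longer be drawn together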

Tinted Quads

Some mobile hardware (e.g. the 1st iPad) has a hard time “tinting” textures, that is:

  • Drawing them translucently (alpha is not one)
  • Drawing them with a different color (setting image.color to something other than white)

For this reason, Starling optimizes the rendering code of untinted images. The downside: switching between tinted and untinted objects will cause a state change. Keep that in mind when you change an image's color or alpha value.

If you are creating a game for hardware that doesn't care about tinting, those state changes will degrade your performance needlessly.

There's a simple trick to avoid the state changes then: just set the alpha value of your root object to “0.999” or a similar value. Since the alpha value propagates down to the children on rendering, Starling will now treat every object as tinted, and no more state changes (at least not from the color or alpha properties) will be triggered.
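
In code, the trick is a one-liner. The sketch below assumes a root class named “Game” (a hypothetical name; use whatever class you pass to the Starling constructor).

import starling.display.Sprite;

public class Game extends Sprite
{
    public function Game()
    {
        // visually indistinguishable from 1.0, but every child
        // is now treated as "tinted"
        alpha = 0.999;
    }
}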

The Statistics Display

Starling's statistics display, which you can activate by enabling starling.showStats, shows you the number of draw calls that are executed per frame (third line: DRW). The more state changes you have, the higher this number will be. Your target should always be to keep it as low as possible. The following tips will show you how.

N.B. Starling explicitly decrements the draw count displayed to take into account the stats display being used.
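
Enabling the display could look like the sketch below, typically placed in the startup class. “Game” stands for your root class and “stage” for the native Flash stage; the variable name “mStarling” is just a convention.

import starling.core.Starling;

var mStarling:Starling = new Starling(Game, stage);
mStarling.showStats = true; // shows the statistics display, including the DRW line
mStarling.start();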

The Painter's Algorithm

To know how to minimize state changes, you need to know the order in which Starling processes your objects.

Like Flash, Starling uses the Painter's algorithm to process the display list. This means that it draws your scene like a painter would do it: starting at the object at the bottom layer (e.g. the background image) and moving upwards, drawing new objects on top of previous ones.

Take a simple landscape scene as an example. To set it up in Starling, you could create three sprites: one containing the mountain range in the distance, one with the ground, and one with the vegetation. The mountain range would be at the bottom (index 0), the vegetation at the top (index 2). Each sprite would contain the images of the actual objects.

On rendering, Starling would start with the first object of the bottom layer (“Mountain 1”) and work its way through the layers until it reaches the last object of the top layer (“Tree 2”). Now, if all those objects have a different state, it would have to make 6 draw calls. If you load each object's texture from a separate Bitmap, this is what will happen.
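
As a rough sketch, the layer setup described above might be built like this. It assumes the code runs inside a container and that one texture per object has been created beforehand; all texture variable names are hypothetical.

import starling.display.Image;
import starling.display.Sprite;

var mountains:Sprite  = new Sprite();
var ground:Sprite     = new Sprite();
var vegetation:Sprite = new Sprite();

addChild(mountains);  // index 0: drawn first
addChild(ground);     // index 1
addChild(vegetation); // index 2: drawn last, on top of everything else

mountains.addChild(new Image(mountainTexture1));
mountains.addChild(new Image(mountainTexture2));
ground.addChild(new Image(groundTexture1));
ground.addChild(new Image(groundTexture2));
vegetation.addChild(new Image(treeTexture1));
vegetation.addChild(new Image(treeTexture2));

// if every texture was created from its own Bitmap,
// each of the six images needs its own draw call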

Another tool at your disposal is the DisplayObjectContainer.sortChildren() method, which can be used to sort the children of a container (a Sprite, for example) based on properties such as x, y, or alpha. The method accepts a compare function, which means you can sort objects by any criteria you wish!
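
A short sketch of such a sort, ordering children by their y coordinate so that objects further down the screen are drawn on top. The “container” variable is an assumption; the compare function follows the same contract as the one used for sorting a Vector.

import starling.display.DisplayObject;

container.sortChildren(function(a:DisplayObject, b:DisplayObject):int
{
    if (a.y < b.y) return -1; // 'a' is drawn before (below) 'b'
    if (a.y > b.y) return  1;
    return 0;
});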

Texture Atlas

That's one of the reasons why texture atlases are so important. If you load all those textures from one single atlas, Starling will be able to draw all objects at once! (At least if the other properties listed above do not change.)

Here, each image uses the same atlas (depicted by all nodes having the same color). The consequence of this is that you should always use an atlas for your textures.

Sometimes, though, not all of your textures will fit into a single atlas. An atlas should not be bigger than 2048×2048 pixels (this is the maximum texture size on some mobile hardware), so you'll run out of space sooner or later. But this is no problem — as long as you arrange your textures in a smart way.

Both those examples use two atlases (again, one color per atlas). But while the display list on the left will force a state change for each object, the version on the right will be able to draw all objects in just two batches.
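
One way to get the second arrangement is to group objects by atlas, for instance with one container per atlas. The following is only a sketch: “atlasA”, “atlasB” and the texture names are hypothetical, and the approach only works if your design allows all objects of one atlas to be drawn below those of the other.

import starling.display.Image;
import starling.display.Sprite;

var layerA:Sprite = new Sprite(); // only images from atlasA
var layerB:Sprite = new Sprite(); // only images from atlasB

addChild(layerA);
addChild(layerB);

layerA.addChild(new Image(atlasA.getTexture("cloud")));
layerA.addChild(new Image(atlasA.getTexture("hill")));
layerB.addChild(new Image(atlasB.getTexture("hero")));
layerB.addChild(new Image(atlasB.getTexture("enemy")));

// draw order: cloud, hill (atlasA), then hero, enemy (atlasB)
// the texture changes only once, so two batches are enough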

Flattened Sprites

By minimizing state changes, you have already done a lot for the performance of your game. However, Starling still needs to iterate over all your objects, check their state, and then upload their data to the GPU — on each frame!

This is where the next optimization step comes into play. If there's a part of your game's geometry that is static and does not (or only rarely) change, call the flatten method on that sprite. Starling will preprocess the children and upload their data to the GPU. On each of the following frames, it will now be able to draw them right away, without any additional CPU processing, and without having to upload new data to the GPU.

This is a powerful feature that can potentially reduce the burden on the CPU immensely. Just keep in mind that even flattened sprites suffer from state changes: if the geometry of a flattened sprite contains different render states, it will still be drawn in multiple steps.
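
A minimal sketch of how this might be used for a static background. The “tileTexture” variable is assumed to exist, and the loop just stands in for whatever builds your static geometry.

import starling.display.Image;
import starling.display.Sprite;

var background:Sprite = new Sprite();

for (var i:int=0; i<20; ++i)
{
    var tile:Image = new Image(tileTexture);
    tile.x = i * tile.width;
    background.addChild(tile);
}

addChild(background);
background.flatten(); // children are preprocessed once

// changes to the children are not visible while the sprite is flattened;
// after a (rare) change, update the cached geometry:
background.getChildAt(0).visible = false;
background.flatten();

// or go back to normal rendering:
background.unflatten();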

The QuadBatch class

Flattened sprites are very fast and easy to use. However, they still have some overhead:

  • When you add objects to a sprite, they will dispatch ADDED and ADDED_TO_STAGE events, which can add noticeable overhead if there are lots of children.
  • As with any display object container, you can add each child only once.

To get rid of these limitations as well, you can go down to the low-level class that Starling uses for all the batching internally: QuadBatch. It works like this:

var quadBatch:QuadBatch = new QuadBatch();
var image:Image = new Image(texture);
quadBatch.addImage(image);
 
for (var i:int=0; i<100; ++i)
{
    image.x += 10;
    quadBatch.addImage(image);
}

Did you notice? You can add the same image as often as you want! Furthermore, adding it won't cause any event dispatching. As expected, this has some downsides, though:

  • All the objects you add must have the same state (i.e. use textures from the same atlas). The first image you add to the QuadBatch will decide on its state. You can't change the state later, except by resetting it completely.
  • You can only add instances of the Image, Quad, or QuadBatch class.
  • It's a one-way road: you can only add objects. The only way to remove an object is to reset the batch completely.

For these reasons, it's only suitable for very specific use-cases (the bitmap font class, for example, now uses quad batches directly). In those cases, it's definitely the fastest option, though. You won't find a more efficient way to render objects in Starling.
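
Since a QuadBatch is a display object itself, it can be added to the display list like any other object. The sketch below continues the example above and shows the “reset and refill” pattern for content that changes; the positions are arbitrary.

addChild(quadBatch);

// later, when the content has to change: start over and add everything again
quadBatch.reset();

image.x = 0;
for (var j:int=0; j<50; ++j)
{
    image.x += 20;
    quadBatch.addImage(image);
}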

Use Bitmap Fonts

TextFields support two different kinds of fonts: TrueType fonts and Bitmap Fonts.

TrueType fonts are easiest to use: simply embed the “ttf” file and you're done. For static text fields that contain hundreds of characters, they are a good and fast option. Starling will render the text into a bitmap and display the text just like a texture. For short texts that change repeatedly (e.g. a score display), this is too slow, though.

If your game needs to display texts with many non-ASCII characters (e.g. Chinese or Arabic), TrueType fonts may be your only option. Bitmap Fonts are simply limited by their texture size.

TextFields that use a Bitmap Font can be created and updated very fast. Another advantage is that they don't take up any additional texture memory except what's needed for the original texture. That makes them the preferred way of displaying text in Starling. My recommendation is to use them whenever possible.
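
As an illustration, registering and using a bitmap font might look like this with the Starling 1.x API. It assumes “fontTexture” and “fontXml” (the font's texture and its XML description) have already been loaded; the text field size is arbitrary.

import starling.text.BitmapFont;
import starling.text.TextField;

var font:BitmapFont = new BitmapFont(fontTexture, fontXml);
TextField.registerBitmapFont(font);

// NATIVE_SIZE renders the font at the size it was created with
var scoreField:TextField = new TextField(200, 50, "Score: 0",
    font.name, BitmapFont.NATIVE_SIZE, 0xffffff);
addChild(scoreField);

// updating the text is cheap with a bitmap font
scoreField.text = "Score: 100";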

Batch your TextFields

One more thing needs to be mentioned: by default, a TextField will require one draw call, even if your glyph texture is part of your main texture atlas. That's because long texts require a lot of CPU time to batch, making the additional draw call worth the effort.

However, if your text field contains only a few letters (rule of thumb: below 16), you can enable the batchable property on the TextField. With that enabled, your texts will be batched just like other display objects.
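
For example (a sketch using the Starling 1.x TextField constructor; “MyFont” is a placeholder for the name of a registered bitmap font):

import starling.text.BitmapFont;
import starling.text.TextField;

var livesField:TextField = new TextField(60, 30, "x 3",
    "MyFont", BitmapFont.NATIVE_SIZE, 0xffffff);
livesField.batchable = true; // short text: batch it with its neighbors
addChild(livesField);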

Find out if you need Mipmaps

Mipmaps are downsampled versions of your textures, intended to increase rendering speed and reduce aliasing effects.

By default, Starling creates them for you when you load a texture (through the Texture.fromBitmap[Data] methods). In most cases, this is a good thing: it will increase rendering speed when the texture is scaled down, and you avoid ugly aliasing effects.

In some cases, however, it makes sense to disable mipmaps:

  • To make texture loading faster.
  • To avoid blurry images. Certain scale factors will make your images look a little blurry; without mipmapping, they will look sharper. (Alternatively, try using TextureSmoothing.TRILINEAR on those objects.)

In a nutshell, disabling mipmaps makes texture loading faster, but slows down rendering when the texture is scaled down. To disable mipmapping, use the corresponding parameter of the Texture.fromBitmap[Data] methods.
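
In Starling 1.x, that parameter is the second argument (“generateMipMaps”). A short sketch, assuming “bitmap” is an already loaded flash.display.Bitmap:

import starling.textures.Texture;

// default: mipmaps are generated
var mipmappedTexture:Texture = Texture.fromBitmap(bitmap);

// no mipmaps: faster to create, sharper at some scale factors,
// but potentially slower to render when scaled down
var plainTexture:Texture = Texture.fromBitmap(bitmap, false);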

Load textures from files/URLs

The easiest way to include a texture in a game is to use the classic “[Embed]” syntax. Unfortunately, this approach wastes a lot of memory!

That's because the texture will be in memory twice: once as the embedded class that the runtime created for you, and once as the actual Starling texture.

To avoid this, do not embed your textures in the source, but instead load them from a URL or a file. Starling's AssetManager class makes this very easy. This is especially important on mobile devices, where memory is always a limiting factor.
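
A minimal sketch of loading textures with the AssetManager. The file paths are made up, and the code is assumed to run inside a display object container; textures are looked up by file name without the extension.

import starling.display.Image;
import starling.utils.AssetManager;

var assets:AssetManager = new AssetManager();
assets.enqueue("assets/background.png", "assets/atlas.png", "assets/atlas.xml");

assets.loadQueue(function(ratio:Number):void
{
    // 'ratio' runs from 0 to 1; everything is available once it reaches 1
    if (ratio == 1.0)
    {
        var background:Image = new Image(assets.getTexture("background"));
        addChild(background);
    }
});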

Use BlendMode.NONE

If you've got totally opaque, rectangular textures, help the GPU by disabling blending for those textures. This is especially useful for large background images. Don't be afraid of the additional state change this will cause; it's worth it!

backgroundImage.blendMode = BlendMode.NONE;

Use stage.color

If the background of your game is a flat color, set the stage color to that value instead of adding a texture or a colored quad. Starling has to clear the stage once per frame, anyway — thus, if you change the stage color, that operation won't cost anything. There is such a thing as a free lunch, after all!

[SWF(backgroundColor="#ff2255")]
public class Startup extends Sprite
{
    // ...
}
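
If the color is only known at runtime, the color property of the Starling stage can be set directly. This is just a sketch; it assumes Starling has already been initialized.

import starling.core.Starling;

// same color as in the SWF metadata above
Starling.current.stage.color = 0xff2255;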

Avoid repeated calls to width and height

The width and height properties are more expensive than one would guess intuitively, especially on sprites (a matrix has to be calculated, then each vertex of each child will be multiplied with that matrix).

For that reason, avoid accessing them repeatedly, e.g. in a loop. In some cases, it might even make sense to use a constant value instead.

// (both loops run backwards so that removing a child doesn't skip its neighbor)

// bad:
for (var i:int=numChildren-1; i>=0; --i)
{
    var child:DisplayObject = getChildAt(i);
    if (child.x > wall.width)
        child.removeFromParent();
}
 
// better:
var wallWidth:Number = wall.width;
for (var i:int=numChildren-1; i>=0; --i)
{
    var child:DisplayObject = getChildAt(i);
    if (child.x > wallWidth)
        child.removeFromParent();
}

Make containers non-touchable

When you move the mouse/finger over the screen, Starling has to find out which object is hit. This can be an expensive operation, because it has to iterate over all of your display objects and call their hitTest method.

Thus, it helps to make objects “untouchable” if you don't care about them being touched, anyway. It's best to disable touches on complete containers: that way, Starling won't even have to iterate over their children.

// good:
for (var i:int=0; i<container.numChildren; ++i)
    container.getChildAt(i).touchable = false;
 
// even better:
container.touchable = false;

Hide objects that are outside the Stage bounds

Starling will send to the GPU any object that is part of the display list. This is true even for objects that are outside the stage bounds!

Now, why doesn't Starling simply ignore those invisible objects? The reason is that checking the visibility in a universal way is quite expensive. So expensive, in fact, that it's faster to send the objects up to the GPU and let it do the clipping. The GPU is actually very efficient with that, and will abort the whole rendering pipeline very early if the object is outside the screen bounds.

However, it still takes time to upload that data, and you can avoid that. Within the high-level game logic, it's often easier to make visibility checks (e.g. you can just check if the x/y coordinates are within the stage bounds, plus a small extra belt). If you've got lots of objects that are outside those bounds, it's worth the effort. Remove those elements from the stage or set their visible property to false.
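
Such a check could be sketched as follows. It assumes that “container” sits unscaled at the stage origin, that the code runs inside a Starling display object that is on the stage, and that a belt of 50 points is big enough for your objects; all of these are assumptions you'll need to adapt.

import starling.display.DisplayObject;

var belt:Number = 50;
var minX:Number = -belt;
var minY:Number = -belt;
var maxX:Number = stage.stageWidth  + belt;
var maxY:Number = stage.stageHeight + belt;

for (var i:int=0, l:int=container.numChildren; i<l; ++i)
{
    var child:DisplayObject = container.getChildAt(i);

    // invisible objects are skipped during rendering
    child.visible = child.x > minX && child.x < maxX &&
                    child.y > minY && child.y < maxY;
}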

Use the new Event Model

Beginning with Starling 1.2, there are new methods for event dispatching:

// classic way:
object.dispatchEvent(new Event("type", bubbles));
 
// new way:
object.dispatchEventWith("type", bubbles);

The new approach will dispatch an event object just like the first one, but behind the scenes, it will pool event objects for you. That means that you will save the Garbage Collector some work if you use the second technique. So it's less code to write and is faster — thus, it's the preferred way to dispatch events now. (Except if you've got a custom subclass of Event; those cannot be dispatched with that method.)
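
The method also accepts an optional data argument, which often makes a custom Event subclass unnecessary. A small sketch; the “player” object and the “scoreChanged” event type are hypothetical.

import starling.events.Event;

function onScoreChanged(event:Event):void
{
    // whatever was passed as third argument is available via 'data'
    var score:int = event.data as int;
    trace("new score: " + score);
}

player.addEventListener("scoreChanged", onScoreChanged);

// type, bubbles, data
player.dispatchEventWith("scoreChanged", false, 100);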



  manual/performance_optimization.txt · Last modified: 2014/07/01 14:38 by 195.62.218.14