28 August 2007

GCDC 2007 - Practical multi-threading for game performance

Leigh Davis (Intel) gave us a great presentation about what one should consider when writing multithreaded games. He was joined by Doug Binks (Crytek), who explained to us how multithreading is used in Crytek’s CryEngine2.

Preview of a future gamedev.net article.

In the coming years, Intel is going to push its multicore architecture to deliver even more performance: Penryn is due this year, and next year will see the release of the Nehalem CPU, with 4 cores and 8 hardware threads. The obvious goal of this new architecture is to build more powerful processors while working around the current limits of chip manufacturing: miniaturization and clock frequency can only be pushed so far.

But if you want to improve your game's performance, you have to choose the right software architecture, and this is where programmers need tips and education from the various chip makers.

If you want to take advantage of the most recent processor architecture, your software has to be designed accordingly:

  • You need a good understanding of how the processor cache is handled in order to use it efficiently. This is especially true on multicore processors where the cores do not all share the same cache: if one core is modifying a data set while another core is reading from it, both caches need to be synchronized and you lose a lot of performance.
  • When talking to the graphics driver, avoid calls that stall, for example calls that return the GPU state.
  • You also need a firm grip on how your OS handles threading. While it is sometimes handy to assign a particular thread to a core (for example, to avoid cache issues), one should be very careful about doing so.
  • Better scheduling of the operations that take place in your game will lead to better performance. Try not to leave a core idling while it waits for another thread. Ideally, synchronization between threads should not take any time at all. Also, consider dependencies between operations.
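To make the cache point concrete, here is a minimal sketch of one common way to keep two cores from fighting over the same cache line: give each thread its own counter, padded to a full line. This was not shown in the talk. The 64-byte line size, the `PaddedCounter` type and the `count_in_parallel` function are assumptions made for illustration, and the aligned `std::vector` storage relies on C++17 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed cache line size: 64 bytes is common on x86, but it is not guaranteed.
constexpr std::size_t kCacheLineSize = 64;

// Each counter occupies its own cache line, so a write from one thread
// does not invalidate the line that another thread is working on.
struct alignas(kCacheLineSize) PaddedCounter {
    std::uint64_t value = 0;
};

static_assert(sizeof(PaddedCounter) == kCacheLineSize,
              "one counter per cache line");

// Hypothetical workload: every thread increments only its own counter,
// and the results are combined once all threads have joined.
std::uint64_t count_in_parallel(unsigned thread_count, std::uint64_t iterations) {
    assert(thread_count > 0);
    std::vector<PaddedCounter> counters(thread_count);
    std::vector<std::thread> workers;
    workers.reserve(thread_count);

    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&counters, t, iterations] {
            for (std::uint64_t i = 0; i < iterations; ++i) {
                counters[t].value += 1;  // touches this thread's line only
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }

    std::uint64_t total = 0;
    for (const PaddedCounter& counter : counters) {
        total += counter.value;
    }
    return total;
}
```

Without the `alignas`, several counters would typically sit on the same line, and each increment would force the cores to exchange that line. The result would be the same, only slower, which is why this kind of problem is easy to miss.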

The CPU is an important resource for games, although in fact most games are not CPU bound. Even if a game is GPU bound, Leigh pointed out that the spare CPU time could be used to dramatically improve the quality of a game:

  • Better animation – adding cloth or hair simulation can add a lot of eye candy
  • The environment can be made “more destructible” – on low end systems, the environment is static and invincible; on high end systems, more physics can allow the environment to be fully destructible.
  • More complex particle systems – additional power can be used to drive more complex behavior in particle systems

Crytek’s CryEngine2 makes use of both aspects: on the software design side, it takes the new architecture into consideration to improve overall engine performance. On the CPU use side, it uses the additional CPU time to implement better effects.

For example, CryEngine2 implements task level parallelism over these 6 key areas (not counting, of course, the functional decomposition of threads):

  • File streaming
  • Audio
  • Network
  • Shader compilation
  • Physics
  • Particle systems
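As a rough illustration of task level parallelism, the sketch below starts one asynchronous task per subsystem and waits for all of them at the end of the frame. It is not how CryEngine2 schedules its work, which the talk did not detail. `Subsystem`, `update_subsystem` and `run_frame` are hypothetical names, and the "work" is a placeholder loop.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical subsystem: a name plus a cost that stands in for real work.
struct Subsystem {
    std::string name;
    int work_units;
};

// Placeholder for a real update; returns the number of work units processed.
int update_subsystem(const Subsystem& subsystem) {
    int done = 0;
    for (int i = 0; i < subsystem.work_units; ++i) {
        ++done;
    }
    return done;
}

// Starts one asynchronous task per subsystem, then waits for all of them.
// The tasks must be independent: none may read data that another one writes.
int run_frame(const std::vector<Subsystem>& subsystems) {
    std::vector<std::future<int>> pending;
    pending.reserve(subsystems.size());

    for (const Subsystem& subsystem : subsystems) {
        pending.push_back(std::async(std::launch::async, [&subsystem] {
            return update_subsystem(subsystem);
        }));
    }

    int total = 0;
    for (std::future<int>& task : pending) {
        total += task.get();  // blocks until that task has finished
    }
    return total;
}
```

A real engine would more likely feed a fixed pool of worker threads than start a new task per subsystem every frame, but the shape is the same: independent tasks, and a single synchronization point at the end.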

The particle system task is a good example of how one can use the new CPU architectures to get the most out of a computer. Not only are particles processed only when required, but the system also makes speculative updates (when the CPU is idling) to improve the overall performance of particle system management. The particle count can be clamped to allow the game to run on lower end systems.

The conclusion is clear: a clever design that respects the peculiarities of multicore processors can have a tremendous effect on performance. Taking these points into account will allow your software to scale with the n-core processors that will hit the market in the coming years.
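The following sketch is one possible reading of those three ideas (clamped count, speculative update, update on demand), kept single-threaded so the logic stays visible. It is an interpretation, not Crytek's implementation: the `ParticleSystem` class, its method names and the `budget` parameter are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle {
    float age = 0.0f;
    bool up_to_date = false;  // true once updated for the current frame
};

class ParticleSystem {
public:
    // The requested count is clamped to what the target machine can afford.
    ParticleSystem(std::size_t requested, std::size_t max_count)
        : particles_(std::min(requested, max_count)) {}

    std::size_t count() const { return particles_.size(); }

    // Marks every particle as needing an update for the new frame.
    void begin_frame() {
        for (Particle& p : particles_) p.up_to_date = false;
    }

    // Meant to be called when a core would otherwise idle: updates at most
    // `budget` stale particles ahead of need. Returns how many were updated.
    std::size_t speculative_update(float dt, std::size_t budget) {
        return update_stale(dt, budget);
    }

    // Called when the particles are actually required: only those not yet
    // handled by a speculative pass are updated. Returns how many were updated.
    std::size_t update_on_demand(float dt) {
        return update_stale(dt, particles_.size());
    }

private:
    std::size_t update_stale(float dt, std::size_t limit) {
        std::size_t updated = 0;
        for (Particle& p : particles_) {
            if (updated == limit) break;
            if (!p.up_to_date) {
                p.age += dt;
                p.up_to_date = true;
                ++updated;
            }
        }
        return updated;
    }

    std::vector<Particle> particles_;
};
```

The point of the split is that whatever the speculative pass managed to do during idle time no longer has to be done when the frame actually needs the particles.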

Comments

1. Wednesday, December 12 2007, 19:53, by Rss

Hello, I found this post very interesting :) I was wondering why this detail was included: "for example, to avoid cache issues" ... ;) I wish you all the best for what comes next!

2. Thursday, December 13 2007, 12:40, by Emmanuel Deloget

An explanation can be found 4 lines above.

On a dual-core processor with two caches (like Intel's recent dual cores), if a thread on one core writes to its cache while another thread on the other core reads from its own, the cache lines have to be updated to stay synchronized. To avoid this loss of time, it can be useful to force both threads to run on the same core, which removes the need to synchronize the caches.
