
MSc Dissertation / GPU Performance Analyzer
2017

This dissertation was submitted as part of my MSc in Computing and earned me the degree with Distinction. It combines research and software development to identify effective performance optimizations for real-time 3D rendering systems. The measurements for the quantitative comparative analysis were gathered and processed by a testing environment that I developed specifically for this study using C++ and DirectX.

• To download a PDF version of my dissertation, click here.

• To view a short screen-captured video demonstrating 3D viewport navigation inside the testing environment, click here.

Note: All actual sampling occurred from a strictly defined camera position and orientation, which was kept constant for the duration of all measurements. The video is meant only to demonstrate the 3D data set and the underlying environment used for the tests.

The study focuses on interactive 3D rendering for simulation, CAD software and video game development. It identifies and implements optimizations that apply to a wide range of scenarios, and therefore deals primarily with effective resource management in an exclusive GPU memory system. It addresses topics such as the optimal number and structure of GPU buffers, 3D transformations, overdraw reduction, API draw calls, shader complexity, and pre-computing any data that is guaranteed to be immutable for the application session.


After a series of tests and measurements to support its position, the study concludes with the following recommendations:

• Use a frame limiter: Skipping unnecessary frames avoids much wasted computation and frees that processing time for useful tasks. Application response time improves significantly.

• Pre-transform immovable objects: By determining and explicitly distinguishing strictly immovable objects in the scene, all unnecessary overhead stemming from their per-frame 3D transformations is eliminated. These objects are instead pre-transformed to World Space either at asset creation time or at load time.

• Use collective GPU buffers: To reduce the overhead of re-binding render resources to the pipeline, create collective buffers and address their contents through indexing.

• Batch renderable objects and sort them in front-to-back order: Batching will minimize render context switching, while sorting will reduce overdraw.

• Use frequency of updates to determine constant buffer count: DirectX constant buffers cannot be partially updated; every update rewrites the buffer in full. To get the best combination of low update overhead and high flexibility in managing the associated data, group the data into buffers according to how frequently it is updated.
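
As a rough illustration of the batching and front-to-back sorting recommendation above, the following C++ sketch orders a list of draw records by a render-state key and then by camera depth. It is not the code used in the dissertation: the DrawItem structure, its fields and both function names are hypothetical, and it assumes that a single integer key can stand in for the shader, material and buffer bindings of an object.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical draw record; every name here is illustrative only.
struct DrawItem {
    std::uint32_t stateKey;  // assumed to identify shader/material/buffer bindings
    float viewDepth;         // distance from the camera along its view axis
    int objectId;
};

// Order items so that equal state keys are adjacent (fewer re-binds) and,
// within each batch, nearer objects are submitted first (less overdraw).
inline void sortForRendering(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return std::tie(a.stateKey, a.viewDepth) <
                         std::tie(b.stateKey, b.viewDepth);
              });
}

// Count how often the state key changes across the submission order,
// as a crude proxy for render context switches.
inline int countStateChanges(const std::vector<DrawItem>& items) {
    int changes = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[i].stateKey != items[i - 1].stateKey) {
            ++changes;
        }
    }
    return changes;
}
```

Sorting by state first and depth second favours fewer context switches over perfect depth ordering; a renderer that is mostly limited by overdraw might reverse the two keys. Calling countStateChanges before and after sortForRendering gives a quick indication of how many re-binds the ordering removes.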

Also, whenever possible, make use of:

• Instancing: Batch-rendering via instancing reduces overall draw calls and associated CPU-GPU synchronization latency.

• LODs: If your project constraints allow their use, LOD chains are a worthwhile feature that will significantly lower the scene's polygon count.

• Object culling: Employ frustum culling and occlusion culling methods in order to skip rendering of non-visible objects, reduce the number of draw calls issued, and avoid their associated (and effectively wasted) overhead.

• Multi-threaded rendering: Modern graphics APIs allow increased multi-threaded flexibility while rendering. On a multi-processor system, much of the rendering process can now effectively take advantage of such concurrency.
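
To make the object culling recommendation more concrete, here is a minimal C++ sketch of frustum culling with bounding spheres. It is an assumption-laden illustration rather than the dissertation's implementation: the Plane and Sphere types and the isVisible and cullObjects functions are hypothetical, each plane is assumed to have a unit-length normal pointing into the frustum, and occlusion culling is not covered.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Plane in the form n.p + d = 0, with a unit normal pointing inside the frustum.
struct Plane { float nx, ny, nz, d; };

// Bounding sphere of a renderable object, in the same space as the planes.
struct Sphere { float x, y, z, radius; };

// A sphere is rejected only if it lies entirely behind at least one plane.
inline bool isVisible(const Sphere& s, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        const float signedDistance = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (signedDistance < -s.radius) {
            return false;  // completely outside this plane
        }
    }
    return true;  // inside or intersecting the frustum
}

// Return the indices of the objects that should still be submitted for drawing.
inline std::vector<std::size_t> cullObjects(const std::vector<Sphere>& bounds,
                                            const std::array<Plane, 6>& frustum) {
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < bounds.size(); ++i) {
        if (isVisible(bounds[i], frustum)) {
            visible.push_back(i);
        }
    }
    return visible;
}
```

The test is conservative: a sphere near a frustum corner can pass even though it is not actually visible, which costs an unnecessary draw call but never hides a visible object. Only the indices returned by cullObjects would then be passed on to the draw call stage.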

Some of the above are also mentioned in my 3D Engine Project features page.

Indicative Figures

Fig. 1 Diagrammatic overviews for portions of the optimized algorithm employed in the measurements.

Fig. 2 Indicative graphs from the performed analysis. Left: Framerate influence on cache misses and rendering speed. Right: Overview of the collective effect of the optimizations: a 55.04% increase in rendering speed and a 9637-fold increase in responsiveness.

Fig. 3 Analyzing overdraw on the employed 3D data set, demonstrating why rendering in a back-to-front order is suboptimal.

Fig. 4 Demonstrating the effect of instancing-induced overdraw.

Antonios Gogios - MSc Computing 2017 - Viewport

Fig. 5 Short screen-captured video demonstrating 3D viewport navigation inside the developed testing environment.

Copyright © 2026 Antonios Gogios. All rights reserved.
