HyperZ
A major problem that all game developers have to face when designing 3D worlds is
known as overdraw. To understand what overdraw is, consider a 3D deck of cards
sitting on a table. Most graphics processors have no way of knowing which card
will be closest to the viewer. They must render every card in the deck, then check
each pixel's depth in the Z buffer to see if it is visible to the viewer. Ultimately
the top card is rendered, determined to be closest to the viewer, and displayed.
This overwriting of pixels in the frame buffer is known as overdraw and causes
large amounts of memory bandwidth to be wasted.
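To make the deck of cards example concrete, here is a minimal sketch of a Z buffered
renderer that counts how many frame buffer writes it performs. The function name
render_deck, the tiny 4x4 "screen", and the back to front drawing order are our own
assumptions for illustration, not a description of any particular chip.

```python
def render_deck(num_cards, width, height):
    """Draw num_cards full-screen cards back to front.

    Returns (frame_buffer_writes, visible_pixels).
    Assumption: a smaller depth value means closer to the viewer.
    """
    far = float("inf")
    z_buffer = [[far] * width for _ in range(height)]
    frame_buffer = [[None] * width for _ in range(height)]
    writes = 0

    # Card 0 is the bottom of the deck (farthest away), the last card is on top.
    for card in range(num_cards):
        depth = num_cards - card
        for y in range(height):
            for x in range(width):
                # Z test: read the stored depth and compare.
                if depth < z_buffer[y][x]:
                    z_buffer[y][x] = depth
                    frame_buffer[y][x] = card
                    writes += 1

    visible = width * height  # only the top card ends up on screen
    return writes, visible


writes, visible = render_deck(52, 4, 4)
print(writes, visible)  # 832 writes for 16 visible pixels
```

In this worst case every card passes the Z test when it is drawn, so all 52 cards are
written to the frame buffer even though only the last one is ever seen.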
A measure of the amount of overdraw in a scene is called depth complexity, which
represents the ratio of total pixels rendered to visible pixels. For example,
if a scene has a depth complexity of 3, this means 3 times as many pixels were
rendered as were actually visible on the screen. This also means that 3 times
the fillrate is needed to display the scene at the same frame rate as a scene
with no overdraw. But fillrate is not the only limitation caused by overdraw;
the real problem is the memory bandwidth associated with those wasted pixels.
For every pixel rendered, the processor must check the Z buffer to see if it should
be written to the frame buffer. Assuming an average overdraw of 3, we not only
need triple the fillrate, we also require triple the memory bandwidth
to sustain a given frame rate. This is where the real limitation of today's graphics
processors lies: even using 200MHz DDR memory, we simply do not have enough
memory bandwidth. This makes the benefit of detecting and eliminating pixels
which the viewer will never see twofold: the fillrate requirements are lower,
and the associated memory bandwidth requirements are lower.
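The scaling described above can be worked through with a few lines of arithmetic. This
is only a rough model: the resolution, the frame rate, and the figure of 12 bytes of
memory traffic per rendered pixel (a 32-bit Z read, Z write and color write, with
texture traffic ignored) are assumptions we picked for illustration.

```python
def depth_complexity(pixels_rendered, pixels_visible):
    """Ratio of total pixels rendered to pixels actually visible."""
    return pixels_rendered / pixels_visible


def required_fillrate(width, height, fps, complexity):
    """Pixels per second that must be rendered to hold the frame rate."""
    return width * height * fps * complexity


def required_bandwidth(fillrate, bytes_per_pixel=12):
    """Bytes per second of Z and color traffic (assumed 12 bytes per pixel)."""
    return fillrate * bytes_per_pixel


no_overdraw = required_fillrate(1024, 768, 60, 1)
overdraw_3 = required_fillrate(1024, 768, 60, 3)

print(overdraw_3 / no_overdraw)  # 3.0
print(required_bandwidth(overdraw_3) / required_bandwidth(no_overdraw))  # 3.0
```

Whatever per pixel cost is assumed, both the fillrate and the Z and color traffic grow
in direct proportion to the depth complexity.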
HyperZ is ATI's method of detecting and removing those surfaces which the viewer
cannot see. It works by breaking up the scene into small tiles with an associated
Maximum Z and Minimum Z. This information is stored in an on-die cache, which
is why there is no memory bandwidth penalty for doing this. Then, for
every triangle that is sent to the graphics processor, its Z value is checked
against that stored in the HyperZ cache. If it is not visible to the viewer
it is tossed out, and if it is, the triangle is allowed to continue through
the rendering pipeline.
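A tile test of this kind might look something like the sketch below. ATI has not
published the exact logic, so treat everything here as an assumption: the names Tile,
classify and submit_full_tile are hypothetical, smaller Z is taken to mean closer to
the viewer, and the incoming primitive is assumed to cover the whole tile.

```python
from dataclasses import dataclass

FAR = 1.0  # assumed depth of an empty tile; smaller Z means closer


@dataclass
class Tile:
    min_z: float = FAR  # nearest depth currently stored in the tile
    max_z: float = FAR  # farthest depth currently stored in the tile


def classify(tile, prim_min_z, prim_max_z):
    """Compare a primitive's depth range with the tile's cached range."""
    if prim_min_z >= tile.max_z:
        return "reject"     # even its nearest point is behind everything stored
    if prim_max_z < tile.min_z:
        return "accept"     # entirely in front of everything stored
    return "per_pixel"      # ranges overlap, fall back to the normal Z test


def submit_full_tile(tile, prim_min_z, prim_max_z):
    """Classify a primitive covering the whole tile and update the cache."""
    result = classify(tile, prim_min_z, prim_max_z)
    if result != "reject":
        # Each pixel keeps the nearer of its old and new depth, so taking the
        # minimum of both bounds stays conservative.
        tile.min_z = min(tile.min_z, prim_min_z)
        tile.max_z = min(tile.max_z, prim_max_z)
    return result


tile = Tile()
print(submit_full_tile(tile, 0.5, 0.6))  # accept
print(submit_full_tile(tile, 0.7, 0.8))  # reject
```

The rejected primitive never touches the Z buffer in memory at all; only the small
cached range for the tile is consulted.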
ATI's estimates put the average overdraw at around 3 in today's games, but we
do not yet know how effective HyperZ technology is.
As we've discussed, Z buffer transfers make up a large portion of the memory bandwidth
requirements in current GPUs. HyperZ's tile-based Z check reduces some of those bandwidth
requirements, but to help further, ATI has enabled a minor compression routine
between the GPU and memory. Affectionately called Z Compression, this is really
nothing too new, just fancy ATI marketing.
The last component of HyperZ is known as Fast Z Clear. Traditionally, when the GPU
is done rendering a scene it then goes back and writes 0's to the Z buffer
before starting on the next frame. ATI has developed a technique which allows
them to clear the Z buffer about 60 times faster than writing 0's.
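ATI has not said how the faster clear works. Purely as an illustration of why a clear
does not have to touch every depth value, the sketch below assumes a hypothetical
"cleared" flag per tile: clearing sets one flag per tile, and a tile's pixels are
only filled in when something is first drawn into it. The class name, the 8x8 tile
size and the flag scheme are all our assumptions, not ATI's design.

```python
class LazyClearZBuffer:
    """Z buffer with one hypothetical 'cleared' flag per tile."""

    def __init__(self, width, height, tile=8, clear_value=0.0):
        self.width, self.height = width, height
        self.tile = tile
        self.clear_value = clear_value
        self.tiles_x = (width + tile - 1) // tile
        self.tiles_y = (height + tile - 1) // tile
        self.depth = [clear_value] * (width * height)
        self.cleared = [True] * (self.tiles_x * self.tiles_y)

    def _tile_index(self, x, y):
        return (y // self.tile) * self.tiles_x + (x // self.tile)

    def fast_clear(self):
        """Mark every tile as cleared; returns the number of writes made."""
        for i in range(len(self.cleared)):
            self.cleared[i] = True
        return len(self.cleared)

    def read(self, x, y):
        if self.cleared[self._tile_index(x, y)]:
            return self.clear_value  # stale stored depths are ignored
        return self.depth[y * self.width + x]

    def write(self, x, y, z):
        t = self._tile_index(x, y)
        if self.cleared[t]:
            # First write since the clear: fill in the tile's real contents.
            x0 = (x // self.tile) * self.tile
            y0 = (y // self.tile) * self.tile
            for py in range(y0, min(y0 + self.tile, self.height)):
                for px in range(x0, min(x0 + self.tile, self.width)):
                    self.depth[py * self.width + px] = self.clear_value
            self.cleared[t] = False
        self.depth[y * self.width + x] = z


zb = LazyClearZBuffer(16, 16)
zb.write(3, 3, 0.5)
print(zb.fast_clear())  # 4 flag writes instead of 256 depth writes
print(zb.read(3, 3))    # 0.0
```

In this toy version the clear itself costs one write per tile, with the remaining work
deferred until a tile is actually drawn into.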