
Culling, occlusion?

Started by August 18, 2024 06:21 PM
6 comments, last by cozzie 3 weeks ago

Hi all,

I’m trying to lower my frame times when rendering a relatively large city in my engine, with roughly 3500 objects. So far I have a quadtree spatial division which helps quite a bit, but there are many cases where big occluders (buildings) block the view of a lot of objects I’m still processing.

My idea/ goal is to find a way to first “render” the aabb’s of my big occluders, within the cells that are potentially visible in the view (frustum), and then define which cells (using their aabb’s) would still be visible, so I only have to process these cells in the following render passes.

I think for this I need some form of occlusion culling, either with dx11 occlusion queries, maybe software depth buffer or some other solution. I’m also wondering whether I could do this with a D3D11 offscreen depth buffer, but I don’t know how to get the results back to the CPU for further processing in my code.

Any ideas on how to approach this?

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

cozzie said:
My idea/ goal is to find a way to first “render” the aabb’s of my big occluders

Needs a correction, since taking this literally would not work.

You can not use a bounding volume of an occluder, since the bound covers pixels the occluder does not.
You can only use a 'smaller' shape, which misses some pixels the occluder covers, to guarantee we do not accidentally cull objects which are visible.

This is important, and this problem makes it hard, for example, to run an occlusion system at a lower resolution than the screen. Even with software rasterization, it's not trivial to guarantee we only render pixels which are fully covered, while it is easy to guarantee rendering all partially covered pixels ('conservative rasterization' on GPU).
The problem is, for example, shared edges of occluder polygons. Pixels along such an edge are not fully covered by either polygon sharing it, so we don't draw them, which causes many holes and makes the culling almost ineffective.

To avoid the problem, we can try to 'shrink' the occluders so they are smaller than the geometry, and the potential error of single pixels is no problem in practice.

Or we can render at the same resolution as the actual screen. (which you maybe intend anyway)

Or we could work with portals instead of occluders, where low resolution causes no accidental culling.

Coming back to 'AABB does not work': this means you have to create the occluder geometry somehow, e.g. by modelling a low detail box representation of your scene, manually or automated.
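As a rough illustration of the 'shrink' idea, here is a minimal C++ sketch. It assumes the building is itself roughly box-shaped and axis-aligned, so that its bounding box pulled inward by a margin lies fully inside the real geometry; for other shapes the inner box has to be authored or computed differently. The names `Vec3`, `Aabb` and `shrinkAabb` are made up for this example.

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 min, max; };

// Pulls every face of the box inward by `margin` (hypothetical helper).
// If the box is thinner than 2 * margin on an axis, it collapses to its
// center on that axis, so the result never turns inside out.
Aabb shrinkAabb(const Aabb& box, float margin) {
    auto shrinkAxis = [margin](float lo, float hi, float& outLo, float& outHi) {
        const float center = 0.5f * (lo + hi);
        outLo = std::min(lo + margin, center);
        outHi = std::max(hi - margin, center);
    };
    Aabb result{};
    shrinkAxis(box.min.x, box.max.x, result.min.x, result.max.x);
    shrinkAxis(box.min.y, box.max.y, result.min.y, result.max.y);
    shrinkAxis(box.min.z, box.max.z, result.min.z, result.max.z);
    return result;
}
```

The margin would be chosen large enough to absorb the rasterization error mentioned above, e.g. the world-space size of a pixel or two at the distances you care about.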

cozzie said:
maybe software depth buffer

Personally i did it with software rasterization on CPU.
A full depth buffer is not needed, drawing (or processing) of single pixels is not needed.
Instead we can use 'spans'. (basically a horizontal scanline defined by two points. Newly inserted spans may intersect and modify existing ones.)
Or we could use no frame buffer at all, but just polygon clipping.
But spans scale better with detail, and it's also easier to implement.
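To give an idea of what a span buffer can look like, here is a simplified C++ sketch of a single scanline. It is not the implementation described above: it stores coverage only (no depth per span), which is only valid under the assumption that occluders are inserted in strict front to back order. `Span` and `Scanline` are names invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Span { int x0, x1; };  // half-open pixel range [x0, x1)

class Scanline {
public:
    // Marks [x0, x1) as covered, merging with overlapping or touching spans.
    void insert(int x0, int x1) {
        if (x0 >= x1) return;
        std::vector<Span> merged;
        bool placed = false;
        for (const Span& s : spans_) {
            if (s.x1 < x0) {
                merged.push_back(s);             // entirely left of the new span
            } else if (s.x0 > x1) {
                if (!placed) { merged.push_back({x0, x1}); placed = true; }
                merged.push_back(s);             // entirely right of the new span
            } else {
                x0 = std::min(x0, s.x0);         // overlaps or touches: absorb it
                x1 = std::max(x1, s.x1);
            }
        }
        if (!placed) merged.push_back({x0, x1});
        spans_ = std::move(merged);
    }

    // True if every pixel in [x0, x1) is already covered by an occluder.
    bool isCovered(int x0, int x1) const {
        if (x0 >= x1) return true;
        for (const Span& s : spans_)
            if (s.x0 <= x0 && x1 <= s.x1) return true;
        return false;
    }

    std::size_t spanCount() const { return spans_.size(); }

private:
    std::vector<Span> spans_;  // kept sorted, disjoint and non-touching
};
```

A full buffer would hold one such list per screen row. A projected AABB is occluded if all of its rows report `isCovered` for its horizontal extent.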

cozzie said:
So far I have a quadtree spatial division which helps quite a bit

I use an octree, containing both the occluder geometry and the AABBs of the visible geometry.
The tree is traversed so nodes are roughly processed in front to back order.
Occluders are drawn as spans. AABBs are checked for visibility against the spans. If they are occluded, the entire subtree can be skipped. This means no more need to draw any occluders or geometry in that subtree, so it's very work efficient compared to other solutions. It also supports dynamic occluders, although for that using a BVH would be faster than an octree.
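The traversal could be sketched in C++ roughly like this. It is a simplification under stated assumptions: children are ordered by the distance of their centers to the camera, and the actual occlusion test (e.g. against spans) is passed in as a callback. `Node`, `addChild` and `traverse` are hypothetical names, not taken from any engine.

```cpp
#include <algorithm>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

struct Node {
    Vec3 center;  // center of the node's bounding box
    int id;
    std::vector<std::unique_ptr<Node>> children;
};

Node& addChild(Node& parent, Vec3 center, int id) {
    auto child = std::make_unique<Node>();
    child->center = center;
    child->id = id;
    parent.children.push_back(std::move(child));
    return *parent.children.back();
}

float distSq(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Visits nodes roughly front to back. An occluded node ends the recursion,
// so nothing inside its subtree is touched at all.
void traverse(const Node& node, const Vec3& camera,
              const std::function<bool(const Node&)>& isOccluded,
              std::vector<int>& visited) {
    if (isOccluded(node)) return;
    visited.push_back(node.id);  // a real version would draw occluders / mark objects here

    std::vector<const Node*> order;
    for (const auto& child : node.children) order.push_back(child.get());
    std::sort(order.begin(), order.end(), [&camera](const Node* a, const Node* b) {
        return distSq(a->center, camera) < distSq(b->center, camera);
    });
    for (const Node* child : order) traverse(*child, camera, isOccluded, visited);
}
```

Because near nodes are processed first, their occluders are already in the buffer by the time farther nodes are tested, which is what makes the early subtree rejection effective.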

Downsides are: The algorithm can't be parallelized effectively, so it's not a good fit for GPU.
So i'm not really happy with it, and will look for alternatives. It's also a lot of work, so i'm not sure if i recommend a CPU solution.

Regarding modern GPU implementations, this seems good to me:

https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501


Thanks. I’ll see if I can figure out how it works with the “spans” / creating a software/CPU depth buffer to draw the minimal smooth filled bounding boxes of the occluders. And then see if I can figure out which cells are still visible or not


I'll add a video of implementation of gpu-based occlusion culling:

It might seem confusing at first: the red lines define the frustum planes (these are there to test frustum culling), and in wireframe you can see the occlusion testing.

This works as follows:

  1. Start with list of geometries to render
  2. Do a frustum culling - output only geometries that pass frustum test
  3. Do a fast render of scene into z-buffer
  4. Create hierarchical z-buffer
  5. Do occlusion culling - output only geometries that pass occlusion test
  6. Render
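For steps 4 and 5, a CPU-side C++ sketch may help to show the idea, even though the version described here runs in shaders on the GPU. The assumptions are loud ones: the depth buffer has power-of-two dimensions, depth is in [0, 1] with larger meaning farther, and the object is given as a screen rectangle in inclusive, in-bounds pixel coordinates plus its nearest depth. `DepthLevel`, `buildHiZ` and `isOccluded` are names made up for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct DepthLevel {
    int width;
    int height;
    std::vector<float> depth;  // row-major; larger value = farther away
    float at(int x, int y) const {
        return depth[static_cast<std::size_t>(y) * width + x];
    }
};

// Step 4: each texel of a coarser level stores the FARTHEST depth of the
// 2x2 block below it, which keeps the occlusion test conservative.
std::vector<DepthLevel> buildHiZ(DepthLevel base) {
    std::vector<DepthLevel> chain;
    chain.push_back(std::move(base));
    while (chain.back().width > 1 && chain.back().height > 1) {
        DepthLevel dst{chain.back().width / 2, chain.back().height / 2, {}};
        dst.depth.resize(static_cast<std::size_t>(dst.width) * dst.height);
        const DepthLevel& src = chain.back();
        for (int y = 0; y < dst.height; ++y) {
            for (int x = 0; x < dst.width; ++x) {
                const float a = std::max(src.at(2 * x, 2 * y), src.at(2 * x + 1, 2 * y));
                const float b = std::max(src.at(2 * x, 2 * y + 1), src.at(2 * x + 1, 2 * y + 1));
                dst.depth[static_cast<std::size_t>(y) * dst.width + x] = std::max(a, b);
            }
        }
        chain.push_back(std::move(dst));
    }
    return chain;
}

// Step 5: pick the level where the rectangle spans at most 2x2 texels and
// compare the object's nearest depth with the farthest depth stored there.
bool isOccluded(const std::vector<DepthLevel>& chain,
                int x0, int y0, int x1, int y1, float nearestDepth) {
    std::size_t level = 0;
    while (level + 1 < chain.size() &&
           ((x1 >> level) - (x0 >> level) > 1 || (y1 >> level) - (y0 >> level) > 1)) {
        ++level;
    }
    const DepthLevel& l = chain[level];
    float farthest = 0.0f;
    for (int y = y0 >> level; y <= (y1 >> level); ++y)
        for (int x = x0 >> level; x <= (x1 >> level); ++x)
            farthest = std::max(farthest, l.at(x, y));
    return nearestDepth > farthest;
}
```

The point of the hierarchy is that a large object is tested with a handful of texel reads at a coarse level instead of reading every pixel it covers.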

This one is done entirely on the GPU (with the introduction of work graphs, it could be even more performant than it is). The downside of occlusion culling is that in some scenes, for example this one, it's not going to give you much (and may cost you more than it gives!).

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com

Nice, thanks. I’ll see if I can figure out how to implement these steps. Especially the hierarchical Z buffer is new to me.

To what do you render in step 1? Is that just depth buffer? In my current setup, frustum culling (and aabb checking of my quadtree partitions) is the first thing I do. I believe that should stay the same. So something like:

  • define PVS, aka cells/partitions that are potentially visible
  • Render the occluders within those cells to something/ buffer, no pixel shading
  • Then somehow figure out which of the aabb’s (so which cells) would still be visible with the occluders in place
  • With that information, continue normal rendering with just the objects in the filtered PVS set of cells, after checking them against occluders

The part I don’t understand yet, is step 2 and 3. Do I need occlusion queries for that, or can the depth buffer provide this, or something else?
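One way to picture the middle two bullets without any GPU readback is a small CPU-side depth buffer. The C++ sketch below is heavily simplified and only meant to show the logic: occluders and cells are flat screen-space rectangles with a single depth value (real ones would be projected boxes), and depth runs from 0 at the near plane to 1 at the far plane. `ScreenRect`, `DepthBuffer` and `visibleCells` are invented names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Half-open screen rectangle with one depth value. For an occluder this
// should be its farthest depth, for a tested cell its nearest depth, so
// that both sides err towards "visible".
struct ScreenRect { int x0, y0, x1, y1; float depth; };

class DepthBuffer {
public:
    DepthBuffer(int width, int height)
        : width_(width), height_(height),
          depth_(static_cast<std::size_t>(width) * height, 1.0f) {}

    // Writes an occluder, keeping the nearest depth per pixel.
    void drawOccluder(const ScreenRect& r) {
        for (int y = std::max(r.y0, 0); y < std::min(r.y1, height_); ++y)
            for (int x = std::max(r.x0, 0); x < std::min(r.x1, width_); ++x)
                depth_[index(x, y)] = std::min(depth_[index(x, y)], r.depth);
    }

    // A cell is visible if at least one of its pixels is not behind
    // what the occluders wrote.
    bool isVisible(const ScreenRect& r) const {
        for (int y = std::max(r.y0, 0); y < std::min(r.y1, height_); ++y)
            for (int x = std::max(r.x0, 0); x < std::min(r.x1, width_); ++x)
                if (r.depth <= depth_[index(x, y)]) return true;
        return false;
    }

private:
    std::size_t index(int x, int y) const {
        return static_cast<std::size_t>(y) * width_ + x;
    }
    int width_;
    int height_;
    std::vector<float> depth_;
};

// Returns the indices of the cells that survive the occlusion test.
std::vector<int> visibleCells(const DepthBuffer& buffer,
                              const std::vector<ScreenRect>& cells) {
    std::vector<int> result;
    for (std::size_t i = 0; i < cells.size(); ++i)
        if (buffer.isVisible(cells[i])) result.push_back(static_cast<int>(i));
    return result;
}
```

The surviving indices are then the filtered PVS for the normal render passes.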


cozzie said:
Do I need occlusion queries for that

Not for any of our proposals, but occlusion queries are surely the easiest way to do it, so i would try it first to see if it's already good enough. It could work like this:

Select a set of nearby and large occluders, e.g. buildings. (Ideally you would have low poly versions, e.g. 'shrunken' boxes. But bounding boxes won't work.)

Render those occluders to the depth buffer.

For each object do an occlusion query using the bounding box of the object. If it's visible, render it, otherwise skip it. (Probably one query for each single object is too slow, and you have to make bounding boxes for groups of multiple objects. E.g. all objects of a quadtree node.)

This should be pretty simple and little work. No software rendering, no HZB, etc.
Notice again: You can use bounding volumes to test for visibility, but you can not use bounding volumes to approximate occluders. The reason: It's ok to draw some stuff although it's not visible. But it's not ok to cull some stuff which should be visible.

Personally i have never used occlusion queries, but afaik those are the problems with it:
It's brute force, since it processes all pixels of the queried geometry even if the first pixel is already visible. (Thus you want to minimize the number of queries by using sets of multiple objects per query, i guess.)
Getting the query results to CPU causes latency. (But with GPU driven rendering there is no need to involve the CPU.)
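On the latency point: in D3D11 an occlusion query is an `ID3D11Query` created via `ID3D11Device::CreateQuery` with `D3D11_QUERY_OCCLUSION`, bracketed with `ID3D11DeviceContext::Begin` and `End`, and read back with `GetData`, which returns `S_FALSE` while the result is not ready yet. A common way to avoid stalling is to never wait and to reuse the last known result instead. The C++ sketch below shows only that bookkeeping, behind a hypothetical `OcclusionQuery` interface whose `poll` stands in for `GetData`; `FakeQuery` and `CellVisibility` are likewise made up for illustration.

```cpp
#include <cstdint>

// Stand-in for a GPU occlusion query (hypothetical interface).
// poll() returns false while the result is still pending, otherwise it
// writes the number of samples that passed the depth test.
struct OcclusionQuery {
    virtual ~OcclusionQuery() = default;
    virtual bool poll(std::uint64_t& samplesPassed) = 0;
};

// Test double that becomes ready after a fixed number of polls.
struct FakeQuery : OcclusionQuery {
    FakeQuery(int pollsUntilReady, std::uint64_t samples)
        : pollsUntilReady_(pollsUntilReady), samples_(samples) {}

    bool poll(std::uint64_t& samplesPassed) override {
        if (pollsUntilReady_ > 0) { --pollsUntilReady_; return false; }
        samplesPassed = samples_;
        return true;
    }

private:
    int pollsUntilReady_;
    std::uint64_t samples_;
};

// Per-cell state: starts out visible (the conservative choice) and only
// changes when a query result has actually arrived.
struct CellVisibility {
    bool visible = true;

    void update(OcclusionQuery& query) {
        std::uint64_t samples = 0;
        if (query.poll(samples)) visible = (samples > 0);
        // Not ready: keep last frame's answer instead of stalling the CPU.
    }
};
```

The cost of this approach is that visibility lags a frame or two behind, so objects can pop in late after fast camera movement.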


All clear. So I’ll try to implement it like this to see what the results are:

  • on cpu side define which cells/ partitions are visible
  • Clear depthbuffer
  • Draw only the occluders within these cells just to the depth buffer, by using simple solid filled boxes to represent the occluders
  • Initiate an occlusion query for the visible cells, represented also by solid filled boxes, representing the cells aabb’s
  • Take the cells that are returned visible from the queries and do the rest of the rendering based on just these cells (all objects inside them)

