A while back I attempted to implement a technique called Geometry Clipmaps in Trial. This technique is used to stream in terrain data for scenery where the terrain is far too large to be stored in memory all at once, let alone be rendered at full resolution. In this entry I'll go over the idea of the technique and the problems I had implementing it.
First, the technique is described in two papers: Geometry Clipmaps: Terrain Rendering Using Nested Regular Grids by Losasso and Hoppe [1], and Terrain Rendering Using GPU-Based Geometry Clipmaps by Asirvatham and Hoppe [2]. Neither of the papers offers any public source code, or much of any code at all to describe the details of the technique, especially in relation to the update mechanics. This is a problem, because a lot of the details of the papers are very strange indeed.
Before I get into that though, let me summarise the idea of the technique, which is fairly simple: since we cannot draw the full terrain, we need to reduce the detail somehow. Luckily for us, how much detail we can actually see in a rendered scene decreases with distance thanks to perspective skewing. Thus, the idea is that, the farther away the terrain geometry is from the camera, the coarser we render it.
To do this we create a square ring, whose inner sides are half the size of its outer sides. This gives us a power of two scaling, where we can simply render the same ring within itself over and over, scaling it down by two every time. Once we have sufficiently many rings, we simply fill the hole with a remaining square grid. Here's an image to illustrate the idea:
As you can see, the detail gets exponentially more coarse the farther away from the centre you go. If we look at this from the perspective of a camera that is properly centred, we can see the effect even better:
Despite the geometry becoming more coarse, it takes up a similar amount of actual pixels on screen. Now, the geometry I used here is rather simple. In the papers, they use a much more complex layout in order to save on vertices even more aggressively:
This, especially the interior trim, makes things a lot more painful to deal with. I'm quite sure the trim is a necessary evil in their scheme, as the ring is no longer a power of two reduction, but they still need the vertices to align properly. I can see no other reason for the trim to exist. Worse than that though, I can absolutely not understand why they specify four variants of the trim, one for each corner, when the largest trim is going to offset the rest of the geometry so much that even if every other ring uses the opposite trim, you cannot correct the shift enough to make things exactly centre again. If someone understands this, please let me know.
Anyway, now that we have these rings established, we need to look at how to actually render geometry with them. The idea here is to use height maps and dynamically shift the vertices' height in the vertex shader according to the brightness of the corresponding pixel in the height map. Since each ring has the same vertex resolution, we can use a fixed resolution texture for each level as well, saving on texture memory.
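To make the displacement step concrete, here is a minimal CPU-side sketch in Python of what the vertex shader would do per vertex. It assumes one heightmap pixel per vertex and 8-bit brightness values; the function name `displaced_grid` and the `cell_size` and `height_scale` parameters are made up for this illustration and are not taken from Trial or the papers.

```python
def displaced_grid(heightmap, cell_size=1.0, height_scale=10.0):
    """Build (x, y, z) vertices for a square grid centred on the origin.

    heightmap is a square list of rows holding brightness values 0..255.
    Each vertex is pushed up by the brightness of its own pixel.
    """
    n = len(heightmap)
    half = (n - 1) / 2.0
    vertices = []
    for row in range(n):
        for col in range(n):
            x = (col - half) * cell_size
            y = (row - half) * cell_size
            # Brightness 0 maps to height 0, brightness 255 to height_scale.
            z = heightmap[row][col] / 255.0 * height_scale
            vertices.append((x, y, z))
    return vertices


# A tiny 3x3 map that ramps from dark on the left to bright on the right.
heights = [[0, 128, 255],
           [0, 128, 255],
           [0, 128, 255]]
verts = displaced_grid(heights, cell_size=2.0, height_scale=10.0)
```

In a real renderer the flat grid stays fixed on the GPU and only the texture changes per level, which is why one vertex buffer can serve every ring.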
Let's look at an example for that. In order to generate the textures, we need a base heightmap at a resolution of (n*2^(r-1))^2, where n is the resolution of a clipmap, and r is the number of rings. In this example I used a clipmap resolution of 64 and 4 levels, meaning I needed a base heightmap resolution of 512^2. For the innermost level you simply crop out the center at the clipmap resolution. For the next level you crop out twice as much, and then scale to half, and so on.
The above are scaled up again using nearest-neighbour to make it better visible to the human eye. Displacing the rings by each corresponding map results in the following image. I've increased the clipmap resolution to 64 to match that of the textures.
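The cropping and downscaling of the base heightmap described above can be sketched as follows. This is only an illustration in plain Python: the helper names are hypothetical, and averaging 2x2 blocks is an assumed choice of filter, since nothing here prescribes how the halving should be done.

```python
def crop_center(image, size):
    """Cut a size x size square out of the middle of a square image."""
    start = (len(image) - size) // 2
    return [row[start:start + size] for row in image[start:start + size]]


def downscale_by_two(image):
    """Halve the resolution by averaging each 2x2 block (assumed filter)."""
    n = len(image) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1] +
              image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(n)]
            for y in range(n)]


def build_clipmap_levels(base, n, levels):
    """Return one n x n heightmap per level, finest level first."""
    expected = n * 2 ** (levels - 1)
    if len(base) != expected or any(len(row) != expected for row in base):
        raise ValueError("base heightmap must be %d pixels square" % expected)
    result = []
    for level in range(levels):
        # Each level covers twice the area of the previous one...
        image = crop_center(base, n * 2 ** level)
        # ...and is then scaled back down to the clipmap resolution.
        for _ in range(level):
            image = downscale_by_two(image)
        result.append(image)
    return result


# Small stand-in for the 64 / 4 levels / 512 example: n = 4, 3 levels, 16 x 16 base.
base = [[x for x in range(16)] for _ in range(16)]
clip_levels = build_clipmap_levels(base, n=4, levels=3)
```

With the 64 pixel clipmaps and 4 levels from the example, the same function would expect a 512 pixel base map and return four 64 pixel maps.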
Looks pretty good, eh? You might notice some issues arising on the edges between clipmap levels though. Namely, some vertices don't match up quite exactly. This is due to smoothing. Since each clipmap level has the same resolution, some of the pixels at the border are going to be too precise. In order to fix this we need to blend vertices as they get closer to the border. To do this we look at the distance of the vertex to the center and, depending on its magnitude, linearly blend between the current level and the next outer one. This means that for each clipmap level we draw, we need to have access to both the current level's heightmap and the next level's.
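As a rough sketch of that blend, the following Python computes a blend factor from a vertex's distance to the centre and mixes the two heights. The names are hypothetical, and two details are assumptions on top of the description: the distance is measured per axis (so the blend region is a square band like the rings), and the blend only starts `blend_width` units before the outer border.

```python
def blend_factor(x, y, half_extent, blend_width):
    """0.0 well inside the level, rising linearly to 1.0 at its outer border.

    x, y are relative to the clipmap centre; half_extent is the distance
    from the centre to the border; blend_width must be greater than zero.
    """
    distance = max(abs(x), abs(y))
    start = half_extent - blend_width
    t = (distance - start) / blend_width
    return min(1.0, max(0.0, t))


def blended_height(fine, coarse, alpha):
    """Linear blend between this level's height and the next outer level's."""
    return (1.0 - alpha) * fine + alpha * coarse


alpha_centre = blend_factor(0.0, 0.0, half_extent=32.0, blend_width=8.0)
alpha_middle = blend_factor(28.0, 5.0, half_extent=32.0, blend_width=8.0)
alpha_border = blend_factor(32.0, 0.0, half_extent=32.0, blend_width=8.0)
height = blended_height(10.0, 20.0, alpha_middle)
```

At the border the factor is exactly 1, so the vertex takes the coarser level's height and lines up with the neighbouring ring.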
In the original paper they do this with an insane trick: they encode each heightmap value as a float and stuff the current level's height into the float's integer part, and the next level's into the fractional part. I don't know how this could ever be a good idea given floating point imprecision. I could understand stuffing both numbers into a single int, but this is crazy.
The way I do this in my current approach is different. I represent all the clipmap levels as a single 2D array texture, where each level in the texture represents one clipmap level. We can then easily access two levels, at the cost of two texture lookups of course. I could also see the encoding strategy working by limiting the depth to 16 bit and using 32 bit integer textures to stuff both levels into one. For now though, I'm not that concerned about performance, so optimisations like that seem folly.
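The integer packing idea mentioned above could look roughly like this. It is a sketch of the arithmetic only, in Python, with made-up function names; which level goes into the high half is an arbitrary choice, and the same shifts and masks would live in the shader in practice.

```python
def pack_heights(fine, coarse):
    """Pack two 16 bit heights into one unsigned 32 bit value."""
    if not (0 <= fine <= 0xFFFF and 0 <= coarse <= 0xFFFF):
        raise ValueError("heights must fit into 16 bits")
    # Current level in the upper 16 bits, next outer level in the lower 16.
    return (fine << 16) | coarse


def unpack_heights(packed):
    """Recover the (fine, coarse) pair from a packed 32 bit value."""
    return (packed >> 16) & 0xFFFF, packed & 0xFFFF


packed = pack_heights(1234, 65535)
fine, coarse = unpack_heights(packed)
```

Unlike the float trick, the round trip is exact for every pair of 16 bit values, at the price of limiting each height to 65536 steps.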
Now, keep in mind that in order to figure out the proper smoothing, we need to get the exact position of the vertex of the inner clipmap in the texture of the outer clipmap. This is not that big of a deal if your clipmaps are perfectly aligned and centred, as you can simply scale by 0.5. However, if you implement the clipmap geometry like they suggest in the paper, with the weird trim and all, now suddenly your geometry is slightly shifted. This caused me no end of grief and I was never able to get things to match up perfectly. I don't know why, but I suspect it's something about not doing the calculation quite exactly right, or something about the interpolation causing slight errors that nevertheless remain visible.
I might rewrite the clipmaps using perfectly centred rings as I outlined above, just to see whether I can get it working perfectly, as I really cannot see that huge of an advantage in their approach. To reiterate, the reason why they do it the way they do is, according to my understanding, to find a good balance between number of vertices to keep in memory, and number of draw calls to issue to draw a single geometry layer. If I used perfectly centred rings I would have to draw either 12 smallish square blocks, or one big ring. This is opposed to their technique, which requires 14 draw calls and three different sets of geometry. Wait what? If that makes you confused, I'm with you. I don't understand it either.
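For the perfectly centred case, the scale by 0.5 mentioned above has to happen around the centre of the texture rather than around its corner. A small Python sketch, assuming normalised texture coordinates in the range 0 to 1 with the clipmap centre at 0.5 (the function name is hypothetical):

```python
def to_outer_uv(u, v):
    """Map a coordinate in one level's texture to the next outer level's.

    Only valid when both levels share the same centre; the outer level
    covers twice the extent, so distances from the centre are halved.
    """
    return ((u - 0.5) * 0.5 + 0.5, (v - 0.5) * 0.5 + 0.5)


centre = to_outer_uv(0.5, 0.5)
corner = to_outer_uv(0.0, 0.0)
opposite = to_outer_uv(1.0, 1.0)
```

With the paper's trim the levels no longer share a centre, so an extra per-level offset would have to be added here, and that offset is the part that has to be exactly right.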
Either way, once you have the interpolation between levels going you're almost there. You can now draw a static terrain with very low computational effort using geometry clipmaps. However, usually you want to have things move around, which means you need to update the clipmaps. And this is where I stopped, as I could not figure out what exactly to do, and could not find any satisfactory information about the topic anywhere else either.
First of all, in the papers they only use a map for the coarsest level layer, and predict the finer levels based on interpolation with a residual. This has two consequences: one, you need to perform a costly ahead of time calculation to compute the residual for the entire terrain. Two, I do not believe that this interpolatory scheme will give you sufficient amount of control over details in the map. It will allow you to save texture space, by only requiring the coarsest level plus a single residual layer, but again I do not believe that such extreme optimisations are necessary.
Second, they use a format called PTC (a precursor of JPEG, I believe) which allows region of interest decoding from a file, meaning you can read out only the parts you really need in order to update the clipmap textures. This is nice, but I'm not aware of any CL library that supports this format, or supports JPEG and allows region of interest decoding.
In order to explain the third problem, let's first look at what happens when we need to update the terrain. Say we have a clipmap resolution of 1cm, meaning each grid cell is 1cm long at the finest level. The player now moves by +3cm in x. This means we need to shift the innermost layer by -3cm in x direction and load in the new data for that into the texture. We also need to update the next layer, as its resolution is 2cm, and we have thus stepped into the next grid cell for it as well. We don't need to step any of the other layers, as they are 4cm and 8cm resolution respectively, and are too coarse for an update yet. This is fine because they are farther and farther away anyhow and we don't notice the lack of detail change. The edge interpolation helps us out here too as it bridges over the gaps between the layers even when they don't match up exactly.
Now, the way they do this updating in the papers is by drawing into the clipmap texture. They keep a lookup position for the texture which keeps up with the camera position in the game world. When the camera moves they fill in the new detail ahead of the camera's movement direction by drawing two quads. The magic of this idea is that texture lookup happens with the repeat border constraint, meaning that if you go beyond the standard texture boundary, you wrap around to the other end and continue on. Thus you only ever need to update a tiny region of the texture map for the data that actually changed, without having to shift the rest of the data around in the texture. Since this is probably quite hard to imagine, here's another picture from their paper that illustrates the idea:
The region “behind” the camera is updated, but because the map is addressed through wraparound, it appears in front of it. Still, this is quite complicated, especially when you keep in mind that each level is, according to the paper's geometry, shifted by the trim and all. I'm also wondering whether, with my array texture technique, it would be faster to simply replace an entire clipmap texture level and save yourself the cost of having to first upload the two squares to textures, and then perform a draw call into the proper clipmap texture to update it. Since you know that each level has a fixed size, you could even allocate all the storage ahead of time and use a file format that could be directly memory mapped in or something like that, and then just use glTexSubImage2D (or glTexSubImage3D, for a layer of an array texture) to upload it all at once. This is just speculation and musing on my part, so I don't know whether that would be a feasible simplification on current hardware.
So all in all I have my problems with how the update is described in the paper. There is another source for geometry clipmaps, namely this talk by Marcin Gollent about how terrain rendering works in REDEngine 3 for the Witcher 3. He only very, very briefly talks about the clipmaps (5:55-6:55), but it still confuses me to this day, especially the way the clipmaps are illustrated. Let me try to outline what I understand from it:
Each clipmap level has a resolution of 1024² vertices, and we have five levels in total. The map is divided up into tiles of a uniform size, each tile representing 512² vertices on the finest level, meaning you need four tiles to render the innermost level fully. So far so good, but my question is now: what happens if you need to shift, say, the fourth level clipmap by one vertex in x and y? If each tile maps to a separate file on disk, it would mean we need to load in, at worst due to border conditions, 64 files. That seems like a whole lot to me. It would make more sense if you keep the resolution of a tile the same for each level, meaning you always need at worst 8 tiles to update a level. Either way, you end up with a tonne of files. Even just representing their map at the finest level would require 46²=2116 files.
I'm not sure if that's just me being flabbergasted at AAA games, or if there's something I truly don't understand about their technique. Either way, this approach, too, looks like a whole boatload of work to get going, though it does look more feasible than that of the original GPU clipmap paper. So far I haven't implemented either, as I've just been stuck on not knowing what to do. If there's something I'm legitimately not understanding about these techniques I'd rather not dive head first into implementing a strategy that just turns out to be stupid in the end.
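Coming back to the update step from the paper for a moment, the two ideas involved (deciding which levels have to step, and writing new data with wraparound addressing) can at least be sketched in one dimension. The Python below is a CPU-side illustration under simplifying assumptions: a single row instead of a 2D texture, direct writes instead of drawing quads, and hypothetical names throughout. It is not how the paper or REDEngine implements it.

```python
import math


def cell_index(position, level, finest_cell=1.0):
    """Index of the grid cell containing position at a given level."""
    return math.floor(position / (finest_cell * 2 ** level))


def cells_to_shift(old_x, new_x, levels, finest_cell=1.0):
    """How many cells each level has to step when moving from old_x to new_x."""
    return [cell_index(new_x, level, finest_cell) -
            cell_index(old_x, level, finest_cell)
            for level in range(levels)]


class ToroidalRow:
    """One row of a clipmap texture, addressed with wraparound."""

    def __init__(self, size, origin, sample):
        self.size = size
        self.origin = origin      # world cell index of the first visible cell
        self.sample = sample      # function: world cell index -> height
        self.texels = [None] * size
        for cell in range(origin, origin + size):
            self.texels[cell % size] = sample(cell)

    def shift(self, cells):
        """Move the window and rewrite only the newly exposed cells."""
        new_origin = self.origin + cells
        if abs(cells) >= self.size:
            fresh = range(new_origin, new_origin + self.size)
        elif cells > 0:
            fresh = range(self.origin + self.size, new_origin + self.size)
        else:
            fresh = range(new_origin, self.origin)
        for cell in fresh:
            # The new data lands in the slots that just fell out of view.
            self.texels[cell % self.size] = self.sample(cell)
        self.origin = new_origin
        return len(fresh)

    def lookup(self, cell):
        """Read a world cell the same way a repeat-wrapped texture would."""
        return self.texels[cell % self.size]


# The 3cm move from the example above, with 1cm cells and four levels.
steps = cells_to_shift(0.0, 3.0, levels=4)
row = ToroidalRow(8, 0, lambda cell: cell * 10)
written = row.shift(steps[0])
```

Only three of the eight texels are rewritten for the finest level here, and the rest of the row is never moved, which is the whole appeal of the wraparound scheme.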
I'd love to ask some people in the industry that actually worked on this sort of stuff to clear up my misunderstandings, but I couldn't for the life of me find a way to contact Marcin Gollent. I'm not surprised they don't put their email addresses up publicly, though.
If you have some ideas or questions of your own about this stuff, please do let me know, I'd be very interested in it. I definitely want to finish my implementation some day, and perhaps even develop some tooling around it to create my own heightmaps and things in Trial.
Written by shinmera