Unity3D : Adventures in creating huge outdoor areas (terrains) — Part 3: Optimizing the darned trees (a bit)

Miroslav Martinovič
Jul 21, 2020

We are slowly getting to the actual interesting parts, I promise!

So, last time, I verified that adding trees kills the performance, no matter what. The LODs make it… survivable, but still.

This time, I decided to properly look into why that’s happening. Let’s start at the obvious place: the Profiler.

It used to be almost twice as bad; these numbers are already the result of what I’m describing in this article. It’s still bad, though, and it makes the same point.

Would you look at that? Rendering is taking up 69ms of each frame… That is truly horrible. So the next step was the Frame Debugger.

At this point, it might be useful to mention the two main rendering paths: Forward and Deferred. You could write whole articles and tables about their workings, features and performance, and many people have. I will not. I will condense it into two sentences:

Deferred rendering first draws all your objects on one canvas, then draws all your (pixel) lights on another canvas, then mixes those two together.

Forward rendering draws every object with every light.

In practice, in a scene with 100 objects and two pixel lights, Deferred rendering draws a hundred objects (100 drawcalls, in the worst case), then two lights (2 drawcalls), and then combines the results in one last draw. That makes for 103 drawcalls. So you can estimate the number of drawcalls as: “number of objects PLUS number of lights”.

Forward rendering, in the same scene, draws the first object under the first light, the first object under the second light, then the second object under the first light, the second object under the second light… And so on and so on. So you can estimate the number of drawcalls as “number of objects MULTIPLIED BY number of lights”. Which should be obviously horrible.
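To put numbers on those two rules of thumb, here is a minimal sketch. It only encodes the simplified worst-case model from the paragraphs above, not what the engine really does; the function names are made up for illustration, and treating zero lights as one draw per object in Forward is my own assumption.

```python
def deferred_drawcalls(objects: int, lights: int) -> int:
    """Worst case in the simplified model: one draw per object,
    one draw per pixel light, plus one final draw that combines them."""
    return objects + lights + 1


def forward_drawcalls(objects: int, lights: int) -> int:
    """Worst case in the simplified model: every object is drawn once
    for every pixel light. Assumption: with no lights at all, each
    object still costs one draw."""
    return objects * max(lights, 1)


if __name__ == "__main__":
    # Compare both estimates for a few scene sizes.
    for objects, lights in [(100, 2), (100, 8), (1000, 2)]:
        print(
            f"{objects} objects, {lights} lights: "
            f"deferred ~{deferred_drawcalls(objects, lights)}, "
            f"forward ~{forward_drawcalls(objects, lights)}"
        )
```

With only two lights the gap looks modest, but Deferred grows by one draw per extra light while Forward grows by one draw per object per extra light.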

“OMG why would anyone use Forward rendering, then?”, I hear you scream in horror.

Deferred can’t do hardware anti-aliasing (MSAA), so you are left with post-process methods like FXAA, and it can’t do transparency. Oh, and for simple scenes, the overhead is large. I’ve just saved you 5 minutes of googling, you’re welcome.

Okay, back to my terrain. I opened the Frame Debugger and stepped through all 240 drawcalls, to find…

…what the hell? Why are all the trees being rendered in Forward mode, even though the whole project is set to Deferred?

Again, I will save you the half an hour of googling it took me to find out:

The answer was: “Because Deferred can’t do transparency, therefore it can’t do translucency either, and the beautiful Angry Mesh Winter Nature trees were using a custom shader which used translucency, because… Because.”

And what Unity does in that case is, it draws everything it can in the Deferred pass, then everything else in a Forward pass, and then composites them together.
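If you want to picture that split, here is a tiny sketch. Everything in it is hypothetical: the `Renderable` class and its `deferred_compatible` flag are stand-ins I made up, and the real engine decides this per shader pass, not from a boolean.

```python
from dataclasses import dataclass


@dataclass
class Renderable:
    name: str
    # Hypothetical flag: False for shaders that need blending or
    # translucency, which the Deferred pass cannot handle.
    deferred_compatible: bool


def split_by_path(renderables):
    """Return (deferred, forward) lists, preserving the input order."""
    deferred = [r for r in renderables if r.deferred_compatible]
    forward = [r for r in renderables if not r.deferred_compatible]
    return deferred, forward


if __name__ == "__main__":
    scene = [
        Renderable("terrain", True),
        Renderable("rock", True),
        Renderable("tree_lod0", False),  # translucent custom shader
        Renderable("tree_lod1", False),
    ]
    deferred, forward = split_by_path(scene)
    print("deferred pass:", [r.name for r in deferred])
    print("forward pass: ", [r.name for r in forward])
```

The point is that one incompatible shader on the trees is enough to push every tree into the expensive list.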

So I opened the Angry Meshes’ shader, removed its translucency output, forced it to deferred, and… Yeah, now the trees are being done in Deferred pass… except… Only LOD0. LOD1 and 2, the ones which I need in deferred the most, are still in Forward. Oh wait, they have their own separate shader! So I go there, again, remove transparency/translucency, force it to Deferred mode, save, run… What is that?

Notice the cardboard prop

Investigation starts. The thing is, those billboards have an “Alpha Cutout” material, and yes, I just said that Deferred can’t do transparency, but this is different. What it can’t do is transparency blending. Alpha Cutout is just a hard threshold saying “ignore any pixel with an alpha value below this one”. So this one is supposed to work in deferred.
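The difference between the two is easy to show on a single pixel. This is a plain-Python sketch of the idea, not shader code; the function names and the default threshold of 0.5 are assumptions for illustration.

```python
def cutout_keeps_pixel(alpha: float, threshold: float = 0.5) -> bool:
    """Alpha cutout: a yes/no decision per pixel.
    Pixels with alpha below the threshold are simply ignored."""
    return alpha >= threshold


def blend(src: float, dst: float, alpha: float) -> float:
    """Alpha blending: mixes the new colour (src) with whatever is
    already behind it (dst), so it needs to know what is behind."""
    return src * alpha + dst * (1.0 - alpha)


if __name__ == "__main__":
    # Cutout only needs the pixel's own alpha.
    for alpha in (0.1, 0.5, 0.9):
        print(f"alpha={alpha}: kept by cutout = {cutout_keeps_pixel(alpha)}")

    # Blending also depends on the colour already on screen.
    print("blend over black:", blend(src=1.0, dst=0.0, alpha=0.25))
    print("blend over white:", blend(src=0.0, dst=1.0, alpha=0.25))
```

Cutout never needs the colour behind the pixel, which is why it can coexist with the “draw everything on one canvas first” approach, while blending cannot.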

What follows, in my notes, is a detailed step-by-step of about 20 steps that led me to the realization and the solution. However, 15 of those steps are just red herrings, so I’ll spare you and shorten it down to two things:

I honestly still have no idea how that custom shader is even able to render the thing properly, since the channels I consider to be most important (Albedo, for example) weren’t plugged into anything. What I suspect is that the author is doing some semi-high-level magic with calculating his custom lighting, and outputting the lighting data in such a way that it also contains the texture data itself. For this reason, that billboard shader was set to use “custom lighting”, which it was calculating by itself, for itself. And this calculation didn’t like my messing around.

So I switched the light model from custom to standard, and removed the whole custom lighting calculation (and plugged in the Diffuse texture to the Albedo output, as it should have been in the first place).

Snow coverage and tint colors on the crosses will need some fixing now, but that’s fine

Funny, now that it renders correctly in scene and game, it renders incorrectly in material preview.

Whatever, it’s rendering in deferred as it should now, so I’m happy. And I think deleting the custom light calculation gave me a tiny perf improvement, something like 0.5 to 1 fps (from 15–16 to 16–17, or from 62ms per frame to about 58). Honestly, even that was more than I expected, because I haven’t solved the more significant problem: Unity’s automatic batching and instancing are kind of crap. I’m using 6 types/prefabs of trees and each has 3 LODs, so drawing the whole scene of them could theoretically take just 18 drawcalls. But it’s taking… a lot more, because Unity makes multiple tiny groups of the same model, instead of one large group.
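To show why “18 in theory” and “a lot more in practice” can both be true, here is a small counting sketch. It is a toy model, not Unity’s actual batching logic: I am simply assuming that only draws of the same (prefab, LOD) pair that sit next to each other in the draw order get merged. The function names are hypothetical.

```python
from itertools import groupby


def ideal_batches(draw_order):
    """One batch per distinct (prefab, lod) pair, no matter where
    the instances appear in the draw order."""
    return len(set(draw_order))


def consecutive_batches(draw_order):
    """Toy model of fragmentation: only adjacent identical entries
    are merged into one batch."""
    return sum(1 for _ in groupby(draw_order))


if __name__ == "__main__":
    # 6 tree prefabs x 3 LODs, as in the scene described above.
    kinds = [(prefab, lod) for prefab in range(6) for lod in range(3)]

    # Interleaved order: each kind shows up 10 times, never adjacent.
    interleaved = kinds * 10
    print("ideal:      ", ideal_batches(interleaved))
    print("interleaved:", consecutive_batches(interleaved))

    # Same instances sorted by kind: fragmentation disappears.
    print("sorted:     ", consecutive_batches(sorted(interleaved)))
```

In this toy model, the same set of trees costs anywhere from the ideal 18 batches up to one batch per tree, depending purely on how the instances end up grouped.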

And at this point, we are finally getting to the interesting bits.

Next time, we will start forcing the batching to actually work the way it is supposed to work. Because we want performance and we should be able to achieve it.
