A tale as old as time - CPU bottlenecking
As described in my earlier post "My plan to render 1 million trees", my game has a grid system of sprites; each one displays a small chunk of the world as rendered by the SubViewport. But because the viewport moves around, we can't simply give every sprite a ViewportTexture, as they would then all display the same thing.
I recently implemented an LOD system so that the game renders more or less detail as you zoom in and out. Once I raised the detail to the level I wanted, the FPS dropped sharply when zoomed in.
What is interesting is that this wasn't because of the number of trees being rendered, or really anything to do with the objects in the scene. It came from just two lines of code:
var image := $Subviewport.get_texture().get_image()
var texture := ImageTexture.create_from_image(image)
These lines grab what the viewport is currently seeing, and then create an image texture to set on the sprite.
Under the hood, get_texture().get_image() appears to mainly be calling RenderingDevice::texture_get_data. Depending on the texture, this asks the graphics driver (in my case, Vulkan) for a readable pointer to the texture's data, reads that data into memory, and returns an array of bytes which gets put into an image resource.
Clearly, this is CPU bottlenecked, because it has to read from GPU to CPU to get the image, and then straight back to the GPU via an ImageTexture. You can see the processing times scale pretty badly:
Viewport size (square, px) | Processing time |
---|---|
64 | 0.4 ms |
1024 | 33 ms |
2048 | 89 ms |

(Note: this is not a scientific test; I only ran it a few times.)
For reference, 60 FPS is one frame roughly every 16.7 ms, so anything that takes longer than this is less than ideal.
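To put the table above in perspective, here is a rough back-of-the-envelope check, written as standalone Python rather than GDScript so it can be run anywhere. It assumes 4 bytes per pixel (for example RGBA8), which is my assumption and may not match the actual viewport format; the helper names are made up for this sketch.

```python
# Back-of-the-envelope numbers for the timings in the table.
# Assumption: 4 bytes per pixel (e.g. RGBA8); the real viewport format may differ.
BYTES_PER_PIXEL = 4
FRAME_BUDGET_MS = 1000 / 60  # time available per frame at 60 FPS


def readback_bytes(side: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Bytes moved GPU -> CPU for one square viewport with the given side length."""
    return side * side * bytes_per_pixel


def frames_spent(processing_ms: float) -> float:
    """How many 60 FPS frame budgets a given processing time uses up."""
    return processing_ms / FRAME_BUDGET_MS


measured_ms = {64: 0.4, 1024: 33.0, 2048: 89.0}  # timings copied from the table
for side, ms in measured_ms.items():
    mib = readback_bytes(side) / (1024 * 1024)
    print(f"{side}x{side}: {mib:.3f} MiB read back, {frames_spent(ms):.2f} frame budgets")
```

Under that assumption, a single 1024 viewport already means about 4 MiB crossing from GPU to CPU and back, and the measured 33 ms eats close to two whole frames.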
The fix
After digging around a bit in the rendering code of Godot, I found this solution, which seems to remove the CPU from the operation entirely:
var rd := RenderingServer.get_rendering_device()
var image_size := Vector2i(1024,1024)
# Get the texture RID and the texture format.
var in_texture = RenderingServer.texture_get_rd_texture($Subviewport.get_texture().get_rid(), false)
var in_tex_format = rd.texture_get_format(in_texture)
# Make a format to store the output based on the input texture.
# We need TEXTURE_USAGE_CAN_COPY_TO_BIT added so we can copy from the viewport to this one
var out_tex_format = in_tex_format
out_tex_format.usage_bits = out_tex_format.usage_bits | RenderingDevice.TEXTURE_USAGE_CAN_COPY_TO_BIT
# Create the output texture.
var out_texture = rd.texture_create(out_tex_format, RDTextureView.new(), [])
# Copy from the viewport texture to the output texture.
rd.texture_copy(in_texture, out_texture, Vector3.ZERO, Vector3.ZERO, Vector3(image_size.x, image_size.y, 0), 0, 0, 0, 0)
# Create a texture that can be applied to a sprite.
var sprite_tex = Texture2DRD.new()
sprite_tex.texture_rd_rid = out_texture
# Use the texture....
# After the resource is cleared, you may need to manually free the texture.
# It won't be cleared automatically, because we created it directly ourselves.
# Wait at least 1 frame after it is no longer used.
await get_tree().process_frame
rd.free_rid(out_texture)
How this works
A note about terminology
There's a lot of obscure terminology, especially for someone who doesn't have much experience with 3D rendering pipelines and drivers (like myself circa 1 hour ago). I'll try to explain it simply, though I'll probably make a few mistakes as well.
First, let's make something clear about textures. There are two main things "texture" might refer to:
- The first group I'll call "Resource" textures: these are things like Texture2D, ImageTexture, NoiseTexture2D, etc. You interact with these often, putting them onto things like Materials or saving them to disk. These are high level.
- The second group I'll call "RenderingDevice" textures: these are lower level, essentially what "Resource" textures link to under the hood. These are closer to the graphics driver.
Also, if you weren't aware, a lot of things in Godot are referenced internally by an RID, or Resource ID. Think of this like a pointer.
Breaking it down
RenderingServer.texture_get_rd_texture($Subviewport.get_texture().get_rid(), false)
Remember the types of textures? They come into play here:
- Subviewport.get_texture(): this returns a ViewportTexture, a subclass of Texture2D. This is the "resource" texture I mentioned before.
  - We get the viewport texture's RID, basically the internal resource ID of the texture.
- RenderingServer.texture_get_rd_texture: this looks up the "resource" texture ID and gives us the RID of a "rendering device" (RD) texture. This is the low-level RID of the texture that's actually on the GPU.
var in_tex_format = rd.texture_get_format(in_texture)
var out_tex_format = in_tex_format
out_tex_format.usage_bits = out_tex_format.usage_bits | RenderingDevice.TEXTURE_USAGE_CAN_COPY_TO_BIT
var out_texture = rd.texture_create(out_tex_format, RDTextureView.new(), [])
The format tells the GPU how the data is structured (i.e. how many bytes to expect and where) and how it can be used.
We need to add TEXTURE_USAGE_CAN_COPY_TO_BIT to the output texture, since we will be copying data to it. The usage bits are a bit field, hence the bitwise OR (|).
Then we just make a new texture directly on the GPU (bypassing Godot's "resource" texture layer) via the rendering device.
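If bit fields are unfamiliar, the idea behind adding a usage flag can be sketched outside Godot. The Python below is only a model: the Usage enum and its values are hypothetical stand-ins, not the real RenderingDevice constants, and the starting flags on the viewport texture are an assumption for illustration.

```python
from enum import IntFlag


class Usage(IntFlag):
    # Hypothetical stand-ins for illustration; the real flags and their
    # values are defined by Godot's RenderingDevice.
    SAMPLING = 1 << 0
    CAN_COPY_FROM = 1 << 1
    CAN_COPY_TO = 1 << 2


# Assumed starting point: a texture that can be sampled and copied from.
viewport_usage = Usage.SAMPLING | Usage.CAN_COPY_FROM

# OR-ing in a flag sets that one bit and leaves every other bit untouched.
output_usage = viewport_usage | Usage.CAN_COPY_TO

print(Usage.CAN_COPY_TO in viewport_usage)  # flag not set on the source
print(Usage.CAN_COPY_TO in output_usage)    # flag set on the output
print(Usage.SAMPLING in output_usage)       # existing flags are preserved
```

Because OR only ever turns bits on, applying it to a format that already has the flag changes nothing, which is why it is safe to do unconditionally.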
rd.texture_copy(in_texture, out_texture, Vector3.ZERO, Vector3.ZERO, Vector3(image_size.x, image_size.y, 0), 0, 0, 0, 0)
Now for the good stuff! This is a neat little function that instructs the GPU to copy from the input texture (the viewport's texture) to our new output texture. This means the actual texture data never really touches the CPU.
The Vector3 arguments might seem a bit odd, but they are there for use with 3D textures (we don't care about the third value, so it's just zero here).
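To make the position and size arguments concrete, here is a small CPU-side model of a region copy in Python. It is only meant to show what "from position", "to position" and "size" mean for a 2D texture; copy_region is a hypothetical helper of mine and says nothing about how the GPU or Godot actually performs the copy.

```python
def copy_region(src, dst, from_pos, to_pos, size):
    """Copy a size=(width, height) rectangle from src at from_pos into dst at to_pos.

    src and dst are lists of rows, standing in for 2D textures.
    """
    from_x, from_y = from_pos
    to_x, to_y = to_pos
    width, height = size
    for row in range(height):
        dst[to_y + row][to_x:to_x + width] = src[from_y + row][from_x:from_x + width]


# A 4x4 "texture" whose pixel values encode their own position (x + 10 * y).
src = [[x + 10 * y for x in range(4)] for y in range(4)]

# Full copy, like the call above: both offsets are zero, size is the whole image.
full = [[0] * 4 for _ in range(4)]
copy_region(src, full, (0, 0), (0, 0), (4, 4))

# Partial copy: a 2x2 block starting at (1, 1) lands in the top-left corner.
partial = [[0] * 4 for _ in range(4)]
copy_region(src, partial, (1, 1), (0, 0), (2, 2))

for row in partial:
    print(row)
```

In the viewport case both offsets are zero and the size covers the full image, so the output ends up as an exact duplicate of the source.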
var sprite_tex = Texture2DRD.new()
sprite_tex.texture_rd_rid = out_texture
Now we have the texture copied, but it's only stored at the low level. We need a way to use it with everyday Godot things like Sprites. Luckily, Texture2DRD was added in 4.2. It lets us wrap a texture created on the RenderingDevice in a normal texture that we can use in Materials and the like.
await get_tree().process_frame
rd.free_rid(out_texture)
Once we are done with the texture, we need to clean it up manually; otherwise Godot will keep it on the GPU and the memory will slowly fill until your GPU runs out. Because free_rid happens immediately, we wait a frame first so that objects still using the texture (which might be visible now but will disappear in the next frame) don't "flicker" when they lose it.
The results
Viewport size (square, px) | Processing time via get_image | Processing time via texture_copy |
---|---|---|
64 | 0.4 ms | 0.006 ms |
1024 | 33 ms | 0.008 ms |
2048 | 89 ms | 0.011 ms |
The results are quite impressive. Presumably because the GPU is able to parallelise the copy, the time grows far more slowly than with the CPU-bound version (which would theoretically scale linearly with the number of pixels).
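The scaling claim can be checked against the table with a few lines of Python. The numbers are copied from the table as-is, so they inherit its unscientific nature; the speedup and growth helpers are just names I made up for this sketch.

```python
# (get_image ms, texture_copy ms) per square viewport size, copied from the table.
results = {
    64: (0.4, 0.006),
    1024: (33.0, 0.008),
    2048: (89.0, 0.011),
}


def speedup(side: int) -> float:
    """How many times faster texture_copy was than get_image at this size."""
    get_image_ms, texture_copy_ms = results[side]
    return get_image_ms / texture_copy_ms


def growth(column: int) -> float:
    """How much a column's time grew going from the 64 to the 2048 viewport."""
    return results[2048][column] / results[64][column]


pixel_growth = (2048 // 64) ** 2  # the pixel count grows with the square of the side

for side in results:
    print(f"{side}: texture_copy is about {speedup(side):.0f}x faster")
print(f"pixels grew {pixel_growth}x")
print(f"get_image time grew {growth(0):.1f}x, texture_copy time grew {growth(1):.2f}x")
```

Going from 64 to 2048 multiplies the pixel count by 1024, while the measured texture_copy time does not even double.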