mirror of https://github.com/Yours3lf/rpi-vk-driver.git synced 2025-02-26 23:54:17 +01:00

Updated Broadcom Videocore IV Performance Recommendations (markdown)

Yours3lf 2020-06-18 19:52:38 +01:00
parent b1d259190d
commit b3359ce8f9

@@ -1,11 +1,25 @@
Input assembly
### Profiling using hardware counters
Profiling can be done using standard Vulkan performance queries. See the query.cpp example.
Vertex shader
## Coordinate shaders
The Broadcom Videocore IV GPU has a hidden shader stage called the Coordinate Shader stage. This stage computes only the final vertex positions, so the GPU does not process the remaining vertex attributes for vertices that would be culled or clipped anyway. It is therefore advisable to supply vertex positions in a separate buffer so that the Coordinate Shader stage can achieve high cache efficiency.
The rest of the vertex attributes should be interleaved in a second buffer.
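
As a minimal sketch of that layout, the snippet below splits an interleaved vertex array into a position-only stream and an interleaved stream for everything else. The struct names, the attribute set (normal and UV) and `splitVertexStreams` are all hypothetical and chosen only for illustration; a real application would upload the two arrays as separate vertex buffers and bind them as two vertex input bindings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical source layout: everything interleaved in one struct.
struct FullVertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// Stream 0: positions only, tightly packed for the Coordinate Shader stage.
struct PositionOnly {
    float position[3];
};

// Stream 1: every remaining attribute, interleaved.
struct Attributes {
    float normal[3];
    float uv[2];
};

struct SplitStreams {
    std::vector<PositionOnly> positions;
    std::vector<Attributes> attributes;
};

// Splits one interleaved vertex array into the two streams described above.
SplitStreams splitVertexStreams(const std::vector<FullVertex>& vertices) {
    SplitStreams out;
    out.positions.reserve(vertices.size());
    out.attributes.reserve(vertices.size());
    for (const FullVertex& v : vertices) {
        out.positions.push_back(
            PositionOnly{{v.position[0], v.position[1], v.position[2]}});
        out.attributes.push_back(
            Attributes{{v.normal[0], v.normal[1], v.normal[2]},
                       {v.uv[0], v.uv[1]}});
    }
    // Both streams must stay index-compatible with the original array.
    assert(out.positions.size() == out.attributes.size());
    return out;
}
```

Because both streams keep the original vertex order, an existing index buffer can be reused unchanged.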
Fixed function stages
## Index buffers
Indexing can be used to make sure vertices are not processed redundantly. An index buffer optimizer library such as meshoptimizer should be used to make sure index buffers achieve maximum cache efficiency. See https://github.com/zeux/meshoptimizer
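
To check whether an optimizer actually helped, one rough option is to compare the average cache miss ratio (vertex shader invocations per triangle) before and after optimization. The sketch below simulates a simple FIFO post-transform cache; it does not use meshoptimizer, and both the FIFO model and the `cacheSize` parameter are assumptions for illustration, not documented properties of the Videocore IV hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Returns simulated vertex cache misses per triangle for a triangle list.
// cacheSize is a hypothetical FIFO capacity, not a known hardware value.
float averageCacheMissRatio(const std::vector<std::uint32_t>& indices,
                            std::size_t cacheSize) {
    assert(cacheSize > 0);
    assert(indices.size() % 3 == 0);
    if (indices.empty()) {
        return 0.0f;
    }
    std::deque<std::uint32_t> cache;
    std::size_t misses = 0;
    for (std::uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end()) {
            continue; // cache hit: the vertex would not be processed again
        }
        ++misses;
        cache.push_back(index);
        if (cache.size() > cacheSize) {
            cache.pop_front(); // evict the oldest entry
        }
    }
    return static_cast<float>(misses) /
           static_cast<float>(indices.size() / 3);
}
```

A lower value means fewer redundant vertex invocations; 3.0 corresponds to no reuse at all.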
Rasterisation
## Vertex buffers
Choosing lower-precision vertex attributes (8-bit, 16-bit) can save significant bandwidth, so choose the lowest precision that suits your meshes.
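
A minimal sketch of such a conversion, assuming texture coordinates in [0, 1] are stored as 16-bit unsigned normalized values and unit normals in [-1, 1] as 8-bit signed normalized values. The function names are hypothetical, and whether the matching vertex formats are supported should be checked against the driver before relying on them.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Maps a float in [0, 1] to a 16-bit unsigned normalized integer.
std::uint16_t quantizeUnorm16(float value) {
    assert(!std::isnan(value));
    const float clamped = std::min(std::max(value, 0.0f), 1.0f);
    return static_cast<std::uint16_t>(std::lround(clamped * 65535.0f));
}

// Maps a float in [-1, 1] to an 8-bit signed normalized integer.
std::int8_t quantizeSnorm8(float value) {
    assert(!std::isnan(value));
    const float clamped = std::min(std::max(value, -1.0f), 1.0f);
    return static_cast<std::int8_t>(std::lround(clamped * 127.0f));
}

// Inverse mappings, useful for measuring the quantization error offline.
float dequantizeUnorm16(std::uint16_t q) {
    return static_cast<float>(q) / 65535.0f;
}

float dequantizeSnorm8(std::int8_t q) {
    return std::max(static_cast<float>(q) / 127.0f, -1.0f);
}
```

Round-tripping a mesh through the quantize and dequantize pair is a quick way to see whether the chosen precision is visibly lossy for that mesh.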
Triangles that cover very few pixels (roughly fewer than 32) are rasterized very inefficiently, so make sure your triangles cover a large enough screen area.
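
One way to spot offending meshes offline is to estimate each triangle's screen-space area from its projected corners. The sketch below assumes the corners are already in window (pixel) coordinates; `Point2`, `triangleAreaPixels` and `isTooSmall` are hypothetical names, and the geometric area is only an approximation of the number of pixels actually covered.

```cpp
#include <cassert>
#include <cmath>

// A vertex position after projection, in window (pixel) coordinates.
struct Point2 {
    float x;
    float y;
};

// Geometric area of the triangle in pixels (half the absolute cross product).
float triangleAreaPixels(Point2 a, Point2 b, Point2 c) {
    const float cross =
        (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
    return 0.5f * std::fabs(cross);
}

// Flags triangles below the threshold; 32 pixels is the rough figure quoted above.
bool isTooSmall(Point2 a, Point2 b, Point2 c, float minPixels = 32.0f) {
    assert(minPixels >= 0.0f);
    return triangleAreaPixels(a, b, c) < minPixels;
}
```

If a large share of a mesh's triangles is flagged at typical viewing distances, a lower level of detail is probably a better fit.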
Fragment shader
## Tile based architecture
The Broadcom Videocore IV GPU is a tile-based GPU, but not a deferred one, so it's important to sort your geometry front-to-back to avoid unnecessary overdraw.
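
A minimal sketch of such a sort, assuming the application keeps a per-frame list of draws with a precomputed view-space depth. `DrawItem` and its fields are hypothetical. The sketch is meant for opaque geometry only; blended geometry usually needs the opposite order to composite correctly.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-draw record; viewDepth is the distance from the camera,
// so smaller values are nearer.
struct DrawItem {
    int meshId;
    float viewDepth;
};

// Orders draws nearest-first so that nearer surfaces are drawn first and
// farther fragments behind them can fail the depth test.
void sortFrontToBack(std::vector<DrawItem>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.viewDepth < b.viewDepth;
                     });
    assert(std::is_sorted(draws.begin(), draws.end(),
                          [](const DrawItem& a, const DrawItem& b) {
                              return a.viewDepth < b.viewDepth;
                          }));
}
```

`std::stable_sort` keeps the submission order of draws at equal depth, which avoids frame-to-frame reordering of coplanar objects.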
Blending
## ALU architecture
The Broadcom Videocore IV GPU has a dual-issue scalar FP32 ALU. This means that it can execute up to two instructions per cycle using its ADD and MUL ALUs. To maximize utilization it's important to fully saturate both ADD and MUL pipelines.
## Resolution
The Broadcom Videocore IV GPU is not really suited to 1080p, so it's advisable to render at 720p, which has less than half the pixels of 1080p, to make sure the GPU is not overwhelmed with fragment work. This leads to a more balanced Vertex/Fragment workload and also a more balanced CPU/GPU workload.
## Clears
Use render pass Load/Store operations to clear your textures (in Vulkan, a loadOp of VK_ATTACHMENT_LOAD_OP_CLEAR on the attachment). Any other method will likely result in a full-screen quad being drawn to clear part or all of a texture.