There are several scenarios in which flushing can have
a significant negative impact on performance:
1. When the query result is already available
2. When the game scatters GetData calls when rendering
Frostpunk hits both issues at the same time, which led to
over 120 queue submissions per frame. This patch reduces
that to 3 submissions per frame when the game is GPU-bound.
There currently doesn't seem to be a game which actually renders
to images with linear tiling, but we should handle this anyway.
Only the GENERAL layout is allowed if the tiling is not OPTIMAL.
We don't need to force layout transitions and emit double pipeline
barriers in case the GENERAL layout is being used for both images.
This is somewhat common for images used by compute shaders, and
this optimization ensures that only required barriers are emitted.
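As an illustration of the GENERAL-layout shortcut described above, here is a minimal sketch. The `Layout` enum, `CopyPlan` and `planCopy` are hypothetical names made up for this example and are not DXVK's actual types; the fallback to transfer-optimal layouts is likewise an assumption about what happens when the shortcut does not apply.

```cpp
// Hypothetical, simplified stand-in for Vulkan image layouts.
enum class Layout { General, TransferSrcOptimal, TransferDstOptimal };

// Hypothetical result type: the layouts to use for the copy, plus the
// number of layout transitions (and thus extra barriers) required.
struct CopyPlan {
  Layout srcLayout;
  Layout dstLayout;
  int    transitions;
};

// Decide which layouts to use for an image-to-image copy.
CopyPlan planCopy(Layout srcCurrent, Layout dstCurrent) {
  // Both images already use GENERAL: copy in place, no transitions.
  if (srcCurrent == Layout::General && dstCurrent == Layout::General)
    return { Layout::General, Layout::General, 0 };

  // Assumed fallback: use the transfer-optimal layouts and count only
  // the transitions that are actually needed.
  CopyPlan plan = { Layout::TransferSrcOptimal, Layout::TransferDstOptimal, 0 };
  plan.transitions += (srcCurrent != plan.srcLayout) ? 1 : 0;
  plan.transitions += (dstCurrent != plan.dstLayout) ? 1 : 0;
  return plan;
}
```

With both images in GENERAL, which is typical for images written by compute shaders, the plan contains no transitions at all.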
Instead of inserting a barrier after every single buffer copy, update
or clear operation, we batch them up and execute the barrier when the
first dirty buffer is used by a command. This significantly reduces
the number of pipeline barriers in some games, e.g. Final Fantasy XV.
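The batching idea can be sketched roughly as follows. `BufferId`, `BarrierBatcher` and its methods are hypothetical names for this example only; the actual tracking in DXVK may look quite different, and the sketch assumes that one barrier is enough to cover all pending transfer writes.

```cpp
#include <cstdint>
#include <unordered_set>

// Hypothetical buffer handle, for illustration only.
using BufferId = uint32_t;

class BarrierBatcher {
public:
  // Called after a copy, update or clear operation. The buffer is only
  // marked dirty here; no barrier is emitted yet.
  void recordTransferWrite(BufferId buffer) {
    m_dirty.insert(buffer);
  }

  // Called before a command uses the buffer. Returns true if a barrier
  // had to be emitted because the buffer has pending writes.
  bool useBuffer(BufferId buffer) {
    if (m_dirty.find(buffer) == m_dirty.end())
      return false;

    // Assumption: a single barrier makes all batched writes visible,
    // so every dirty buffer becomes clean at once.
    m_dirty.clear();
    m_barrierCount += 1;
    return true;
  }

  uint32_t barrierCount() const {
    return m_barrierCount;
  }

private:
  std::unordered_set<BufferId> m_dirty;
  uint32_t                     m_barrierCount = 0;
};
```

Three consecutive buffer updates followed by one use of any of those buffers result in one barrier here instead of three.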
Spilling the render pass should make shader storage buffer/image writes
visible due to how external subpass dependencies are defined. For UAV
rendering, we need to do this when changing the UAVs, even if the render
targets themselves do not change.
When an application calls this method with the width or height
set to 0, we are allowed to pick any resolution, so we'll try to
find one close to the *current* display mode, which usually yields
the current display mode itself.
Since the atomic operations always return the old value, we have to
subtract one for the consume instruction. The append instruction is
unaffected. Fixes an issue with vegetation in Final Fantasy XV.
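The off-by-one can be illustrated on the CPU with `std::atomic`, whose `fetch_add` and `fetch_sub` also return the old value. `UavCounter` is a hypothetical stand-in for this example, not the shader code that DXVK generates.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for an append/consume buffer counter.
struct UavCounter {
  std::atomic<uint32_t> value{0};

  // Append: the old value is the index of the next free slot,
  // so it can be returned unchanged.
  uint32_t append() {
    return value.fetch_add(1);
  }

  // Consume: the old value is one past the last valid element, so we
  // subtract one to get the index of the element being consumed.
  uint32_t consume() {
    return value.fetch_sub(1) - 1;
  }
};
```

After two appends, which return indices 0 and 1, the first consume returns index 1. Without the subtraction it would return 2 and point past the last element.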
DXVK does not support device-specific counters, which seem to
be useful only for GPU profiling during development, but we
should report this properly to the application.
It is illegal to call this method on a deferred context, so we should
filter out those calls. This allows the implementation to make use of
features specific to the immediate context.
This reverts commit 83ae39f727.
Does not show any considerable advantage over the 16 MiB chunk size
and reduces the effectiveness of the host-visible device-local memory
type on AMD cards.
Works around an issue with some games not setting the D3D11 depth
bias state correctly, which can result in an excessive number of
pipelines being compiled.