Some apps try to use the D3DPERF_ functions for debug markers/annotations.
This uses the hidden DXVK_RegisterAnnotation functions to share the interfaces.
Co-authored-by: Oleg Kuznetsov <okouznetsov@nvidia.com>
Some games use UpdateSubresource to upload constant buffers in
between draws, so this path should be as fast as possible.
Also fixes a potential issue when using D3D11_COPY_NO_OVERWRITE
on deferred contexts, since the Map requirements don't hold here.
This causes some problems when the app uses a combination of index
buffer offset and StartIndexLocation that overflows 32-bit integers.
In my testing, there haven't been many games benefitting from this
optimization anyway, so just reverting it should not have tangible
effects on performance.
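The overflow can be shown with a small check. This is a minimal sketch that assumes the optimization converted the index buffer byte offset into an index count and added it to StartIndexLocation as a 32-bit value. `FoldedStartIndex` is a hypothetical helper, not code from the project.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: folds an index buffer byte offset into StartIndexLocation.
// Returns std::nullopt when the folded index no longer fits into 32 bits, which
// is the case the reverted optimization could not handle.
std::optional<uint32_t> FoldedStartIndex(uint32_t bufferOffset,
                                         uint32_t indexSize,
                                         uint32_t startIndexLocation) {
  // Widen to 64 bits so that the addition itself cannot wrap around.
  uint64_t folded = uint64_t(bufferOffset / indexSize) + startIndexLocation;
  if (folded > UINT32_MAX)
    return std::nullopt;
  return uint32_t(folded);
}
```

For example, a byte offset of 0x100 with 32-bit indices adds 64 indices, so a StartIndexLocation of 0xFFFFFFF0 no longer fits, even though D3D11 would compute the final address without any problem.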
Do not rebind the buffer if only the offset changes. Instead,
adjust StartIndexLocation in indexed draw calls. For indirect
draws, this will be disabled on the fly.
This may save a whole bunch of work in the backend, and reduces
the number of commands being sent to the CS thread in the first
place, which is why this optimization is not being done in the
backend itself but rather on the client API side.
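The client-side bookkeeping might look like the following. This is a minimal sketch under stated assumptions: `IndexBufferState`, `NeedsRebind`, `AdjustedStartIndex` and `NeedsRebindForIndirect` are hypothetical names, and the check only skips the rebind when the offset delta is a whole number of indices.

```cpp
#include <cstdint>

// Hypothetical client-side tracking of the index buffer binding.
struct IndexBufferState {
  const void* buffer = nullptr;  // buffer object bound in the backend
  uint32_t indexSize = 0;        // 2 for R16_UINT, 4 for R32_UINT
  uint32_t boundOffset = 0;      // offset the backend binding was created with
  uint32_t appOffset = 0;        // offset most recently passed by the app
};

// A backend rebind is needed if the buffer or index format changed, or if the
// new offset cannot be expressed as a whole number of indices past boundOffset.
bool NeedsRebind(const IndexBufferState& s, const void* buffer,
                 uint32_t indexSize, uint32_t offset) {
  if (buffer != s.buffer || indexSize != s.indexSize)
    return true;
  return offset < s.boundOffset || (offset - s.boundOffset) % indexSize != 0;
}

// StartIndexLocation to send to the backend for an indexed draw.
// Note: this is an unchecked 32-bit addition and can wrap around.
uint32_t AdjustedStartIndex(const IndexBufferState& s, uint32_t startIndex) {
  return startIndex + (s.appOffset - s.boundOffset) / s.indexSize;
}

// Indirect draws read StartIndexLocation from a GPU buffer, so the adjustment
// cannot be applied; the real binding must be restored first.
bool NeedsRebindForIndirect(const IndexBufferState& s) {
  return s.appOffset != s.boundOffset;
}
```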
Trine 4 uses a stride of 32 bytes. Detecting the stride dynamically
allows us to merge a couple of draws in this game, and others which
do not tightly pack their draw parameter buffers.
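Stride detection can be sketched as follows, assuming the candidate draws all read from the same argument buffer with no state changes in between. `IndirectBatch` and `MergeIndirectDraws` are hypothetical; the stride is taken from the first two offsets of a batch and must satisfy the Vulkan rule for multi-draw indirect calls (a multiple of 4, at least the size of one argument struct).

```cpp
#include <cstdint>
#include <vector>

// One merged multi-draw indirect call (hypothetical struct).
struct IndirectBatch {
  uint32_t offset;  // byte offset of the first argument struct
  uint32_t count;   // number of draws merged into this batch
  uint32_t stride;  // byte distance between consecutive argument structs
};

// Size of D3D11_DRAW_INDEXED_INSTANCED_INDIRECT_ARGS: five 32-bit values.
constexpr uint32_t kIndexedArgsSize = 20;

// Merges consecutive indexed indirect draws. Instead of assuming tightly
// packed arguments, the stride is detected from the first two draws.
std::vector<IndirectBatch> MergeIndirectDraws(const std::vector<uint32_t>& offsets) {
  std::vector<IndirectBatch> batches;
  for (uint32_t offset : offsets) {
    if (!batches.empty()) {
      IndirectBatch& b = batches.back();
      if (b.count == 1 && offset > b.offset) {
        uint32_t stride = offset - b.offset;
        if (stride % 4 == 0 && stride >= kIndexedArgsSize) {
          b.stride = stride;
          b.count = 2;
          continue;
        }
      } else if (b.count > 1 && offset == b.offset + b.count * b.stride) {
        b.count += 1;
        continue;
      }
    }
    // Start a new batch; stride is only meaningful once count > 1.
    batches.push_back({offset, 1, kIndexedArgsSize});
  }
  return batches;
}
```

With a 32-byte stride, offsets 0, 32, 64 and 96 collapse into a single batch of four draws.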
If a buffer passed to SetConstantBuffers is larger than 65536 bytes,
only its first 65536 bytes are bound. This can be seen even when
querying the bound range via GetConstantBuffers1.
SetConstantBuffers1 does not have any effect if the bound range
is invalid.
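Both rules can be sketched in terms of 16-byte shader constants, where 65536 bytes correspond to 4096 constants. `DefaultRange` and `ValidateRange` are hypothetical, and the validity rules used here (offset and size multiples of 16 constants, size at most 4096) are an assumption based on the documented SetConstantBuffers1 constraints.

```cpp
#include <cstdint>
#include <optional>

// A constant buffer range in 16-byte shader constants.
struct ConstantRange {
  uint32_t firstConstant;
  uint32_t numConstants;
};

// 65536 bytes / 16 bytes per constant.
constexpr uint32_t kMaxConstants = 4096;

// SetConstantBuffers: binds at most the first 4096 constants of the buffer.
ConstantRange DefaultRange(uint32_t bufferByteWidth) {
  uint32_t constants = bufferByteWidth / 16;
  return { 0, constants < kMaxConstants ? constants : kMaxConstants };
}

// SetConstantBuffers1: std::nullopt means the call should be ignored and
// the previous binding left untouched (assumed validity rules, see above).
std::optional<ConstantRange> ValidateRange(uint32_t firstConstant,
                                           uint32_t numConstants) {
  if (firstConstant % 16 != 0 || numConstants % 16 != 0 ||
      numConstants > kMaxConstants)
    return std::nullopt;
  return ConstantRange{ firstConstant, numConstants };
}
```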
Otherwise, a race condition occurs if a game submits rendering commands
at the same time as presenting the swap chain image. Only works if
multithreaded protection is enabled, but according to MSDN, it is
illegal to use DXGI commands and the immediate context in parallel.
Fixes stability issues in Tales of Vesperia.
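The locking scheme can be sketched with a single mutex shared by the immediate context and the present path. `Device`, `Lock`, `Submit` and `Present` are hypothetical stand-ins; the point is that the lock is only taken when the app has enabled multithreaded protection, matching the limitation described above.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical device object shared by the immediate context and Present.
class Device {
public:
  explicit Device(bool multithreadProtected)
  : m_protected(multithreadProtected) { }

  // Returns an owning lock only if multithreaded protection is enabled.
  // Otherwise the app itself must serialize DXGI and context calls.
  std::unique_lock<std::mutex> Lock() {
    return m_protected
      ? std::unique_lock<std::mutex>(m_mutex)
      : std::unique_lock<std::mutex>();
  }

  void Submit(const std::string& cmd) {
    auto lock = Lock();  // same lock as Present, so the two cannot interleave
    m_log.push_back(cmd);
  }

  void Present() {
    auto lock = Lock();
    m_log.push_back("present");
  }

  std::size_t LogSize() {
    auto lock = Lock();
    return m_log.size();
  }

private:
  bool m_protected;
  std::mutex m_mutex;
  std::vector<std::string> m_log;
};
```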
ClearState gets used a lot in games that use deferred
contexts, so we should make sure it's fast. Since we
apply default state everywhere, there is no need to
perform any expensive RestoreState operations.
Reduces the amount of time spent on ClearState on the
CS thread by ~40%, and by ~90% on the calling thread.
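The idea can be sketched as overwriting the client-side state in one assignment and sending a single reset command, instead of unbinding every slot through the regular setters. `ContextState` and `Context` are hypothetical and heavily reduced; the default member initializers mirror the D3D11 defaults (nothing bound, sample mask 0xFFFFFFFF).

```cpp
#include <array>
#include <cstdint>

// Hypothetical, heavily reduced context state. Member initializers describe
// the D3D11 default state.
struct ContextState {
  std::array<const void*, 14> vsConstantBuffers{};  // 14 CB slots per stage
  const void* indexBuffer = nullptr;
  uint32_t indexOffset = 0;
  uint32_t stencilRef = 0;
  uint32_t sampleMask = 0xFFFFFFFFu;
};

struct Context {
  ContextState state;
  uint32_t commandsEmitted = 0;  // commands sent to the (hypothetical) CS thread

  void ClearState() {
    // Reset the whole client-side state in one assignment.
    state = ContextState();
    // A single command tells the backend to apply its own default state.
    commandsEmitted += 1;
  }
};
```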
Fixes incorrect behaviour in games that try to use a currently bound
UAV or render target as a shader resource at the same time.
Fixes visual artifacts in Shining Resonance Refrain on AMD hardware.
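The D3D11 hazard rule can be sketched as follows: a shader resource that is currently bound for writing is bound as null, and binding a render target unbinds any shader resource views of it. `Resource` and `HazardTracker` are hypothetical, and tracking whole resources rather than subresources is a simplification; the actual fix may differ in detail.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical resource handle; only identity matters here.
struct Resource { int id; };

struct HazardTracker {
  std::vector<const Resource*> renderTargets;  // currently bound RTVs
  std::vector<const Resource*> uavs;           // currently bound UAVs
  std::array<const Resource*, 128> srvs{};     // 128 SRV slots per stage

  bool IsBoundAsOutput(const Resource* r) const {
    for (const Resource* rt : renderTargets)
      if (rt == r) return true;
    for (const Resource* u : uavs)
      if (u == r) return true;
    return false;
  }

  // An SRV of a resource currently bound for writing is bound as null.
  void SetShaderResource(std::size_t slot, const Resource* r) {
    srvs[slot] = (r && IsBoundAsOutput(r)) ? nullptr : r;
  }

  // Binding a render target unbinds any SRVs that reference it.
  void SetRenderTarget(const Resource* r) {
    renderTargets = { r };
    for (const Resource*& s : srvs)
      if (s == r) s = nullptr;
  }
};
```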