This causes some problems when the app uses a combination of index
buffer offset and StartIndexLocation that overflows 32-bit integers.
In my testing, there haven't been many games benefitting from this
optimization anyway, so just reverting it should not have tangible
effects on performance.
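To illustrate the overflow, here is a minimal sketch of how the index
buffer offset could be folded into StartIndexLocation with an explicit
range check. The helper name foldIndexOffset and its parameters are
hypothetical and not taken from the actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: converts a byte offset into the index buffer to
// a number of indices and adds it to StartIndexLocation. Returns
// std::nullopt if the offset is not a multiple of the index size or
// if the sum does not fit into 32 bits.
std::optional<uint32_t> foldIndexOffset(
        uint32_t byteOffset,
        uint32_t indexSize,            // 2 or 4 bytes per index
        uint32_t startIndexLocation) {
  assert(indexSize == 2 || indexSize == 4);

  if (byteOffset % indexSize != 0)
    return std::nullopt;

  // Add in 64 bits so that a 32-bit overflow can be detected.
  uint64_t sum = uint64_t(byteOffset / indexSize) + startIndexLocation;

  if (sum > UINT32_MAX)
    return std::nullopt;

  return uint32_t(sum);
}
```

Without the 64-bit check, the sum would silently wrap around and the
draw would read from the wrong location.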
Do not rebind the buffer if only the offset changes. Instead,
adjust StartIndexLocation in indexed draw calls. For indirect
draws, this will be disabled on the fly.
This may save a whole bunch of work in the backend, and reduces
the number of commands being sent to the CS thread in the first
place, which is why this optimization is not being done in the
backend itself but rather on the client API side.
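A rough sketch of the idea on the client API side, assuming a
hypothetical IndexBufferState tracker. All names are made up for
illustration, and the way indirect draws fall back to a real rebind is
an assumption about what "disabled on the fly" could look like.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tracker, not the actual implementation.
class IndexBufferState {
public:
  // Returns true if a bind command has to be sent to the backend.
  bool bind(const void* buffer, uint32_t byteOffset, uint32_t indexSize) {
    bool foldable = indexSize != 0
                 && buffer == m_buffer
                 && indexSize == m_indexSize
                 && byteOffset >= m_boundOffset
                 && (byteOffset - m_boundOffset) % indexSize == 0;

    if (foldable) {
      // Only the offset changed: remember it as a number of indices.
      m_indexDelta = (byteOffset - m_boundOffset) / indexSize;
      return false;
    }

    m_buffer      = buffer;
    m_indexSize   = indexSize;
    m_boundOffset = byteOffset;
    m_indexDelta  = 0;
    return true;
  }

  // Applied to direct indexed draws. This addition can wrap around
  // for very large values.
  uint32_t adjustStartIndex(uint32_t startIndexLocation) const {
    return startIndexLocation + m_indexDelta;
  }

  // Indirect draws read StartIndexLocation from a buffer, so the
  // pending delta must be turned into a real rebind first.
  // Returns true if a bind command has to be sent.
  bool flushForIndirectDraw() {
    if (m_indexDelta == 0)
      return false;

    m_boundOffset += m_indexDelta * m_indexSize;
    m_indexDelta   = 0;
    return true;
  }

private:
  const void* m_buffer      = nullptr;
  uint32_t    m_indexSize   = 0;
  uint32_t    m_boundOffset = 0;
  uint32_t    m_indexDelta  = 0;
};
```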
Trine 4 uses a stride of 32 bytes. Detecting the stride dynamically
allows us to merge a couple of draws in this game, and others which
do not tightly pack their draw parameter buffers.
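The stride detection can be sketched roughly as follows, assuming each
indirect draw is identified only by its byte offset into the draw
parameter buffer and no state changes in between. MergedDraw and
mergeIndirectDraws are hypothetical names. The stride is taken from the
distance between the first two draws of a batch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of a batch of merged indirect draws.
struct MergedDraw {
  uint32_t offset;  // byte offset of the first draw
  uint32_t count;   // number of draws in the batch
  uint32_t stride;  // byte distance between draws, 0 if count == 1
};

// argSize is the size of one argument struct, e.g. 20 bytes for
// indexed draws (five 32-bit values).
std::vector<MergedDraw> mergeIndirectDraws(
        const std::vector<uint32_t>& offsets,
        uint32_t argSize) {
  std::vector<MergedDraw> result;

  for (uint32_t offset : offsets) {
    if (!result.empty()) {
      MergedDraw& last = result.back();

      if (last.count == 1) {
        // Stride not known yet: accept any gap that leaves room
        // for one full argument struct.
        if (offset > last.offset && offset - last.offset >= argSize) {
          last.stride = offset - last.offset;
          last.count  = 2;
          continue;
        }
      } else {
        // Stride is known: the next draw must continue the pattern.
        uint32_t lastOffset = last.offset + (last.count - 1) * last.stride;

        if (offset > lastOffset && offset - lastOffset == last.stride) {
          last.count += 1;
          continue;
        }
      }
    }

    result.push_back({ offset, 1, 0 });
  }

  return result;
}
```

With a stride of 32 bytes as in Trine 4, draws at offsets 0, 32, 64
and 96 end up in a single batch even though each argument struct only
occupies 20 bytes.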
SetConstantBuffers binds only the first 65536 bytes of a buffer
that is larger than that. The clamped range is also what
GetConstantBuffers1 reports for the binding.
SetConstantBuffers1 does not have any effect if the bound range
is invalid.
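The two behaviours can be summarized in a small sketch. The function
names are hypothetical, and the exact validity rules used here
(offset and count must be multiples of 16 constants, at most 4096
constants, and the range must lie within the buffer) are an assumption
based on the D3D11.1 documentation rather than on this change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Bound range in bytes.
struct ConstantRange {
  uint32_t offset;
  uint32_t length;
};

constexpr uint32_t MaxConstantBufferBytes = 65536;
constexpr uint32_t BytesPerConstant       = 16;

// SetConstantBuffers: bind from offset 0, clamped to 65536 bytes.
ConstantRange rangeForSetConstantBuffers(uint32_t bufferSize) {
  return { 0, std::min(bufferSize, MaxConstantBufferBytes) };
}

// SetConstantBuffers1: returns std::nullopt for an invalid range, in
// which case the caller leaves the existing binding untouched.
std::optional<ConstantRange> rangeForSetConstantBuffers1(
        uint32_t bufferSize,
        uint32_t firstConstant,
        uint32_t numConstants) {
  bool aligned = firstConstant % 16 == 0 && numConstants % 16 == 0;
  bool small   = numConstants <= MaxConstantBufferBytes / BytesPerConstant;

  // Compute the end of the range in 64 bits to avoid overflow.
  uint64_t end = (uint64_t(firstConstant) + numConstants) * BytesPerConstant;

  if (!aligned || !small || end > bufferSize)
    return std::nullopt;

  return ConstantRange { firstConstant * BytesPerConstant,
                         numConstants  * BytesPerConstant };
}
```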
Otherwise, a race condition occurs if a game submits rendering commands
at the same time as presenting the swap chain image. Only works if
multithreaded protection is enabled, but according to MSDN, it is
illegal to use DXGI commands and the immediate context in parallel.
Fixes stability issues in Tales of Vesperia.
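A minimal sketch of the locking scheme, assuming a single lock shared
between the immediate context and the swap chain that is only taken
when multithread protection is enabled. ContextLock and Device are
hypothetical stand-ins, not the real classes.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical lock that is a no-op unless protection is enabled.
class ContextLock {
public:
  using Guard = std::unique_lock<std::recursive_mutex>;

  void setMultithreadProtected(bool enable) {
    m_enabled.store(enable);
  }

  // Returns a guard that owns the mutex only if protection is
  // enabled at the time of the call.
  Guard acquire() {
    return m_enabled.load() ? Guard(m_mutex) : Guard();
  }

private:
  std::recursive_mutex m_mutex;
  std::atomic<bool>    m_enabled { false };
};

// Both rendering commands and presentation take the same lock, so
// they can no longer run at the same time when protection is on.
struct Device {
  ContextLock lock;
  int drawCalls = 0;
  int presents  = 0;

  void draw()    { auto guard = lock.acquire(); ++drawCalls; }
  void present() { auto guard = lock.acquire(); ++presents;  }
};
```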
ClearState gets used a lot in games that use deferred
contexts, so we should make sure it's fast. Since we
apply default state everywhere, there is no need to
perform any expensive RestoreState operations.
Reduces the amount of time spent on ClearState on the
CS thread by ~40%, and by ~90% on the calling thread.
Fixes incorrect behaviour in games that try to use a currently bound
UAV or render target as a shader resource at the same time.
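The hazard handling can be sketched like this, assuming resources are
identified by a plain integer and only render targets are tracked (UAVs
would work the same way). BindingState is a hypothetical type. As in
D3D11, a conflicting shader resource binding is replaced with a null
binding.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using ResourceId = uint32_t;
constexpr ResourceId NoResource = 0;

// Hypothetical binding state for one pipeline.
struct BindingState {
  std::array<ResourceId, 8>  renderTargets   { };
  std::array<ResourceId, 16> shaderResources { };

  // Binding an output unbinds the resource from all input slots.
  void bindRenderTarget(size_t slot, ResourceId resource) {
    if (resource != NoResource) {
      for (ResourceId& srv : shaderResources) {
        if (srv == resource)
          srv = NoResource;
      }
    }

    renderTargets[slot] = resource;
  }

  // Binding an input that is currently bound as an output results
  // in a null binding instead.
  void bindShaderResource(size_t slot, ResourceId resource) {
    if (resource != NoResource) {
      for (ResourceId rt : renderTargets) {
        if (rt == resource)
          resource = NoResource;
      }
    }

    shaderResources[slot] = resource;
  }
};
```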
Fixes visual artifacts in Shining Resonance Refrain on AMD hardware.
Significantly improves performance in Assassin's Creed Odyssey when
CPU bound.
Only has an effect when no state changes between draw calls, and
when the draw parameter buffer is tightly packed.
Introduces an OpenGL-style bind point for the argument buffer, which
means we can avoid a lot of unnecessary reference tracking in games
that do a lot of indirect draw calls.
Reduces CPU overhead in Assassin's Creed Odyssey.
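A rough sketch of the bind point idea, assuming reference tracking is
modelled with std::shared_ptr. Context, Buffer and the method names are
hypothetical. The buffer is referenced once when it is bound, and
draws only pass an offset.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

struct Buffer { };  // stand-in for a GPU buffer object

// Hypothetical context with a dedicated argument buffer bind point.
class Context {
public:
  // Takes a reference only when the bound buffer actually changes.
  void bindArgumentBuffer(std::shared_ptr<Buffer> buffer) {
    if (m_argBuffer == buffer)
      return;

    m_argBuffer = std::move(buffer);
    m_bindCount += 1;
  }

  // Draws refer to the bound buffer implicitly, so no per-draw
  // reference to the buffer is taken.
  void drawIndirect(uint32_t offset) {
    m_lastOffset = offset;
    m_drawCount += 1;
  }

  uint32_t bindCount() const { return m_bindCount; }
  uint32_t drawCount() const { return m_drawCount; }

private:
  std::shared_ptr<Buffer> m_argBuffer;
  uint32_t m_lastOffset = 0;
  uint32_t m_bindCount  = 0;
  uint32_t m_drawCount  = 0;
};
```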
Reduces the number of dynamic memory allocations for CS chunks by
recycling them once they are no longer needed. Also fixes a potential
issue with chunks that are dispatched multiple times.
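Recycling can be sketched with a simple free list, assuming chunks are
owned through std::unique_ptr and reset before reuse. Chunk and
ChunkPool are simplified, hypothetical stand-ins for the real classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Simplified chunk holding a list of recorded commands.
struct Chunk {
  std::vector<int> commands;
  void reset() { commands.clear(); }
};

// Hypothetical pool that hands out recycled chunks when available
// and only allocates a new one when the free list is empty.
class ChunkPool {
public:
  std::unique_ptr<Chunk> alloc() {
    std::lock_guard<std::mutex> lock(m_mutex);

    if (m_free.empty()) {
      m_allocations += 1;
      return std::make_unique<Chunk>();
    }

    std::unique_ptr<Chunk> chunk = std::move(m_free.back());
    m_free.pop_back();
    return chunk;
  }

  // Called once a chunk is no longer needed.
  void recycle(std::unique_ptr<Chunk> chunk) {
    chunk->reset();

    std::lock_guard<std::mutex> lock(m_mutex);
    m_free.push_back(std::move(chunk));
  }

  size_t allocations() const { return m_allocations; }

private:
  std::mutex                          m_mutex;
  std::vector<std::unique_ptr<Chunk>> m_free;
  size_t                              m_allocations = 0;
};
```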