This causes some problems when the app uses a combination of index
buffer offset and StartIndexLocation that overflows 32-bit integers.
In my testing, there haven't been many games benefitting from this
optimization anyway, so just reverting it should not have tangible
effects on performance.
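
To illustrate the overflow, here is a minimal sketch of folding a byte
offset into the first index of a draw. The helper name and signature
are hypothetical and not taken from the code base; it assumes an index
size of 2 or 4 bytes and an offset that is a multiple of that size.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only.
// Folds an index buffer byte offset into the first index of a draw.
// Returns false if the resulting index does not fit into 32 bits.
// Assumes indexSize is 2 or 4 and byteOffset is a multiple of it.
bool foldIndexOffset(uint32_t byteOffset, uint32_t indexSize,
                     uint32_t startIndexLocation, uint32_t* firstIndex) {
  // Widen to 64 bits so that the addition itself cannot wrap around
  uint64_t sum = uint64_t(startIndexLocation)
               + uint64_t(byteOffset / indexSize);

  if (sum > UINT32_MAX)
    return false;

  *firstIndex = uint32_t(sum);
  return true;
}
```

With plain 32-bit arithmetic the second case below would silently wrap
around instead of being rejected.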
If the output array is non-null, these functions always return the
number of valid viewports or scissors actually written to the array.
Fixes a wine test failure.
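
The return semantics described above can be sketched roughly as
follows. The Viewport struct and the getViewports function are
simplified stand-ins rather than the real D3D11 types, and leaving
array entries past the returned count untouched is an assumption of
this sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Simplified stand-in for D3D11_VIEWPORT
struct Viewport { float x, y, width, height, minDepth, maxDepth; };

// Hypothetical model of an RSGetViewports-style query
void getViewports(const std::vector<Viewport>& bound,
                  uint32_t* pNumViewports, Viewport* pViewports) {
  uint32_t numBound = uint32_t(bound.size());

  if (pViewports == nullptr) {
    // No output array: only report how many viewports are bound
    *pNumViewports = numBound;
    return;
  }

  // Output array given: write at most *pNumViewports entries
  // and report the number of entries actually written
  uint32_t numWritten = std::min(numBound, *pNumViewports);

  for (uint32_t i = 0; i < numWritten; i++)
    pViewports[i] = bound[i];

  *pNumViewports = numWritten;
}
```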
Do not rebind the buffer if only the offset changes. Instead,
adjust StartIndexLocation in indexed draw calls. For indirect
draws, this will be disabled on the fly.
This may save a whole bunch of work in the backend, and reduces
the number of commands being sent to the CS thread in the first
place, which is why this optimization is not being done in the
backend itself but rather on the client API side.
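
A rough sketch of the idea on the API side, assuming a small tracking
struct. All names are hypothetical, and the rule that only forward,
index-aligned offset changes get folded is an assumption of this
sketch, not necessarily what the real code does.

```cpp
#include <cstdint>

// Hypothetical tracking state for the index buffer binding
struct IndexBufferState {
  const void* buffer        = nullptr;
  uint32_t    indexSize     = 0;  // 2 or 4 bytes once bound
  uint32_t    boundOffset   = 0;  // offset last sent to the backend
  uint32_t    currentOffset = 0;  // offset requested by the app
};

// Returns true if a rebind command has to be sent to the backend
bool setIndexBuffer(IndexBufferState& s, const void* buffer,
                    uint32_t indexSize, uint32_t offset) {
  bool sameBinding = indexSize != 0
                  && s.buffer == buffer
                  && s.indexSize == indexSize;

  // Assumption: only fold forward offsets that are index-aligned
  bool canFold = sameBinding
              && offset >= s.boundOffset
              && (offset - s.boundOffset) % indexSize == 0;

  s.currentOffset = offset;

  if (canFold)
    return false;

  s.buffer      = buffer;
  s.indexSize   = indexSize;
  s.boundOffset = offset;
  return true;
}

// Adjusts StartIndexLocation for the part of the offset that was
// not sent to the backend. Note that this addition is done in 32
// bits and can wrap around for large offsets.
uint32_t adjustStartIndex(const IndexBufferState& s,
                          uint32_t startIndexLocation) {
  if (s.indexSize == 0)
    return startIndexLocation;

  return startIndexLocation
       + (s.currentOffset - s.boundOffset) / s.indexSize;
}
```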
Trine 4 uses a stride of 32 bytes. Detecting the stride dynamically
allows us to merge a couple of draws in this game, and others which
do not tightly pack their draw parameter buffers.
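
Dynamic stride detection could look something like the sketch below.
The IndirectBatch struct and tryMerge function are hypothetical. The
stride is taken from the distance between the first two draws, and the
multiple-of-4 check mirrors the Vulkan requirement on indirect draw
strides.

```cpp
#include <cstdint>

// Hypothetical description of a run of merged indirect draws.
// A fresh batch starts with count = 1 and stride = 0.
struct IndirectBatch {
  uint32_t firstOffset;  // argument offset of the first draw
  uint32_t stride;       // detected distance between draws
  uint32_t count;        // number of draws in the batch
};

// Tries to append a draw whose arguments live at nextOffset.
// argSize is the size of one set of draw arguments in bytes.
bool tryMerge(IndirectBatch& b, uint32_t nextOffset, uint32_t argSize) {
  uint32_t lastOffset = b.firstOffset + (b.count - 1) * b.stride;

  // The next draw must not overlap the previous one
  if (nextOffset < lastOffset || nextOffset - lastOffset < argSize)
    return false;

  uint32_t delta = nextOffset - lastOffset;

  if (delta % 4 != 0)
    return false;

  if (b.count == 1) {
    // Second draw of the batch determines the stride
    b.stride = delta;
    b.count  = 2;
    return true;
  }

  // Later draws have to match the detected stride
  if (delta != b.stride)
    return false;

  b.count += 1;
  return true;
}
```

With a stride of 32 bytes and 20-byte indexed draw arguments, draws at
offsets 0, 32 and 64 end up in a single batch.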
SetConstantBuffers will only bind the first 65536 bytes of any
buffer passed to it if it is larger. This can be seen even when
querying the bound range via GetConstantBuffers1.
SetConstantBuffers1 does not have any effect if the bound range
is invalid.
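
The two behaviours can be modelled roughly as below. Names are
hypothetical and ranges are expressed in 16-byte constants, as they are
in GetConstantBuffers1. The multiple-of-16 and 4096-constant rules
follow the documented SetConstantBuffers1 parameter constraints; the
bounds check against the buffer size is an assumption about what
"invalid" includes.

```cpp
#include <algorithm>
#include <cstdint>

constexpr uint32_t BytesPerConstant = 16;
constexpr uint32_t MaxBoundBytes    = 65536;

// Hypothetical bound range, measured in 16-byte constants
struct ConstantRange {
  uint32_t firstConstant = 0;
  uint32_t numConstants  = 0;
};

// SetConstantBuffers-style bind: clamps to the first 65536 bytes
void setConstantBuffer(ConstantRange& r, uint32_t bufferSize) {
  r.firstConstant = 0;
  r.numConstants  = std::min(bufferSize, MaxBoundBytes) / BytesPerConstant;
}

// SetConstantBuffers1-style bind: returns false and leaves the
// existing binding untouched if the requested range is invalid
bool setConstantBuffer1(ConstantRange& r, uint32_t bufferSize,
                        uint32_t first, uint32_t num) {
  bool aligned  = first % 16 == 0 && num % 16 == 0;
  bool inLimit  = num <= MaxBoundBytes / BytesPerConstant;
  // Assumption: the range must also lie within the buffer
  bool inBounds = uint64_t(first) + num <= bufferSize / BytesPerConstant;

  if (!aligned || !inLimit || !inBounds)
    return false;

  r.firstConstant = first;
  r.numConstants  = num;
  return true;
}
```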
We should use clearRenderTarget whenever we clear the entire view.
The Talos Principle uses ClearView to clear its render targets for
some reason, and we were hitting a slow path there.
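
Deciding whether a ClearView call covers the entire view might be
sketched like this. The Rect struct stands in for D3D11_RECT and the
function name is hypothetical. Only the simple cases are detected:
several rectangles that jointly cover the view are not recognized.

```cpp
#include <cstdint>

// Simplified stand-in for D3D11_RECT
struct Rect { int32_t left, top, right, bottom; };

// Returns true if the clear is known to cover the whole view and
// can therefore take the fast path
bool clearsEntireView(const Rect* rects, uint32_t numRects,
                      int32_t viewWidth, int32_t viewHeight) {
  // No rectangles means that the entire view gets cleared
  if (rects == nullptr || numRects == 0)
    return true;

  // A single rectangle covering everything is sufficient
  for (uint32_t i = 0; i < numRects; i++) {
    if (rects[i].left <= 0 && rects[i].top <= 0
     && rects[i].right >= viewWidth
     && rects[i].bottom >= viewHeight)
      return true;
  }

  return false;
}
```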
Rocket League tries to copy five subresources of a texture that has
only a single array layer and a single mip level, which causes GPU
hangs on Nvidia drivers.
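
A minimal sketch of the kind of bounds check that rejects such copies.
The function names are hypothetical; the index computation follows the
D3D11CalcSubresource convention.

```cpp
#include <cstdint>

// Same layout as D3D11CalcSubresource: mip levels are contiguous
// within each array layer
uint32_t calcSubresource(uint32_t mipSlice, uint32_t arraySlice,
                         uint32_t mipLevels) {
  return mipSlice + arraySlice * mipLevels;
}

// A subresource index is valid only if it addresses an existing
// combination of mip level and array layer
bool isValidSubresource(uint32_t subresource, uint32_t mipLevels,
                        uint32_t arraySize) {
  return subresource < mipLevels * arraySize;
}
```

For a texture with one layer and one mip level, only subresource 0
passes the check, so copies of subresources 1 to 4 can be skipped.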
Various truck simulations are broken and set this flag on every
CopySubresourceRegion call, which, if we were to implement
DiscardBuffer for non-mappable resources again, would break
them. This flag seemingly has no effect on native D3D11.
ClearState gets used a lot in games that use deferred
contexts, so we should make sure it's fast. Since we
apply default state everywhere, there is no need to
perform any expensive RestoreState operations.
Reduces the amount of time spent on ClearState on the
CS thread by ~40%, and by ~90% on the calling thread.
Luckily, all known cases where games use UNDEFINED topology fail
validation elsewhere due to missing vertex shaders, but we should
handle this correctly anyway.