Some games use UpdateSubresource to upload constant buffers in
between draws, so this path should be as fast as possible.
Also fixes a potential issue when using D3D11_COPY_NO_OVERWRITE
on deferred contexts, since the Map requirements don't hold here.
Notably, fairly generic functions to create/launch/destroy Cuda kernels,
and methods to fetch GPU virtual addresses and handles for DX11 resources.
Note: Requires some corresponding dxvk-nvapi changes for DLSS to
be initialized successfully.
Aims to be mostly transparent to the application, although breakage
can still happen if shaders query the sample count and do not handle
a sample count of 1.
This causes some problems when the app uses a combination of index
buffer offset and StartIndexLocation that overflows 32-bit integers.
In my testing, there haven't been many games benefitting from this
optimization anyway, so just reverting it should not have tangible
effects on performance.
If the output array is non-null, these functons always return the
number of valid viewports or scissors actually written to the array.
Fixes a wine test failure.
Do not rebind the buffer if only the offset changes. Instead,
adjust StartIndexLocation in indexed draw calls. For indirect
draws, this will be disabled on the fly.
This may save a whole bunch of work in the backend, and reduces
the number of commands being sent to the CS thread in the first
place, which is why this optimization is not being done in the
backend itself but rather on the client API side.
Trine 4 uses a stride of 32 bytes. Detecting the stride dynamically
allows us to merge a couple of draws in this game, and others which
do not tightly pack their draw parameter buffers.
SetConstantBuffers will only bind the first 65536 bytes of any
buffer passed to it if it is larger. This can be seen even when
querying the bound range via GetConstantBuffers1.
SetConstantBuffers1 does not have any effect if the bound range
is invalid.
We should use clearRenderTarget whenever we clear the entire view.
The Talos Principle uses ClearView to clear its render targets for
some reason, and we were hitting a slow path there.