* [d3d11] add some resource validation for CopyResource and CopySubresourceRegion
combine if statement
* [d3d11] added copy extents validation for compressed formats
* correct return values
* fix incorrect logic operators
* set valid copy extents when possible
* [d3d11] Clamp copy region in CopySubresourceRegion
* [dxvk] Add helper methods to deal with block-compressed images
* [d3d11] Clean up validation in CopySubresourceRegion
* [d3d11] Improve error reporting and validation in CopyResource
* [d3d11] Fix inconsistent error messages
* Fix narrowing warnings in spirv_module relating to enums' default width on x64
* Make hashes of states use correct types without casting.
* Fix narrowing conversion in d3d11_sampler.cpp
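The CopySubresourceRegion clamping and block-compressed extent checks listed above can be sketched roughly as follows. This is a minimal, hypothetical illustration: `Extent3D`, `mipExtent`, `clampAxis` and `isBlockAligned` are names invented for this sketch, not DXVK's actual helpers, and the rules encoded here (clamp to both subresources; block-aligned offsets; sizes that are block multiples unless they reach the image edge) are an assumed reading of the commit messages.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical extent type used only in this sketch.
struct Extent3D {
  uint32_t width, height, depth;
};

// Extent of mip level `level` (assumed < 32) of an image whose level 0
// is `base`. Each dimension is clamped to at least 1.
inline Extent3D mipExtent(Extent3D base, uint32_t level) {
  return { std::max<uint32_t>(base.width  >> level, 1u),
           std::max<uint32_t>(base.height >> level, 1u),
           std::max<uint32_t>(base.depth  >> level, 1u) };
}

// Clamps one axis of a copy so that [offset, offset + size) fits inside
// both the source and the destination subresource. Returns 0 if either
// offset is already out of bounds.
inline uint32_t clampAxis(uint32_t srcOffset, uint32_t srcSize,
                          uint32_t dstOffset, uint32_t dstSize,
                          uint32_t size) {
  if (srcOffset >= srcSize || dstOffset >= dstSize)
    return 0;
  return std::min({ size, srcSize - srcOffset, dstSize - dstOffset });
}

// Assumed rule for block-compressed formats: the offset must be
// block-aligned, and the size must be a multiple of the block size
// unless the region ends exactly at the edge of the subresource.
inline bool isBlockAligned(uint32_t offset, uint32_t size,
                           uint32_t imageSize, uint32_t blockSize) {
  return offset % blockSize == 0
      && (size % blockSize == 0 || offset + size == imageSize);
}
```

The edge exception matters for small mip levels of compressed textures, whose extents (for example 2x2 or 1x1) are smaller than one 4x4 block.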
Fixes invalid shaders being generated in the Blacksmith demo on
some GPUs. Works around a possible issue in the output signature
reader.
Commit #1000, yay.
Fixes a GPU hang when closing Dark Souls 3 as well as similar
undesired behaviour in other games that continue to use the
DXGI swap chain after the window has been destroyed.
We do not have to do this anymore since we'll bind a large
enough dummy buffer. Considerably reduces code size in shaders
which access a large number of shader constants.
We will use this extension in order to implement vertex
binding divisors other than 1 for per-instance attributes.
Will be *required* as soon as support by wine and Vulkan
drivers is widely available.
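As a rough illustration of what a binding divisor means for per-instance data: D3D11's instance data step rate is the number of instances that share one element before the attribute advances. The sketch below assumes instances are counted from zero within a draw and that the divisor is at least 1; the function names are hypothetical, and the exact Vulkan formula (including `firstInstance` handling) is left out.

```cpp
#include <cassert>
#include <cstdint>

// Index of the per-instance element read by instance `instance` (counted
// from zero within the draw) when the attribute advances once every
// `divisor` instances. A divisor of 1 is plain per-instance data.
// Assumes divisor >= 1; a zero divisor is not handled here.
constexpr uint32_t instanceElementIndex(uint32_t instance, uint32_t divisor) {
  return instance / divisor;
}

// Byte offset of that element within a vertex buffer with the given stride.
constexpr uint64_t instanceElementOffset(uint32_t instance, uint32_t divisor,
                                         uint32_t stride) {
  return uint64_t(instanceElementIndex(instance, divisor)) * stride;
}
```

For example, with a divisor of 2, instances 0 and 1 read element 0, instances 2 and 3 read element 1, and so on.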
We no longer use the 'oldSwapchain' member in the swap chain description
to replace an existing swap chain, but rather destroy it altogether and
create a new swap chain. While this is less than optimal, it might help
solve some swap chain-related issues such as #277.
An alternative to manually creating a framebuffer object and binding
it via bindFramebuffer. Future optimizations can use this to bring
down the number of redundant render pass changes.
- Do not report MIP_AUTOGEN if the image format cannot
be used as a color attachment
- Do not report SAMPLE_COMPARISON and GATHER_COMPARISON
if the DXGI format has no corresponding depth format
- Only report image-related features if the image format
can actually be used as a sampled image
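The three rules above can be sketched as a small flag computation. The capability struct and flag constants here are hypothetical stand-ins (they are not the real `D3D11_FORMAT_SUPPORT_*` values), and in DXVK the capabilities would be derived from Vulkan format properties, which this sketch does not query.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical capability summary for one DXGI format.
struct FormatCaps {
  bool sampledImage;     // format can be used as a sampled image
  bool colorAttachment;  // format can be used as a color attachment
  bool hasDepthFormat;   // DXGI format has a corresponding depth format
};

// Hypothetical flag values for this sketch only.
enum FormatSupport : uint32_t {
  SupportTexture2D        = 1u << 0,
  SupportSample           = 1u << 1,
  SupportMipAutogen       = 1u << 2,
  SupportSampleComparison = 1u << 3,
  SupportGatherComparison = 1u << 4,
};

uint32_t queryFormatSupport(const FormatCaps& caps) {
  uint32_t flags = 0;
  // Image-related features are only reported for sampleable formats.
  if (caps.sampledImage) {
    flags |= SupportTexture2D | SupportSample;
    // Comparison sampling needs a corresponding depth format.
    if (caps.hasDepthFormat)
      flags |= SupportSampleComparison | SupportGatherComparison;
    // MIP_AUTOGEN is only reported for color-attachment formats.
    if (caps.colorAttachment)
      flags |= SupportMipAutogen;
  }
  return flags;
}
```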
Some games do not compute the number of mip levels of
a texture or texture view correctly, so we should work
around this by capping it to the highest possible value.
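A minimal sketch of the capping described above: a full mip chain for an image of size w x h x d has floor(log2(max(w, h, d))) + 1 levels, so any larger requested count is clamped to that value. The function names are hypothetical, and D3D11's convention of passing 0 to request a full chain is not handled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of levels in a full mip chain for the given extent,
// i.e. floor(log2(max(w, h, d))) + 1.
inline uint32_t maxMipLevelCount(uint32_t w, uint32_t h, uint32_t d) {
  uint32_t size = std::max({ w, h, d });
  uint32_t levels = 1;
  while (size > 1) {
    size >>= 1;
    levels += 1;
  }
  return levels;
}

// Caps an application-provided mip count to the highest valid value.
inline uint32_t capMipLevelCount(uint32_t requested,
                                 uint32_t w, uint32_t h, uint32_t d) {
  return std::min(requested, maxMipLevelCount(w, h, d));
}
```

For a 256x256 texture the chain has 9 levels (256 down to 1), so a request for 12 levels would be capped to 9.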
If this method is used to clear a view with a floating point format,
we need to create a compatible view with an integer format in order
to clear the resource with the correct value. Fixes some calls to
this function in Rise of the Tomb Raider and other games.
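To illustrate why the view format matters, assuming the method clears with raw integer values: an integer-format view writes the clear value's bits unchanged, while treating the same value as a number and storing it into a float view yields a different result. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets the raw bits of a 32-bit clear value as a float. Writing
// through an integer-format view preserves these bits unchanged.
inline float bitsAsFloat(uint32_t raw) {
  float result;
  std::memcpy(&result, &raw, sizeof(result));
  return result;
}

// Numeric conversion: what a float-format view would end up containing
// if the integer value were converted instead of copied bit-for-bit.
inline float convertToFloat(uint32_t raw) {
  return static_cast<float>(raw);
}
```

For example, the raw value `0x3F800000` is the bit pattern of `1.0f`, but converting it numerically gives roughly 1.07e9.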
No `std::mbstowcs` string conversion is needed then.
Without this change, I get different output when running `dxgi-factory.exe` compiled with MinGW:
* `Output 0:`
```
\\.\DISPLAY1 (default)
\\.\DISPLAY1 (DXVK MinGW)
\ (DXVK winebuild)
```
With this patch, all three variants are identical (`\\.\DISPLAY1`).
P.S. The same problem exists in dxgi_adapter.cpp, but there `deviceProp.deviceName` is a member of a Vulkan structure (`char deviceName[VK_MAX_PHYSICAL_DEVICE_NAME_SIZE]`).
This is part of a major refactoring process regarding DXGI->Vulkan
format conversions. Since we don't patch format lookup tables any
longer, we can create a global lookup table.
The D3D11 ClearUnorderedAccessView* and ClearView functions
will have to be emulated using compute shaders rather than
clear operations, since Vulkan clear operations do not take
image views and their format into account.
We don't need to iterate over the full shader code when creating
a new shader module. This optimization may slightly reduce the
initial pipeline creation time.
Drivers from both major vendors implement their own shader cache
already, and storing a cache per game causes more issues than it
solves. Should fix #261.
If an application compiles the same shader multiple times, we should reuse
an already existing DxvkShaderModule instead of creating a new one. This
helps keep the number of DxvkGraphicsPipeline objects low in games such
as Rise of the Tomb Raider.
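A minimal sketch of the reuse described above: a cache that returns the existing module when identical code is compiled again. `ShaderModule` and `ShaderModuleCache` are hypothetical stand-ins for DxvkShaderModule and whatever lookup DXVK actually uses; for simplicity the cache is keyed by the full bytecode (held in a `std::string`, which can store arbitrary bytes), whereas a real implementation might key by a hash and compare on collision.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a compiled shader module.
struct ShaderModule {
  std::string code;
};

// Returns an existing module when the same bytecode is compiled again,
// instead of creating a new object each time.
class ShaderModuleCache {
public:
  std::shared_ptr<ShaderModule> getOrCreate(const std::string& bytecode) {
    std::lock_guard<std::mutex> lock(m_mutex);
    auto it = m_modules.find(bytecode);
    if (it != m_modules.end())
      return it->second;  // reuse: same code, same module object
    auto module = std::make_shared<ShaderModule>(ShaderModule{ bytecode });
    m_modules.emplace(bytecode, module);
    return module;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_modules.size();
  }

private:
  mutable std::mutex m_mutex;
  std::unordered_map<std::string, std::shared_ptr<ShaderModule>> m_modules;
};
```

Because identical shaders map to the same module object, anything keyed on module identity (such as pipeline objects) no longer multiplies when a game recompiles the same shader.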
* [dxgi] Fix compilation with WINE headers
```gcc
error: cannot convert 'MONITORINFOEX* {aka tagMONITORINFOEXA*}' to 'LPMONITORINFO {aka tagMONITORINFO*}' for argument '2' to 'BOOL GetMonitorInfoA(HMONITOR, LPMONITORINFO)'
```
```clang
cannot initialize a parameter of type 'LPMONITORINFO' (aka 'tagMONITORINFO *') with an rvalue of type '::MONITORINFOEX *' (aka 'tagMONITORINFOEXA *')
```
This may be a WINE bug, but I don't want to dig into it right now. My first suspicion is the differing struct "tag" definitions:
Wine variant:
```c
typedef struct tagMONITORINFO
{
...
} MONITORINFO, *LPMONITORINFO;
typedef struct tagMONITORINFOEXA
{ /* the 4 first entries are the same as MONITORINFO */
...
} MONITORINFOEXA, *LPMONITORINFOEXA;
typedef struct tagMONITORINFOEXW
{ /* the 4 first entries are the same as MONITORINFO */
...
} MONITORINFOEXW, *LPMONITORINFOEXW;
DECL_WINELIB_TYPE_AW(MONITORINFOEX)
DECL_WINELIB_TYPE_AW(LPMONITORINFOEX)
```
vs.
MinGW variant:
```c
typedef struct tagMONITORINFO {
...
} MONITORINFO,*LPMONITORINFO;
typedef struct tagMONITORINFOEXA : public tagMONITORINFO {
CHAR szDevice[CCHDEVICENAME];
} MONITORINFOEXA,*LPMONITORINFOEXA;
typedef struct tagMONITORINFOEXW : public tagMONITORINFO {
WCHAR szDevice[CCHDEVICENAME];
} MONITORINFOEXW,*LPMONITORINFOEXW;
__MINGW_TYPEDEF_AW(MONITORINFOEX)
__MINGW_TYPEDEF_AW(LPMONITORINFOEX)
```
* [dxgi] Fix compilation with WINE headers
Use C++-style casts rather than C ones.
Since we are synchronizing once per frame anyway, there is no need to
artificially limit the number of chunks in flight. Applications which
use deferred contexts and submit a large number of CS chunks through
command lists may benefit from this optimization.
This optimization may help keep the GPU busy in case there's
a large number of draw calls pending at the time a command
list from a deferred context is submitted for execution.
HUD elements can be enabled individually using a comma-separated
list. Supported options include:
- fps: Displays the framerate
- devinfo: Displays device info
Passing "1" has the same effect as "fps,devinfo".
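A minimal sketch of parsing such an option string into a set of enabled elements. The option string is assumed to come from the `DXVK_HUD` environment variable; `parseHudElements` is a hypothetical name, and reading the environment is omitted so the function can be tested directly.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_set>

// Splits a comma-separated HUD option string into element names.
// "1" is expanded to "fps,devinfo"; empty items are ignored.
inline std::unordered_set<std::string> parseHudElements(const std::string& option) {
  std::unordered_set<std::string> elements;
  std::string input = (option == "1") ? "fps,devinfo" : option;
  std::stringstream stream(input);
  std::string item;
  while (std::getline(stream, item, ',')) {
    if (!item.empty())
      elements.insert(item);
  }
  return elements;
}
```

Each HUD element can then check for its own name in the set, e.g. `elements.count("fps")`.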
`limits.h` is required for `UINT_MAX` but is not always included, so it is better to use the standard C++ variant from `<limits>`.
Some implementations simply return the `UINT_MAX` value; for example, the GCC version is: `max() _GLIBCXX_USE_NOEXCEPT { return __INT_MAX__ * 2U + 1; }`.
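For reference, the portable spelling looks like this. `kMaxUint` is a name chosen for the sketch, and `unsigned int` stands in for the Windows `UINT` typedef; `<climits>` is included here only to check that both spellings agree.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Largest value of unsigned int, without relying on <limits.h>.
constexpr unsigned int kMaxUint = std::numeric_limits<unsigned int>::max();

// Compile-time check that the C++ and C spellings agree.
static_assert(kMaxUint == UINT_MAX, "numeric_limits and UINT_MAX agree");
```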
* [dxgi] Replace MSVC _countof macro with std::size
pro: portable across compilers
con: may not work for older clang/gcc/...
http://en.cppreference.com/w/cpp/iterator/size
For example, the `Run this code` mode on that page produces errors for GCC 5.2 (C++17) and clang 3.8 (C++17). Locally, GCC 7.3 and clang 7 work fine.
Not tested with MinGW64.
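A minimal example of the replacement, assuming a C++17 compiler. `deviceName` is a hypothetical buffer shaped like a fixed-size device name field; `std::size` returns its element count as a compile-time constant, just as `_countof` would on MSVC.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical fixed-size wide-character buffer.
static wchar_t deviceName[32];

// Element count (not byte count) of a built-in array.
constexpr std::size_t deviceNameLength = std::size(deviceName);

static_assert(deviceNameLength == 32, "std::size counts elements, not bytes");
```

Unlike `sizeof(deviceName)`, which is `32 * sizeof(wchar_t)` bytes, this is the count expected by functions that take a buffer length in characters.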
We cannot run these in parallel in case the hull shader's output vertex
count, and thus the invocation count, is less than the fork/join phase
invocation count.
May reduce execution time of hull shaders on the GPU by running
the fork/join phases in parallel, as originally intended. Tested
on RADV 18.0.99 with LLVM 6.0.0.
Since we create only one DxvkContext per D3D11Device, rather than
per D3D11DeviceContext as originally planned, there is no need to
keep the pipeline manager as a global thread-safe object. This may
slightly reduce CPU overhead.
* [dxgi] Implement freeing private data
Done by passing null as data.
Fixes a crash in the Wine private data tests, which now pass.
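A minimal sketch of a private data store where passing null data frees the existing entry. `PrivateDataStore` is hypothetical, and a `std::string` key stands in for the GUID used by the real interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical private data store; std::string stands in for a GUID key.
class PrivateDataStore {
public:
  // Stores a copy of `size` bytes, or frees the entry if `data` is null.
  void setData(const std::string& key, uint32_t size, const void* data) {
    if (data == nullptr) {
      m_entries.erase(key);
      return;
    }
    const auto* bytes = static_cast<const uint8_t*>(data);
    m_entries[key].assign(bytes, bytes + size);
  }

  // Copies the stored data into `out`; returns false if nothing is stored.
  bool getData(const std::string& key, std::vector<uint8_t>& out) const {
    auto it = m_entries.find(key);
    if (it == m_entries.end())
      return false;
    out = it->second;
    return true;
  }

private:
  std::map<std::string, std::vector<uint8_t>> m_entries;
};
```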
* [dxgi] Improve private data argument handling
Fixes 7 more wine tests.
Refactored DxgiVkDevice, D3D11Device and D3D11Presenter
to behave more like aggregable objects, where the new
D3D11DeviceContainer class is the COM aggregate object.
Fixes the reference counting issue outlined in #210.
Also refactored buffer mapping to reduce code duplication.
Optimized the lookup function for a small performance gain
in games which map a lot of resources on deferred contexts.
Apparently this breaks Elder Scrolls Online as well, so we'll
just enable it explicitly for games which benefit from this
optimization and disable it by default.
HLSL tbuffers are translated to resources with a "mixed" format.
There is no documentation about which format the buffers actually
use, so we'll default to UINT and see what happens.
Reduces command submission overhead by reusing fence objects
instead of creating new ones for each submission. Improves
error reporting in case the submission cannot be completed.
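The reuse idea can be sketched as a simple pool. `Fence` and `FencePool` are hypothetical; a real fence would wrap a `VkFence` and be reset through the Vulkan API once its submission has completed, which this sketch only models with a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical fence object; a real one would wrap a VkFence.
struct Fence {
  bool signaled = false;
  void reset() { signaled = false; }
};

// Recycles fences instead of creating a new one for every submission.
class FencePool {
public:
  std::unique_ptr<Fence> acquire() {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (m_free.empty()) {
      ++m_created;
      return std::make_unique<Fence>();
    }
    auto fence = std::move(m_free.back());
    m_free.pop_back();
    return fence;
  }

  // Returns a fence to the pool once its submission has completed.
  void release(std::unique_ptr<Fence> fence) {
    fence->reset();
    std::lock_guard<std::mutex> lock(m_mutex);
    m_free.push_back(std::move(fence));
  }

  std::size_t createdCount() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_created;
  }

private:
  mutable std::mutex m_mutex;
  std::vector<std::unique_ptr<Fence>> m_free;
  std::size_t m_created = 0;
};
```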
Workaround for a regression in The Witcher 3 that was introduced
in commit 53d557c2db. May have a
significant negative impact on performance in some games.
The old initialization code did not take either CSMT or
Deferred Contexts into account and could lead to illegal
calls to beginRecording.
Fixes a hang encountered in Dishonored 2.
Fixes a violation of the Vulkan specification where atomic operations
would be used on storage images with SpvImageFormatUnknown. Should fix
driver crashes on Nvidia.
TODO: Fix data types for atomic operation instructions.
* [util] Adds getTempDirectory() function
Will be used for on-disk pipeline caching.
* [dxvk] Implement on-disk shader caching
Saving the pipeline cache to disk when the application exits
should be sufficient but the DxvkPipelineCache destructor isn't
reliably called on exit (ref-counting issue?).
As a workaround, we check every frame and save the cache if its size
has increased by some amount or if one minute has elapsed.
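The per-frame check can be sketched as follows. `CacheSaveHeuristic` is a hypothetical name, the growth threshold and interval are caller-supplied illustrative values rather than DXVK's, and the condition follows the description above literally (grown by the threshold, or the interval elapsed since the last save).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Decides whether the pipeline cache should be written to disk.
class CacheSaveHeuristic {
public:
  using Clock = std::chrono::steady_clock;

  CacheSaveHeuristic(std::size_t growthThreshold, Clock::duration interval,
                     Clock::time_point now)
  : m_growthThreshold(growthThreshold), m_interval(interval),
    m_lastSaveTime(now) { }

  // Called once per frame with the current cache size.
  bool shouldSave(std::size_t cacheSize, Clock::time_point now) const {
    bool grown   = cacheSize >= m_lastSaveSize + m_growthThreshold;
    bool elapsed = now - m_lastSaveTime >= m_interval;
    return grown || elapsed;
  }

  // Records a completed save.
  void markSaved(std::size_t cacheSize, Clock::time_point now) {
    m_lastSaveSize = cacheSize;
    m_lastSaveTime = now;
  }

private:
  std::size_t       m_growthThreshold;
  Clock::duration   m_interval;
  std::size_t       m_lastSaveSize = 0;
  Clock::time_point m_lastSaveTime;
};
```

Passing the current time in explicitly keeps the check cheap enough to run every frame and makes it easy to test.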
* [dxvk] Periodically update shader cache file in separate thread
Reduces lock contention and slightly improves performance in games
that rely heavily on the buffer renaming mechanism if the lock
protecting the original free list was contended.
This is required for resource mapping on deferred contexts.
May also fix a potential synchronization issue where a buffer
could be mapped multiple times before the CS thread would mark
the physical buffer as used, which would result in invalid data.
Vulkan does not support buffer RTVs, and neither does DXVK, so we
should return an error in that case. Previously, DXVK would crash
upon querying image information.
The back buffer needs to be deleted explicitly because on
the way it is created. Fixes reference counting issues in
applications which resize the back buffer at least once.
Reduces the CPU overhead of descriptor set updates, which usually
happen once per draw call. Gains seem to be minor in most games,
but some outliers show significantly better performance (e.g. Tomb Raider).
Closer to the D3D11 API. We cannot use the normal clearColorImage and
clearDepthStencilImage methods in case the game uses a 2D array view
for a 3D image. Fixes some validation issues in Hellblade.
This may prevent driver crashes and give more useful debugging info
in case a given combination of image parameters is not supported by
a device. May also improve compatibility with direct image mapping.
SPIR-V tools did not turn out to be useful, but increased the
binary size by a significant amount and caused build problems.
- spirv-opt: Far too slow for the intended purpose, and Nvidia-specific
shader issues have been reported and fixed.
- spirv-val: Not much value in practice since shaders can be
written to a directory and validated manually.