Some games do not compute the number of mip levels of
a texture or texture view correctly, so we should work
around this by capping it to the highest possible value.
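Here is a minimal sketch of the cap. It assumes the usual full-chain rule: one level per halving of the largest dimension, down to 1x1. The function names are hypothetical and are not DXVK's. `capViewMipLevelCount` also covers views whose requested count is larger than the number of levels left below the view's most detailed mip. One example is `-1` (i.e. `UINT_MAX`), meaning "all levels".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of levels in a full mip chain for the given extent:
// floor(log2(max(width, height, depth))) + 1.
uint32_t computeMaxMipLevelCount(uint32_t width, uint32_t height, uint32_t depth) {
  uint32_t maxDim = std::max({ width, height, depth });
  uint32_t levels = 1;
  while (maxDim > 1) {
    maxDim >>= 1;
    levels += 1;
  }
  return levels;
}

// Clamp an application-provided texture mip count to what the extent can hold.
uint32_t capMipLevelCount(uint32_t requested, uint32_t width, uint32_t height, uint32_t depth) {
  return std::min(requested, computeMaxMipLevelCount(width, height, depth));
}

// A view starting at 'mostDetailedMip' can cover at most the remaining levels.
uint32_t capViewMipLevelCount(uint32_t requested, uint32_t mostDetailedMip, uint32_t totalLevels) {
  uint32_t remaining = mostDetailedMip < totalLevels ? totalLevels - mostDetailedMip : 0;
  return std::min(requested, remaining);
}
```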
If this method is used to clear a view with a floating point format,
we need to create a compatible view with an integer format in order
to clear the resource with the correct value. Fixes some calls to
this function in Rise of the Tomb Raider and other games.
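The idea can be sketched as follows. A simplified, hypothetical format enum stands in for the real DXGI/Vulkan ones. The first helper picks an integer format with the same channel layout, so clearing through the new view writes the raw bits of the clear value. The second helper shows what a 32-bit float texel reads back afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified format enum for illustration only.
enum class Format {
  Undefined,
  R32G32B32A32_SFLOAT, R32G32B32A32_UINT,
  R16G16_SFLOAT, R16G16_UINT,
  R32_SFLOAT, R32_UINT,
};

// Returns an integer format with the same channel layout as 'format',
// suitable for creating the compatible view used by the clear.
Format getCompatibleUintFormat(Format format) {
  switch (format) {
    case Format::R32G32B32A32_SFLOAT: return Format::R32G32B32A32_UINT;
    case Format::R16G16_SFLOAT:       return Format::R16G16_UINT;
    case Format::R32_SFLOAT:          return Format::R32_UINT;
    case Format::R32G32B32A32_UINT:
    case Format::R16G16_UINT:
    case Format::R32_UINT:            return format;  // already an integer format
    default:                          return Format::Undefined;
  }
}

// The float value a 32-bit float texel holds after being cleared with 'bits'.
float reinterpretClearBits(uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```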
No `std::mbstowcs` string conversion is needed then.
Without this change, I get different output when running `dxgi-factory.exe` compiled with MinGW:
* `Output 0:`
```
\\.\DISPLAY1 (default)
\\.\DISPLAY1 (DXVK MinGW)
\ (DXVK winebuild)
```
With this patch, all three variants are identical (`\\.\DISPLAY1`).
P.S. The same problem exists in dxgi_adapter.cpp, but there `deviceProp.deviceName` is a member of a Vulkan structure (`char deviceName[VK_MAX_PHYSICAL_DEVICE_NAME_SIZE]`).
This is part of a major refactoring process regarding DXGI->Vulkan
format conversions. Since we don't patch format lookup tables any
longer, we can create a global lookup table.
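As an illustration, a global table can be a single `constexpr` array indexed by the DXGI format. The enums below are hypothetical, truncated stand-ins for `DXGI_FORMAT` and `VkFormat`. The lookup returns an undefined format for out-of-range input, which is an assumption about error handling.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, heavily truncated stand-ins for DXGI_FORMAT and VkFormat.
enum class DxgiFormat : uint32_t { Unknown, R32G32B32A32_Float, R8G8B8A8_Unorm, Count };
enum class VulkanFormat : uint32_t { Undefined, R32G32B32A32_Sfloat, R8G8B8A8_Unorm };

constexpr std::size_t kFormatCount = static_cast<std::size_t>(DxgiFormat::Count);

// One immutable table, indexed by the DXGI format value and built at compile
// time. Since nothing patches it at runtime, every device can share it.
constexpr std::array<VulkanFormat, kFormatCount> g_formatTable = {{
  VulkanFormat::Undefined,           // DxgiFormat::Unknown
  VulkanFormat::R32G32B32A32_Sfloat, // DxgiFormat::R32G32B32A32_Float
  VulkanFormat::R8G8B8A8_Unorm,      // DxgiFormat::R8G8B8A8_Unorm
}};

constexpr VulkanFormat lookupFormat(DxgiFormat format) {
  auto index = static_cast<std::size_t>(format);
  return index < g_formatTable.size() ? g_formatTable[index] : VulkanFormat::Undefined;
}
```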
The D3D11 ClearUnorderedAccessView* and ClearView functions
will have to be emulated using compute shaders rather than
clear operations, since Vulkan clear operations do not take
image views and their format into account.
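A compute-shader clear typically launches one invocation per texel, rounded up to whole workgroups. This sketch assumes a hypothetical 8x8 workgroup size and only computes the dispatch dimensions. The shader itself is not shown; it would skip invocations outside the view's extent.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical workgroup size of the clear shader (local_size_x = local_size_y = 8).
constexpr uint32_t kClearWorkgroupSize = 8;

struct DispatchSize {
  uint32_t x, y, z;
};

// One invocation per texel, rounded up to whole workgroups, and one
// workgroup layer per array layer.
DispatchSize computeClearDispatch(uint32_t width, uint32_t height, uint32_t layers) {
  return DispatchSize{
    (width + kClearWorkgroupSize - 1) / kClearWorkgroupSize,
    (height + kClearWorkgroupSize - 1) / kClearWorkgroupSize,
    layers };
}
```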
We don't need to iterate over the full shader code when creating
a new shader module. This optimization may slightly reduce the
initial pipeline creation time.
Drivers from both major vendors implement their own shader cache
already, and storing a cache per game causes more issues than it
solves. Should fix #261.
If an application compiles the same shader multiple times, we should reuse
an already existing DxvkShaderModule instead of creating a new one. This
helps keep the number of DxvkGraphicsPipeline objects low in games such
as Rise of the Tomb Raider.
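Here is a minimal sketch of such deduplication, with a hypothetical `ShaderModule` type standing in for `DxvkShaderModule`. For simplicity it keys the cache on the full bytecode. A real implementation might key on a hash of the code instead to keep lookups cheap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for DxvkShaderModule.
struct ShaderModule {
  std::vector<uint32_t> code;
};

// Returns the existing module when the same bytecode is compiled again,
// so pipelines that use it can be recognized as identical.
class ShaderModuleCache {
public:
  std::shared_ptr<ShaderModule> getOrCreate(const std::vector<uint32_t>& code) {
    std::lock_guard<std::mutex> lock(m_mutex);
    auto it = m_modules.find(code);
    if (it != m_modules.end())
      return it->second;
    auto created = std::make_shared<ShaderModule>(ShaderModule{ code });
    m_modules.emplace(code, created);
    return created;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_modules.size();
  }

private:
  mutable std::mutex m_mutex;
  std::map<std::vector<uint32_t>, std::shared_ptr<ShaderModule>> m_modules;
};
```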
* [dxgi] Fix compilation with WINE headers
```gcc
error: cannot convert 'MONITORINFOEX* {aka tagMONITORINFOEXA*}' to 'LPMONITORINFO {aka tagMONITORINFO*}' for argument '2' to 'BOOL GetMonitorInfoA(HMONITOR, LPMONITORINFO)'
```
```clang
cannot initialize a parameter of type 'LPMONITORINFO' (aka 'tagMONITORINFO *') with an rvalue of type '::MONITORINFOEX *' (aka 'tagMONITORINFOEXA *')
```
This may be a WINE bug, but I don't want to dig into it right now. My first guess is the differing "tag" struct definitions:
WINE variant:
```c
typedef struct tagMONITORINFO
{
...
} MONITORINFO, *LPMONITORINFO;
typedef struct tagMONITORINFOEXA
{ /* the 4 first entries are the same as MONITORINFO */
...
} MONITORINFOEXA, *LPMONITORINFOEXA;
typedef struct tagMONITORINFOEXW
{ /* the 4 first entries are the same as MONITORINFO */
...
} MONITORINFOEXW, *LPMONITORINFOEXW;
DECL_WINELIB_TYPE_AW(MONITORINFOEX)
DECL_WINELIB_TYPE_AW(LPMONITORINFOEX)
```
versus the MinGW variant:
```c
typedef struct tagMONITORINFO {
...
} MONITORINFO,*LPMONITORINFO;
typedef struct tagMONITORINFOEXA : public tagMONITORINFO {
CHAR szDevice[CCHDEVICENAME];
} MONITORINFOEXA,*LPMONITORINFOEXA;
typedef struct tagMONITORINFOEXW : public tagMONITORINFO {
WCHAR szDevice[CCHDEVICENAME];
} MONITORINFOEXW,*LPMONITORINFOEXW;
__MINGW_TYPEDEF_AW(MONITORINFOEX)
__MINGW_TYPEDEF_AW(LPMONITORINFOEX)
```
* [dxgi] Fix compilation with WINE headers
Use C++-style casts rather than C ones.
Since we are synchronizing once per frame anyway, there is no need to
artificially limit the number of chunks in flight. Applications which
use deferred contexts and submit a large number of CS chunks through
command lists may benefit from this optimization.
This optimization may help keep the GPU busy if there is
a large number of draw calls pending at the time a command
list from a deferred context is submitted for execution.
HUD elements can be enabled individually using a comma-separated
list. Supported options include:
- fps: Displays the framerate
- devinfo: Displays device info
Passing "1" has the same effect as "fps,devinfo".
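Parsing such an option string could look like the sketch below. The function name is hypothetical. It only splits on commas, drops empty entries, and expands `"1"` to `fps,devinfo` as described above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical parser for the HUD option string, e.g. "fps,devinfo".
// "1" is shorthand for "fps,devinfo"; empty entries are ignored.
std::unordered_set<std::string> parseHudElements(const std::string& option) {
  const std::string spec = (option == "1") ? "fps,devinfo" : option;
  std::unordered_set<std::string> elements;
  std::istringstream stream(spec);
  std::string item;
  while (std::getline(stream, item, ',')) {
    if (!item.empty())
      elements.insert(item);
  }
  return elements;
}
```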
`limits.h` is required for `UINT_MAX` but is not always included, so it is better to use the standard C++ variant from `<limits>`.
Some implementations simply return the `UINT_MAX` value; the GCC (libstdc++) version is `max() _GLIBCXX_USE_NOEXCEPT { return __INT_MAX__ * 2U + 1; }`.
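For example, the `<limits>` form needs no C header. The sketch assumes a local `UINT` alias in place of the Win32 typedef.

```cpp
#include <cassert>
#include <limits>

// Stand-in for the Win32 UINT typedef (unsigned int on Windows).
using UINT = unsigned int;

// Replaces UINT_MAX without depending on <limits.h> being included.
constexpr UINT kUintMax = std::numeric_limits<UINT>::max();
```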
* [dxgi] Replace MSVC _countof macro with std::size
Pro: works across compilers.
Con: may not work with older clang/gcc/... versions.
http://en.cppreference.com/w/cpp/iterator/size
For example, the `Run this code` mode on that page produces errors with GCC 5.2 (C++17) and clang 3.8 (C++17). Local GCC 7.3 and clang 7 builds are OK.
Not tested with MinGW64.
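Here is a small example of the replacement. `std::size` (C++17, `<iterator>`) deduces the element count of a built-in array, which is what `_countof` provides on MSVC. The helper below is hypothetical. It copies a wide string, such as a monitor device name, into a fixed-size buffer and truncates if needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <iterator>

// Copies 'src' into the fixed-size array 'dst', truncating if necessary.
// std::size(dst) takes the place of the MSVC-only _countof(dst).
template <std::size_t N>
void copyTruncated(wchar_t (&dst)[N], const wchar_t* src) {
  std::wcsncpy(dst, src, std::size(dst) - 1);
  dst[std::size(dst) - 1] = L'\0';
}
```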
We cannot run these in parallel if the hull shader's output vertex
count, and thus the invocation count, is less than the fork/join phase
invocation count.
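The condition can be sketched as a simple check. The sketch assumes that each fork/join phase instance runs on the hull shader invocation with the same index. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical check: phase instance i is assumed to run on hull shader
// invocation i, so every instance needs an invocation of its own.
bool canRunPhasesInParallel(uint32_t invocationCount, const std::vector<uint32_t>& phaseInstanceCounts) {
  for (uint32_t instanceCount : phaseInstanceCounts) {
    if (instanceCount > invocationCount)
      return false;  // some instances would have no invocation to run on
  }
  return true;
}
```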
May reduce execution time of hull shaders on the GPU by running
the fork/join phases in parallel, as originally intended. Tested
on RADV 18.0.99 with LLVM 6.0.0.
Since we create only one DxvkContext per D3D11Device, rather than
per D3D11DeviceContext as originally planned, there is no need to
keep the pipeline manager as a global thread-safe object. This may
slightly reduce CPU overhead.
* [dxgi] Implement freeing private data
Done by passing null as data.
Fixes a crash in the WINE private data tests, which now pass.
* [dxgi] Improve private data argument handling
Fixes 7 more wine tests.
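Here is a minimal sketch of the semantics described above. It uses a hypothetical store that models the GUID as a string key. Passing null data removes the entry, and a null output buffer acts as a size query. The boolean return values stand in for the HRESULT codes a real implementation would return.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical private data store; a string key stands in for the GUID.
class PrivateDataStore {
public:
  // Stores a copy of 'data'. Passing null data frees any existing entry.
  bool setData(const std::string& key, uint32_t size, const void* data) {
    if (data == nullptr) {
      m_entries.erase(key);
      return true;
    }
    const auto* bytes = static_cast<const uint8_t*>(data);
    m_entries[key].assign(bytes, bytes + size);
    return true;
  }

  // With null 'data', reports the stored size. Otherwise copies the entry
  // if the caller's buffer (of '*size' bytes) is large enough.
  bool getData(const std::string& key, uint32_t* size, void* data) const {
    if (size == nullptr)
      return false;
    auto it = m_entries.find(key);
    if (it == m_entries.end()) {
      *size = 0;
      return false;
    }
    auto stored = static_cast<uint32_t>(it->second.size());
    if (data != nullptr) {
      if (*size < stored)
        return false;
      if (stored != 0)
        std::memcpy(data, it->second.data(), stored);
    }
    *size = stored;
    return true;
  }

private:
  std::map<std::string, std::vector<uint8_t>> m_entries;
};
```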
Refactored DxgiVkDevice, D3D11Device and D3D11Presenter
to behave more like aggregable objects, where the new
D3D11DeviceContainer class is the COM aggregate object.
Fixes the reference counting issue outlined in #210.
Also refactored buffer mapping to reduce code duplication.
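The aggregation idea can be sketched as follows. The class names are hypothetical. The container owns the single reference count, and the aggregated parts forward `AddRef`/`Release` to it. As a result, the whole object is destroyed only when the last reference to any part goes away.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class Container;

// Hypothetical aggregated part (think device or presenter): it has no
// reference count of its own and forwards to the owning container.
class AggregatedPart {
public:
  explicit AggregatedPart(Container* outer) : m_outer(outer) {}
  uint32_t AddRef();
  uint32_t Release();
private:
  Container* m_outer;
};

// Hypothetical COM aggregate object that owns the single reference count.
class Container {
public:
  Container() : m_device(this), m_presenter(this) {}

  uint32_t AddRef() { return ++m_refCount; }

  uint32_t Release() {
    uint32_t count = --m_refCount;
    if (count == 0)
      delete this;  // destroys all aggregated parts at once
    return count;
  }

  AggregatedPart* device() { return &m_device; }
  AggregatedPart* presenter() { return &m_presenter; }

private:
  std::atomic<uint32_t> m_refCount{1};
  AggregatedPart m_device;
  AggregatedPart m_presenter;
};

uint32_t AggregatedPart::AddRef() { return m_outer->AddRef(); }
uint32_t AggregatedPart::Release() { return m_outer->Release(); }
```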
Optimized the lookup function for a small performance gain
in games which map a lot of resources on deferred contexts.
Apparently this breaks Elder Scrolls Online as well, so we'll
just enable it explicitly for games which benefit from this
optimization and disable it by default.
HLSL tbuffers are translated to resources with a "mixed" format.
There is no documentation about which format the buffers actually
use, so we'll default to UINT and see what happens.
Reduces command submission overhead by reusing fence objects
instead of creating new ones for each submission. Improves
error reporting in case the submission cannot be completed.
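Here is a minimal sketch of fence reuse, with a hypothetical `Fence` type in place of a Vulkan fence wrapper. Released fences go back to a free list and are reset before being handed out again. A new object is only created when the free list is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Vulkan fence wrapper.
struct Fence {
  bool signaled = false;
};

// Hands out recycled fences instead of creating one per submission.
class FencePool {
public:
  std::unique_ptr<Fence> acquire() {
    std::lock_guard<std::mutex> lock(m_mutex);
    if (m_free.empty()) {
      m_created += 1;
      return std::make_unique<Fence>();
    }
    std::unique_ptr<Fence> fence = std::move(m_free.back());
    m_free.pop_back();
    fence->signaled = false;  // stands in for resetting the fence before reuse
    return fence;
  }

  // Called once the submission that used 'fence' has completed.
  void release(std::unique_ptr<Fence> fence) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_free.push_back(std::move(fence));
  }

  std::size_t createdCount() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_created;
  }

private:
  mutable std::mutex m_mutex;
  std::vector<std::unique_ptr<Fence>> m_free;
  std::size_t m_created = 0;
};
```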
Workaround for a regression in The Witcher 3 that was introduced
in commit 53d557c2db. May have a
significant negative impact on performance in some games.
The old initialization code did not take either CSMT or
Deferred Contexts into account and could lead to illegal
calls to beginRecording.
Fixes a hang encountered in Dishonored 2.
Fixes a violation of the Vulkan specification where atomic operations
would be used on storage images with SpvImageFormatUnknown. Should fix
driver crashes on Nvidia.
TODO: Fix data types for atomic operation instructions.
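One way to avoid the unknown format is to pick a concrete 32-bit integer format for storage images that are used with atomics. The enums below are hypothetical stand-ins for the SPIR-V image format and the declared resource type. Mapping everything other than signed integers to `R32ui` is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-ins for the SPIR-V image format operand and the
// declared D3D resource return type.
enum class ImageFormat { Unknown, R32ui, R32i };
enum class ResourceType { Float, Uint, Sint };

// Storage images that are accessed with atomics get a concrete 32-bit
// integer format; all others may keep the unknown format.
ImageFormat chooseStorageImageFormat(ResourceType type, bool usedWithAtomics) {
  if (!usedWithAtomics)
    return ImageFormat::Unknown;
  return type == ResourceType::Sint ? ImageFormat::R32i : ImageFormat::R32ui;
}
```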
* [util] Adds getTempDirectory() function
Will be used by on-disk pipeline caching.
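Here is a portable sketch of such a function using `std::filesystem::temp_directory_path` (C++17). This approach is an assumption made for illustration; the actual DXVK function may use a platform API instead.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Returns the system temporary directory, or an empty string on failure.
std::string getTempDirectory() {
  std::error_code ec;
  std::filesystem::path path = std::filesystem::temp_directory_path(ec);
  return ec ? std::string() : path.string();
}
```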
* [dxvk] Implement on-disk shader caching
Saving the pipeline cache to disk when the application exits
should be sufficient but the DxvkPipelineCache destructor isn't
reliably called on exit (ref-counting issue?).
As a workaround, we check every frame and save the cache if its
size has increased by some amount or if one minute has elapsed.
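The workaround amounts to a per-frame check like the hypothetical helper below. The growth threshold and interval are parameters here, not DXVK's actual values. Skipping the timed save when the cache has not changed is our own assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper deciding when to flush the in-memory pipeline cache
// to disk; the threshold values passed in are illustrative only.
class CacheSaveTrigger {
public:
  using Clock = std::chrono::steady_clock;

  CacheSaveTrigger(std::size_t growthThreshold, Clock::duration interval, Clock::time_point start)
  : m_growthThreshold(growthThreshold), m_interval(interval), m_lastSaveTime(start) {}

  // Called once per frame. Returns true (and records the save) if the cache
  // grew by at least m_growthThreshold bytes, or if m_interval has elapsed
  // and the cache changed since the last save.
  bool shouldSave(std::size_t cacheSize, Clock::time_point now) {
    bool grewEnough = cacheSize >= m_lastSavedSize + m_growthThreshold;
    bool intervalElapsed = now - m_lastSaveTime >= m_interval && cacheSize != m_lastSavedSize;
    if (!grewEnough && !intervalElapsed)
      return false;
    m_lastSavedSize = cacheSize;
    m_lastSaveTime = now;
    return true;
  }

private:
  std::size_t m_growthThreshold;
  Clock::duration m_interval;
  Clock::time_point m_lastSaveTime;
  std::size_t m_lastSavedSize = 0;
};
```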
* [dxvk] Periodically update shader cache file in separate thread
Reduces lock contention and slightly improves performance in games
that rely heavily on the buffer renaming mechanism if the lock
protecting the original free list was contended.
This is required for resource mapping on deferred contexts.
May also fix a potential synchronization issue where a buffer
could be mapped multiple times before the CS thread would mark
the physical buffer as used, which would result in invalid data.