Clip and cull distances can be defined as multi-component
vectors in D3D11. We still need to figure out how to map
them to the actual cull distance array.
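
One possible mapping, sketched here as an assumption and not as the
approach that will be chosen: walk the declared vectors in order and
pack every component enabled in the write mask into a flat scalar
array. DistanceVector and flattenDistances are hypothetical names
used only for illustration.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical description of one declared clip/cull distance register:
// a four-component vector plus a write mask (bit 0 = x, ..., bit 3 = w).
struct DistanceVector {
  std::array<float, 4> components;
  uint32_t             mask;
};

// Packs every enabled component, in declaration order, into a flat
// array, which is the layout a scalar distance array expects.
std::vector<float> flattenDistances(const std::vector<DistanceVector>& vectors) {
  std::vector<float> result;

  for (const auto& v : vectors) {
    for (uint32_t i = 0; i < 4; i++) {
      if (v.mask & (1u << i))
        result.push_back(v.components[i]);
    }
  }

  return result;
}
```

With this layout, the position of a component in the flat array depends
only on the declaration order and the masks, so it could be computed
once when the shader is compiled.
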

Input variables are now copied into a temporary array, which allows
dynamic indexing and which also allows us to use system values that
are mapped to input registers in DXBC. This breaks geometry shaders
for now, however.

Although they are not used yet, these classes can be
used to implement command stream multithreading in the future.
They are also useful to implement command lists for deferred
contexts, which are a core feature of D3D11.

Fixes a few bottlenecks that were encountered in the Cascaded Shadow
Maps demo from the Microsoft SDK. Performance is now slightly better
than wined3d with CSMT, MESA_NO_ERROR and mesa_glthread enabled.

This is required because in D3D11, typeless formats can be used
to create both depth and stencil images, and color formats can
be used to view depth images. In Vulkan, images and views that
are used as depth-stencil attachments must be created with a
depth-stencil format, so we have to take the image's bind flags
into account when picking a format.
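
A minimal sketch of that selection logic follows. The enums, the bind
flag constants, FormatPair and pickImageFormat are hypothetical
stand-ins and not the real D3D11 or Vulkan definitions; only the
decision based on the bind flags is the point here.

```cpp
#include <cstdint>

// Hypothetical, heavily reduced format enums for illustration only.
enum class D3DFormat  { R32_TYPELESS, R32_FLOAT, D32_FLOAT,
                        R16_TYPELESS, R16_UNORM, D16_UNORM };
enum class VkFormatId { R32_SFLOAT, D32_SFLOAT, R16_UNORM, D16_UNORM };

// Hypothetical bind flags, modelled loosely on the D3D11 ones.
constexpr uint32_t BindShaderResource = 1u << 0;
constexpr uint32_t BindDepthStencil   = 1u << 1;

// Each source format maps to a color format and a depth format.
struct FormatPair {
  VkFormatId color;
  VkFormatId depth;
};

FormatPair lookupFormatPair(D3DFormat format) {
  switch (format) {
    case D3DFormat::R16_TYPELESS:
    case D3DFormat::R16_UNORM:
    case D3DFormat::D16_UNORM:
      return { VkFormatId::R16_UNORM, VkFormatId::D16_UNORM };
    case D3DFormat::R32_TYPELESS:
    case D3DFormat::R32_FLOAT:
    case D3DFormat::D32_FLOAT:
      return { VkFormatId::R32_SFLOAT, VkFormatId::D32_SFLOAT };
  }
  // Not reached for the formats listed above.
  return { VkFormatId::R32_SFLOAT, VkFormatId::D32_SFLOAT };
}

// An image that can be bound as a depth-stencil attachment gets the
// depth format, no matter whether the source format is typeless,
// a color format or a depth format.
VkFormatId pickImageFormat(D3DFormat format, uint32_t bindFlags) {
  const FormatPair pair = lookupFormatPair(format);
  return (bindFlags & BindDepthStencil) ? pair.depth : pair.color;
}
```

For example, under these assumptions an R32_TYPELESS image created with
both the shader resource and the depth-stencil flag would be created
with the depth format, while the same format without the depth-stencil
flag would use the color format.
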

When invalidating a constant buffer, the descriptor was not
updated, which usually led to the wrong resource being used
and could also cause crashes.
This fix also includes resource tracking for shader resources
on the graphics pipeline. The code needs to be made compatible
with the compute pipeline as well.

Major rewrite of the entire shader decoder to generate
easy-to-parse data structures for the compiler, which ultimately
allows new instructions to be implemented more easily.

Command submission no longer synchronizes with the device every single
time. Instead, the command list and the fence that was created for it are
added to a queue. A separate thread then waits for the execution to
complete and returns the command list to the device.
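
The scheme can be sketched roughly as follows. This is an illustration
under stated assumptions, not the actual implementation: Submission and
SubmissionQueue are hypothetical names, a std::future stands in for the
fence, and a callback stands in for returning the command list to the
device.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a submitted command list and its fence.
// The future becomes ready once execution has completed.
struct Submission {
  int               cmdListId;
  std::future<void> fence;
};

class SubmissionQueue {
public:
  // onComplete runs on the worker thread for each finished command list.
  explicit SubmissionQueue(std::function<void(int)> onComplete)
  : m_onComplete(std::move(onComplete)),
    m_thread([this] { run(); }) { }

  // Drains all pending entries, then stops the worker thread.
  ~SubmissionQueue() {
    { std::lock_guard<std::mutex> lock(m_mutex);
      m_stopped = true; }
    m_cond.notify_one();
    m_thread.join();
  }

  // Called by the submitting thread; does not wait on the fence.
  void submit(Submission entry) {
    { std::lock_guard<std::mutex> lock(m_mutex);
      m_entries.push(std::move(entry)); }
    m_cond.notify_one();
  }

private:
  void run() {
    std::unique_lock<std::mutex> lock(m_mutex);
    while (true) {
      m_cond.wait(lock, [this] { return m_stopped || !m_entries.empty(); });
      if (m_entries.empty())
        return; // only reached once stopped and fully drained

      Submission entry = std::move(m_entries.front());
      m_entries.pop();

      lock.unlock();
      entry.fence.wait();            // wait for execution to complete
      m_onComplete(entry.cmdListId); // hand the command list back
      lock.lock();
    }
  }

  std::function<void(int)> m_onComplete;
  std::mutex               m_mutex;
  std::condition_variable  m_cond;
  std::queue<Submission>   m_entries;
  bool                     m_stopped = false;
  std::thread              m_thread; // declared last so it starts last
};
```

In this sketch the entries are processed strictly in submission order,
and the lock is released while waiting on the fence so that the
submitting thread is never blocked by a long-running command list.
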

The naive optimization to use staging buffers rather than actual mapping
turned out to be no more efficient than the previous approach. In order
to achieve good performance, buffer renaming must be implemented instead.