Addresses two potential issues:
- Our spinlocks are almost never contested, but the code generated
is not ideal without the likely/unlikely hints.
- In the unlikely event that a spinlock is in fact contested, we'd yield
immediately, even though most of the time we'd only have to wait for
a few hundred cycles at most.
Replacing our spinlocks with std::mutex is not an option due to much
higher locking overhead in the uncontested case; doing so reduces
performance significantly for the buffer slice and pipeline locks.
An atomic fetch-and-add on unlock is not needed since no other thread can
modify the serving counter after the calling thread has acquired the lock.
May slightly improve performance in games relying on ID3D10Multithread.
Wine needs to set up each thread that has access to Windows APIs. This means that in winelib builds, we can't let the standard C++ library create threads and need to use Wine for that instead. I wrote a thin wrapper around the Windows thread functions so that the rest of the code just has to use the new dxvk::thread class instead of std::thread.