From https://github.com/arduino/Arduino/pull/2376#issuecomment-59671152
Quoting Andrew Kroll:
[..this commit..] introduces a small delay that can prevent the wait
loop form iterating when running at the maximum speed. This gives
you a little more speed, even if it seems counter-intuitive. At
lower speeds, it is unnoticed. Watch the output on an oscilloscope
when running full SPI speed, and you should see closer back-to-back
writes.
Quoting Paul Stoffregen:
I did quite a bit of experimenting with the NOP addition. The one
that's in my copy gives about a 10% speedup on AVR.