Before, the interrupt would remain enabled during reception, which would
re-set the PCINT flag because of the level changes inside the received
byte. Because interrupts are globally disabled, this would not
immediately trigger an interrupt, but the flag would be remembered to
trigger another PCINT interrupt immediately after the first one is
processed.
Typically this was not a problem, because the second interrupt would see
the stop bit, or an idle line, and decide that the interrupt triggered
for someone else. However, at high baud rates, this could cause the
next interrupt for the real start bit to be delayed so much that the
byte got corrupted.
By clearing the interrupt mask bit for just the RX pin (as opposed to
the PCINT mask bit for the entire port), any PCINT events on other bits
can still set the PCINT flag and be processed as normal. In that case,
the received byte will likely be corrupted, but that is inevitable when
(other) interrupts happen during SoftwareSerial reception.
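The following is a rough host-side model of the masking behaviour described above. It is not AVR code. `FakePcintGroup` and `receive_byte` are hypothetical names: a plain byte stands in for a PCMSKn register and a bool stands in for the group's PCIFn flag. It assumes, as on the AVR, that a pin change only sets the group flag when that pin's mask bit is set.

```cpp
#include <cstdint>

// Hypothetical model of one pin-change interrupt group (e.g. PCINT16..23).
struct FakePcintGroup {
    uint8_t pcmsk = 0;    // stand-in for the PCMSKn register
    bool pcif = false;    // stand-in for this group's PCIFn flag

    // Called whenever the level of pin `bit` changes.
    void pin_change(uint8_t bit) {
        if (pcmsk & (1u << bit))
            pcif = true;  // only unmasked pins can (re)set the flag
    }
};

// Model of the receive path: mask only the RX pin while the data-bit
// edges arrive, then unmask it again after the stop bit.
void receive_byte(FakePcintGroup &g, uint8_t rx_bit, int rx_edges) {
    g.pcif = false;                                     // start-bit interrupt is being handled
    g.pcmsk &= static_cast<uint8_t>(~(1u << rx_bit));   // ignore RX edges for now
    for (int i = 0; i < rx_edges; ++i)
        g.pin_change(rx_bit);                           // flag stays clear
    g.pcmsk |= static_cast<uint8_t>(1u << rx_bit);      // wait for the next start bit
}
```

Other pins in the same group keep their mask bits. A change on one of them still sets the flag while the RX pin is masked.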
This precalculates the mask register and value, making setRxIntMask
considerably less complicated. Right now, this is not a big deal, but
simplifying it allows using it inside the ISR next.
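Here is a minimal sketch of the precalculation, with a plain `uint8_t` standing in for the real PCMSK register. In SoftwareSerial the register pointer and bit would come from the board's pin-mapping macros in begin(). Here they are passed in directly, and the `RxMask` class is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: the register address and bit value are computed
// once, so enabling or disabling becomes a single read-modify-write.
class RxMask {
public:
    void init(volatile uint8_t *pcmsk_reg, uint8_t pcmsk_bit) {
        maskreg_ = pcmsk_reg;
        maskvalue_ = static_cast<uint8_t>(1u << pcmsk_bit);
    }

    void set(bool enable) {
        if (enable)
            *maskreg_ |= maskvalue_;                          // unmask the RX pin only
        else
            *maskreg_ &= static_cast<uint8_t>(~maskvalue_);   // mask the RX pin only
    }

private:
    volatile uint8_t *maskreg_ = nullptr;
    uint8_t maskvalue_ = 0;
};
```

Because set() involves no pin lookups any more, it is cheap enough to call from inside an ISR.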
Since those functions are only called once now, it makes sense to inline
them. This saves a few bytes of program space, but also saves a few
cycles in the critical RX path.
Previously, up to four separate but identical ISRs were defined, for
PCINT0, PCINT1, PCINT2 and PCINT3. Each of these generated its own
function, with a lot of register pushing and popping because each one
called another function.
Now, the ISR_ALIASOF macro from avr-libc is used to declare just the
PCINT0 version and make all other ISRs point to that one, saving a lot
of program space, as well as some speed because of improved inlining.
On an Arduino Uno with gcc 4.3, this saves 168 bytes. With gcc 4.8, this
saves 150 bytes.
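In avr-libc, ISR_ALIASOF expands to a GCC `alias` attribute, so the aliased vectors become extra symbol names for a single function body. The host-side sketch below applies that attribute directly to ordinary functions. The handler names and the counter are hypothetical, and this is not the avr-libc macro itself. It assumes GCC or Clang on an ELF target such as Linux.

```cpp
// Shared state touched by the "interrupt handler".
static int handled = 0;

// The one real handler body, analogous to ISR(PCINT0_vect) { ... }.
extern "C" void pcint0_handler() {
    ++handled;
}

// The other "vectors" only add names for the same code, analogous to
// ISR(PCINT1_vect, ISR_ALIASOF(PCINT0_vect)).
extern "C" void pcint1_handler() __attribute__((alias("pcint0_handler")));
extern "C" void pcint2_handler() __attribute__((alias("pcint0_handler")));
```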
Similar to SoftwareSerial::write, this rewrites the loop to only touch
the MSB and then shift the byte down one bit per iteration, allowing the
compiler to generate more efficient code. Unlike in the write function,
however, there is no need to copy the instance variables into local
variables: for some reason the compiler already does this (and doing it
manually even makes the code bigger).
On the Arduino Uno using gcc 4.3 this saves 26 bytes. Using gcc 4.8 this
saves 30 bytes.
Note that this removes the else clause in the code, which makes the C
code look unbalanced, as if a 0 bit and a 1 bit no longer took the same
number of cycles. However, looking at the code generated by the
compiler, it turns out that the old code was actually unbalanced, while
the new code is properly balanced.
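Here is a host-side model of the bit-assembly part of the receive loop. `assemble_byte` is a hypothetical name. The samples array replaces the real pin reads and omits the one-bit delay the real loop waits before each sample. Serial data arrives LSB first, so writing each sample into the MSB and shifting right once per bit leaves the first sample in bit 0 after eight rounds.

```cpp
#include <cstdint>

// samples[] holds the eight data-bit levels in wire order (LSB first).
uint8_t assemble_byte(const bool samples[8]) {
    uint8_t d = 0;
    for (uint8_t i = 0; i < 8; ++i) {
        d >>= 1;              // earlier bits move down one position
        if (samples[i])
            d |= 0x80;        // only the MSB is ever set directly; no else branch
    }
    return d;
}
```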
This change restructures the loop to help the compiler generate shorter
code: because now only the LSB of the data byte is checked and
subsequent bits are shifted down one by one, it can use the "skip if bit
set" instruction.
Furthermore, it copies most member variables into local variables, which
causes the compiler to keep them in registers. This makes the
timing-critical part of the code smaller, making it easier to provide
accurate timings.
On an Arduino Uno using gcc 4.3, this saves 58 bytes. On gcc 4.8, this
saves 14 bytes.
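Here is a host-side model of the restructured transmit loop. `FakeTx` is hypothetical: a plain byte stands in for the port register, bit 3 is an arbitrary TX pin, and each written level is recorded so the bit order can be checked. The real code also waits one bit time after each write. The register pointer and both masks are copied into locals before the loop, mirroring the idea of keeping them in registers during the timed section.

```cpp
#include <cstdint>

struct FakeTx {
    volatile uint8_t port = 0xFF;   // idle line is high
    uint8_t bit_mask = 1u << 3;     // hypothetical TX pin
    bool levels[8] = {};

    void write_data_bits(uint8_t b) {
        volatile uint8_t *reg = &port;                            // local copies
        const uint8_t reg_mask = bit_mask;
        const uint8_t inv_mask = static_cast<uint8_t>(~bit_mask);

        for (uint8_t i = 0; i < 8; ++i) {
            if (b & 1)                  // only the LSB is ever tested
                *reg |= reg_mask;       // send 1
            else
                *reg &= inv_mask;       // send 0
            levels[i] = (*reg & reg_mask) != 0;
            b >>= 1;                    // the next bit moves into the LSB
        }
    }
};
```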
Somehow gcc 4.8 doesn't inline this function, even though it is always
called with constant arguments and can be reduced to just a few
instructions when inlined. Adding the always_inline attribute makes gcc
inline it, saving 46 bytes on the Arduino Uno.
gcc 4.3 already inlined this function, so there are no space
savings there.
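For reference, this is what the GCC attribute looks like. The function the commit refers to is not named here, so `bit_value` below is purely illustrative. When it is called with a constant argument, the inlined call folds down to a constant.

```cpp
#include <cstdint>

// Hypothetical helper. always_inline makes GCC inline every call, even
// where its own heuristics would not; the function is also declared
// inline, as GCC expects for this attribute.
static inline __attribute__((always_inline)) uint8_t bit_value(uint8_t bit) {
    return static_cast<uint8_t>(1u << bit);
}
```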
Before, there was nearly identical code for the inverted and regular
cases. However, simply inverting the byte in the inverted case allows
using the regular code for both, reducing the generated code size by 100
bytes on an Arduino Uno with gcc 4.3 (50 bytes with gcc 4.8).
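Here is a sketch of why one data loop is enough, as a host-side model. `build_frame` is hypothetical and records line levels for a 10-bit frame: start bit, 8 data bits LSB first, stop bit. With inverted logic the byte is complemented once up front, so the same data loop produces the inverted levels. Only the start and stop levels still depend on the flag.

```cpp
#include <cstdint>

void build_frame(uint8_t b, bool inverted, bool frame[10]) {
    if (inverted)
        b = static_cast<uint8_t>(~b);   // invert the data once

    frame[0] = inverted;                // start bit (low normally, high inverted)
    for (int i = 0; i < 8; ++i) {       // shared data loop for both cases
        frame[1 + i] = (b & 1) != 0;
        b >>= 1;
    }
    frame[9] = !inverted;               // stop bit returns the line to idle
}
```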
stopListening also disables the interrupt, if needed, so calling that
function makes more sense. Since stopListening only disables the
interrupt when the current SoftwareSerial is the active object, and that
can only be the case when _rx_delay_stopbit is non-zero, there is no
need to separately check _rx_delay_stopbit anymore.
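Here is a stripped-down model of that bookkeeping. `FakeSerial` and its members are hypothetical, and a bool stands in for the PCMSK bit. Only the active instance ever has its interrupt enabled, so end() can call stopListening() unconditionally: for any other instance the call does nothing.

```cpp
class FakeSerial {
public:
    static FakeSerial *active_object;   // the instance currently receiving
    bool rx_int_enabled = false;        // stands in for the PCMSK bit

    bool stopListening() {
        if (active_object != this)      // not active: interrupt already off
            return false;
        rx_int_enabled = false;
        active_object = nullptr;
        return true;
    }

    // No separate "was RX configured?" check is needed here.
    void end() { stopListening(); }
};

FakeSerial *FakeSerial::active_object = nullptr;
```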
If an interrupt causing an overflow occurred between reading
_buffer_overflow and clearing it, that overflow condition would be
immediately cleared and never be returned by overflow().
By only clearing the overflow flag if an overflow actually occurred,
this problem goes away (worst case overflow() returns false even though
an overflow _just_ occurred, but then the next call to overflow() will
return true).
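Here is a minimal model of the two variants. `OverflowFlag` and `overflow_old` are hypothetical names; in SoftwareSerial the flag is set from the receive interrupt when the buffer is full. In the fixed version, a clear only ever follows a read that saw the flag set. An overflow that lands after a false read therefore stays pending for the next call.

```cpp
struct OverflowFlag {
    volatile bool buffer_overflow = false;

    // Racy: an overflow flagged between the read and the unconditional
    // clear is lost.
    bool overflow_old() {
        bool ret = buffer_overflow;
        buffer_overflow = false;
        return ret;
    }

    // Fixed: clear only when the flag was seen set.
    bool overflow() {
        bool ret = buffer_overflow;
        if (ret)
            buffer_overflow = false;
        return ret;
    }
};
```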
This prevents interrupts from triggering when the SoftwareSerial
instance is not even listening.
Additionally, this removes the need to disable interrupts in
SoftwareSerial::listen, since no interrupts are active while it touches
the variables.
The current check is still always false when the old check was, but
additionally it will not disable the interrupts when they were never
enabled (which shouldn't matter much, but this is more consistent).
In this case, SoftwareSerial::begin will not have enabled the
interrupts, so better not allow the SoftwareSerial instance to enter the
listening state either.
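Here is a host-side model of listen() following these rules. `FakeSerial` is hypothetical, with bools and small integers standing in for the real registers and buffer indices. listen() refuses if RX was never set up (`rx_delay_stopbit` is zero). It turns off the previous instance's interrupt first. It resets the buffer while its own interrupt is still masked, and only unmasks it at the end. Nothing can interrupt the variable updates, so no global interrupt disable is needed.

```cpp
#include <cstdint>

class FakeSerial {
public:
    static FakeSerial *active_object;
    uint16_t rx_delay_stopbit = 0;      // zero: begin() did not enable RX
    bool rx_int_enabled = false;        // stands in for the PCMSK bit
    uint8_t head = 0, tail = 0;         // receive buffer indices

    bool listen() {
        if (!rx_delay_stopbit)          // never enter the listening state without RX
            return false;
        if (active_object == this)
            return false;
        if (active_object)
            active_object->stopListening();  // old instance's interrupt off first
        head = tail = 0;                // safe: our interrupt is still masked
        active_object = this;
        rx_int_enabled = true;          // only now can it fire
        return true;
    }

    bool stopListening() {
        if (active_object != this)
            return false;
        rx_int_enabled = false;
        active_object = nullptr;
        return true;
    }
};

FakeSerial *FakeSerial::active_object = nullptr;
```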
Before enabling interrupts, begin would see if the given receive pin
actually has an associated PCINT register. If not, the interrupts would
not be enabled.
Now, the same check is done, but when no register is available, the rx
parameters are not loaded at all (which in turn prevents the interrupt
from being enabled). This allows all code to use the same "is rx
enabled" check (which will be added next).
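Here is a sketch of that begin() behaviour under stated assumptions. `pcmsk_for_pin()` is a hypothetical stand-in for the board's pin-mapping lookup and returns nullptr for pins without a pin-change register; in this model only pins 8 to 13 have one. The RX parameters are stored only when a register exists, so a non-zero `rx_delay_stopbit` can serve as the "is rx enabled" test.

```cpp
#include <cstdint>

static volatile uint8_t fake_pcmsk0 = 0;

// Hypothetical lookup: only pins 8..13 have a mask register here.
volatile uint8_t *pcmsk_for_pin(uint8_t pin) {
    return (pin >= 8 && pin <= 13) ? &fake_pcmsk0 : nullptr;
}

struct FakeSerial {
    uint8_t rx_pin;
    uint16_t rx_delay_stopbit = 0;
    volatile uint8_t *pcint_maskreg = nullptr;

    explicit FakeSerial(uint8_t pin) : rx_pin(pin) {}

    void begin(uint16_t bit_delay) {
        volatile uint8_t *reg = pcmsk_for_pin(rx_pin);
        if (reg) {                       // RX parameters only loaded here
            pcint_maskreg = reg;
            rx_delay_stopbit = bit_delay;
        }
        // TX setup would follow regardless of the RX pin.
    }

    bool rx_enabled() const { return rx_delay_stopbit != 0; }
};
```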
Previously, it could happen that SPI::beginTransaction was
interrupted by an ISR, while it is changing the SPI_AVR_EIMSK
register or interruptSave variable (it seems that there is
a small window after changing SPI_AVR_EIMSK where an interrupt
might still occur). If this happens, interruptSave is overwritten
with an invalid value, permanently disabling the pin interrupts.
To prevent this, disable interrupts globally while changing
these values.
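Here is a host-side model of the save/disable/restore pattern. Everything in it is hypothetical: bit 7 of `fake_sreg` plays the AVR global interrupt flag, `fake_eimsk` plays the external interrupt mask register, and `cli_sim()` stands in for noInterrupts(). Reading the old mask, storing it in interruptSave and clearing the bits all happen with interrupts off. Afterwards, the previous interrupt state is put back rather than unconditionally re-enabled.

```cpp
#include <cstdint>

static volatile uint8_t fake_sreg = 0x80;   // global interrupts enabled
static volatile uint8_t fake_eimsk = 0x03;  // two pin interrupts enabled
static uint8_t interruptSave = 0;

static void cli_sim() { fake_sreg &= 0x7F; }

void begin_transaction_sim(uint8_t interruptMask) {
    uint8_t sreg = fake_sreg;      // remember whether interrupts were on
    cli_sim();                     // no ISR can run from here...
    interruptSave = fake_eimsk;
    fake_eimsk &= static_cast<uint8_t>(~interruptMask);
    fake_sreg = sreg;              // ...until the old state is restored
}

void end_transaction_sim() {
    uint8_t sreg = fake_sreg;
    cli_sim();
    fake_eimsk = interruptSave;    // put back the saved mask
    fake_sreg = sreg;
}
```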
From https://github.com/arduino/Arduino/pull/2376#issuecomment-59671152
Quoting Andrew Kroll:
  [..this commit..] introduces a small delay that can prevent the wait
  loop from iterating when running at the maximum speed. This gives
  you a little more speed, even if it seems counter-intuitive. At
  lower speeds, it is unnoticed. Watch the output on an oscilloscope
  when running full SPI speed, and you should see closer back-to-back
  writes.
Quoting Paul Stoffregen:
  I did quite a bit of experimenting with the NOP addition. The one
  that's in my copy gives about a 10% speedup on AVR.
Previously, when verbose uploads were enabled, avrdude was run with four
-v options, causing it to dump all raw bytes exchanged with the
bootloader. This floods the console so much that meaningful output
mostly disappears.
Most users probably want to enable verbose mode just to see what avrdude
command is run. Furthermore, users that benefit from the raw bytes
dumped are perfectly capable of either running avrdude manually, or
modifying platform.txt. Given that, running avrdude with just one -v
should be plenty.
This fixes #891.