Rebased the bugfix from the original Google Code issue #292 to work with Arduino 1.6.x
Description of original fix provided by Pete62:
The later 8 bit AVR's use two registers (TCCRxA, TCCRxB) whereas the ATmega8 only uses a single register (TCCR2) to house the control bits for Timer 2. Bits were inadvertently being cleared.
To avoid having a .cpp just for an extern variable definition, `static`
has been chosen over `extern`.
As the `EEPROMClass` class simply wraps functionality located elsewhere,
it is completely compiled away. Even though each translation unit which
includes the header will get a copy with internal linkage, there is no
associated overhead.
More info
[here](http://stackoverflow.com/questions/29098518/extern-variable-only-in-header-unexpectedly-working-why)
Previously, the TX pin would be set to output first and then written
high (assuming non-inverted logic). When the pin was previously
configured for input without pullup (which is normal reset state), this
results in driving the pin low for a short when initializing. This could
accidenttally be seen as a stop bit by the receiving side.
By first writing HIGH and then setting the mode to OUTPUT, the pin will
have its pullup enabled for a short while, which is harmless.
Instead of using a lookup table with (wrong) timings, this calculates
the timings in SoftwareSerial::begin. This is probably a bit slower, but
since it typically happens once, this shouldn't be a problem.
Additionally, since the lookup tables can be removed, this is also a lot
smaller, as well as supporting arbitrary CPU speeds and baudrates,
instead of the limited set that was defined before.
Furthermore, this switches to use the _delay_loop_2 function from
avr-libc instead of a handcoded delay function. The avr-libc function
only takes two instructions, as opposed to four instructions for the old
one. The compiler also inlines the avr-libc function, which makes the
timings more reliable.
The calculated timings directly rely on the instructions generated by
the compiler, since a significant amount of time is spent processing
(compared to the delays, especially at higher speeds). This means that
if the code is changed, or a different compiler is used, the
calculations might need changing (though a few cycles more or less
shouldn't cause immediate breakage).
The timings in the code have been calculated from the assembly generated
by gcc 4.8.2 and gcc 4.3.2.
The RX baudrates supported by SoftwareSerial are still not unlimited. At
16Mhz, using gcc 4.8.2, everything up to 115200 works. At 8Mhz, it works
up to 57600. Using gcc 4.3.2, it also works up to 57600 at 16Mhz and up
to 38400 at 8Mhz. Note that at these highest speeds, communication
works, but is still quite sensitive to other interrupts (like the
millis() interrupts) when bytes are sent back-to-back, so there still
are corrupted bytes in RX.
TX works up to 115200 for all combinations of compiler and clock rates.
This fixes#2019
Before, the interrupt would remain enabled during reception, which would
re-set the PCINT flag because of the level changes inside the received
byte. Because interrupts are globally disabled, this would not
immediately trigger an interrupt, but the flag would be remembered to
trigger another PCINT interrupt immediately after the first one is
processed.
Typically this was not a problem, because the second interrupt would see
the stop bit, or an idle line, and decide that the interrupt triggered
for someone else. However, at high baud rates, this could cause the
next interrupt for the real start bit to be delayed so much that the
byte got corrupted.
By clearing the interrupt mask bit for just the RX pin (as opposed to
the PCINT mask bit for the entire port), any PCINT events on other bits
can still set the PCINT flag and be processed as normal. In this case,
it's likely that there will be corruption, but that's inevitable when
(other) interrupts happen during SoftwareSerial reception.
This precalculates the mask register and value, making setRxIntMask
considerably less complicated. Right now, this is not a big deal, but
simplifying it allows using it inside the ISR next.
Since those functions are only called once now, it makes sense to inline
them. This saves a few bytes of program space, but also saves a few
cycles in the critical RX path.