The backpatching routines didn't correctly work out where to find the real VFP register, so in most cases they were using D0.
Also fixes bugs in the slowmem loadstore routines.
Instead of doing vector operations and throwing away the top 64 bits of each result, use scalar operations.
On Cortex-A57 this saves three cycles per vector operation changed to scalar, so it saves 3-9 cycles per emulated instruction.
It also puts one less micro-op into the vector pipeline there.
On the Nvidia Denver I couldn't see any noticeable performance difference, but it's a quirky architecture, so it may be noticing that we are throwing away
the top bits anyway and optimizing for it. The world may never know what's truly happening there.
For offsets that fit in the instruction encoding, just put them in the instruction encoding.
Saves an instruction in a large number of loadstores.
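
As a rough illustration of the "fits in the encoding" check, here is a minimal sketch. It assumes the AArch64 immediate load/store forms (the commit does not say which backend is meant), and OffsetFitsInEncoding is a hypothetical name, not a function from the emitter.

```cpp
#include <cstdint>

// Hypothetical helper (not from the emitter): reports whether a byte offset
// can be encoded directly in an AArch64 immediate load/store.
// access_size is the access width in bytes (1, 2, 4, 8 or 16).
bool OffsetFitsInEncoding(int64_t offset, uint32_t access_size)
{
  // Unscaled form (LDUR/STUR): signed 9-bit immediate, no alignment needed.
  if (offset >= -256 && offset <= 255)
    return true;

  // Scaled form (LDR/STR, unsigned offset): 12-bit immediate that is
  // multiplied by the access size, so the offset must be aligned to it.
  const int64_t size = static_cast<int64_t>(access_size);
  if (offset >= 0 && offset % size == 0 && offset / size <= 4095)
    return true;

  // Otherwise the offset has to be materialized in a register first.
  return false;
}
```

When the check fails, the emitter still needs the extra instruction to build the address, which is the case this change leaves alone.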
Revision b058bbd was causing the AsmCommon routines to overrun the code
buffer allocated for them. According to Fiora, it happens only on Linux
because Linux has more caller-save registers than other platforms.
The key change is that for stores less than 5 bytes, the correct
place for the trampoline to return is immediately after the backpatched jump,
not somewhere inside it.
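
A minimal sketch of the return-address rule, under two assumptions that are not stated above: the backpatched jump is an x86 near jump (opcode E9 plus a 32-bit displacement, 5 bytes), and for longer stores the trampoline returns past the original instruction. TrampolineReturnAddress and kNearJumpSize are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Assumed size of the backpatched jump: x86 near JMP rel32 (E9 + imm32).
constexpr std::size_t kNearJumpSize = 5;

// Hypothetical helper: given the address where the jump was written and the
// length of the store it replaced, pick where the trampoline returns to.
// For stores shorter than the jump this is the first byte after the jump,
// never an address inside it.
std::uintptr_t TrampolineReturnAddress(std::uintptr_t jump_start,
                                       std::size_t original_length)
{
  return jump_start + std::max(original_length, kNearJumpSize);
}
```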
Now it should be easier to merge more than 2-instruction-long sequences.
Also correct some minor inconsistencies in behavior between instruction
merging cases.
Optimistically assume used GQRs are 0 in blocks that only use one GQR, and
bail at the start of the block and recompile if that assumption fails.
Many games use almost entirely unquantized stores (e.g. Rebel Strike, Sonic
Colors), so this will likely be a big performance improvement across the board
for games with heavy use of paired singles.
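
The guard-and-recompile idea can be sketched roughly as follows. This is only an illustration: BlockInfo, EntryAction and CheckBlockEntry are hypothetical names, and the real JIT emits the check as machine code at the block entry rather than calling a function like this.

```cpp
#include <array>
#include <cstdint>

// Hypothetical per-block metadata, for illustration only.
struct BlockInfo
{
  int gqr_index;         // the single GQR this block uses
  bool assume_gqr_zero;  // true if the block was compiled assuming GQR == 0
};

enum class EntryAction
{
  RunSpecialized,  // assumption holds: run the unquantized fast path
  RunGeneric,      // block was compiled without the assumption
  Recompile        // assumption failed: bail out and recompile
};

// Checked at the start of the block, before any of its code runs.
EntryAction CheckBlockEntry(BlockInfo& block, const std::array<uint32_t, 8>& gqr)
{
  if (!block.assume_gqr_zero)
    return EntryAction::RunGeneric;

  if (gqr[block.gqr_index] == 0)
    return EntryAction::RunSpecialized;

  // Drop the assumption so the recompiled block uses the generic path.
  block.assume_gqr_zero = false;
  return EntryAction::Recompile;
}
```

Restricting this to blocks that use a single GQR keeps the entry check to one comparison.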
Won't work with all games, but provides a nice way to spend extra CPU to make
a variable framerate game faster (e.g. Spyro or The Last Story), or to make
a game use less CPU at the cost of a lower framerate (e.g. Rogue Leader).
Retry a failed connection after a short delay. Hardware sometimes needs some
time to settle, or other Bluetooth programs may be attempting to query the
device as well (e.g. blueman-manager).
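
A generic retry-with-delay loop might look like the sketch below. ConnectWithRetry is a hypothetical helper; the attempt count and delay are parameters here because the commit does not state the values it uses.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls try_connect until it succeeds or max_attempts
// is reached, sleeping for `delay` between failed attempts.
bool ConnectWithRetry(const std::function<bool()>& try_connect, int max_attempts,
                      std::chrono::milliseconds delay)
{
  for (int attempt = 0; attempt < max_attempts; ++attempt)
  {
    if (try_connect())
      return true;

    // Give the hardware (or a competing Bluetooth client) time to settle,
    // but don't sleep after the final failure.
    if (attempt + 1 < max_attempts)
      std::this_thread::sleep_for(delay);
  }
  return false;
}
```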
An uninitialized struct member "l2_bdaddr_type" was making most connect calls
fail with "Invalid argument". The connection could succeed if the uninitialized
memory happened to have a zero byte in the appropriate location.
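
One common way to avoid this class of bug is to zero the whole address struct before filling in fields; whether the actual fix does this or sets the member explicitly is not stated here. The sketch below uses a stand-in struct (FakeSockaddrL2, a hypothetical name) so it builds without the BlueZ headers; the field layout is illustrative.

```cpp
#include <cstdint>
#include <cstring>

// Stand-in for the L2CAP socket address, for illustration only.
struct FakeSockaddrL2
{
  uint16_t l2_family;
  uint16_t l2_psm;
  uint8_t l2_bdaddr[6];
  uint16_t l2_cid;
  uint8_t l2_bdaddr_type;
};

// Hypothetical helper: builds an address with every unset field zeroed.
FakeSockaddrL2 MakeAddress(uint16_t family, uint16_t psm, const uint8_t (&bdaddr)[6])
{
  FakeSockaddrL2 addr{};  // value-initialization zeroes all members
  addr.l2_family = family;
  addr.l2_psm = psm;
  std::memcpy(addr.l2_bdaddr, bdaddr, sizeof(addr.l2_bdaddr));
  return addr;  // l2_cid and l2_bdaddr_type are 0 rather than stack garbage
}
```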