41 Commits

Author SHA1 Message Date
Pierre Bourdon
0ff1481494 Optimize PPC CR emulation by using magic 64 bit values
PowerPC has a 32 bit CR register, which is used to store flags for results of
computations. Most instructions have an optional bit that tells the CPU whether
the flags should be updated. This 32 bit register actually contains 8 sets of 4
flags: Summary Overflow (SO), Equals (EQ), Greater Than (GT), Less Than (LT).
These 8 sets are usually called CR0-CR7 and accessed independently. In the most
common operations, the flags are computed from the result of the operation in
the following fashion:
  * EQ is set iff result == 0
  * LT is set iff result < 0
  * GT is set iff result > 0
  * (Dolphin does not emulate SO)
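For reference, the conventional computation of one CRn field can be sketched in C++ as below. ComputeCRField is a hypothetical name, not Dolphin's function. It uses the PowerPC nibble layout in which LT, GT, EQ and SO have the values 8, 4, 2 and 1, and leaves SO clear. A JIT that translates this literally needs compare-and-branch sequences.

```cpp
#include <cstdint>

// Hypothetical helper: the 4-bit CRn field for a 32-bit result.
// Nibble layout: LT = 8, GT = 4, EQ = 2, SO = 1 (SO is left clear).
constexpr std::uint32_t ComputeCRField(std::int32_t result)
{
  return result < 0 ? 0x8u : (result > 0 ? 0x4u : 0x2u);
}

static_assert(ComputeCRField(-5) == 0x8u, "negative result sets LT");
static_assert(ComputeCRField(0) == 0x2u, "zero result sets EQ");
static_assert(ComputeCRField(7) == 0x4u, "positive result sets GT");
```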

While x86 architectures have a similar concept of flags, it is very difficult
to access the FLAGS register directly and translate its value into an
equivalent PowerPC value. With the current Dolphin implementation, updating a
PPC CR field requires conditional branches, which has a few performance costs:
they use space in the BTB, and in the worst case (!GT, !LT, EQ) 2 branches are
not taken.

After some brainstorming on IRC about how this could be improved, calc84maniac
figured out a neat trick that makes common CR operations way more efficient to
JIT on 64-bit x86 architectures. It relies on emulating each CRn bitfield
internally with a 64-bit value: the result of the operation that updates the
flags, sign-extended to 64 bits. Checking whether a CR bit is set then works
as follows:
  * EQ is set iff LOWER_32_BITS(cr_64b_val) == 0
  * GT is set iff (s64)cr_64b_val > 0
  * LT is set iff bit 62 of cr_64b_val is set (in a sign-extended 32-bit
    result, bits 31 to 63 all equal the sign bit)
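The three tests can be sketched as constexpr C++ helpers. The names MakeCRValue, CR_EQ, CR_GT and CR_LT are hypothetical, not Dolphin's. The static_asserts check the two extreme 32-bit results, which the examples in this message do not cover.

```cpp
#include <cstdint>

// Hypothetical helpers for the 64-bit CR representation.
// A CRn value is the operation's result, sign-extended to 64 bits.
constexpr std::uint64_t MakeCRValue(std::int32_t result)
{
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(result));
}

// EQ: the lower 32 bits are zero.
constexpr bool CR_EQ(std::uint64_t cr) { return static_cast<std::uint32_t>(cr) == 0; }
// GT: the value is positive when read as a signed 64-bit integer.
constexpr bool CR_GT(std::uint64_t cr) { return static_cast<std::int64_t>(cr) > 0; }
// LT: bit 62 is set.
constexpr bool CR_LT(std::uint64_t cr) { return ((cr >> 62) & 1) != 0; }

// The extreme 32-bit results behave as expected.
static_assert(!CR_EQ(MakeCRValue(INT32_MIN)) && !CR_GT(MakeCRValue(INT32_MIN)) &&
                  CR_LT(MakeCRValue(INT32_MIN)),
              "INT32_MIN: !EQ, !GT, LT");
static_assert(!CR_EQ(MakeCRValue(INT32_MAX)) && CR_GT(MakeCRValue(INT32_MAX)) &&
                  !CR_LT(MakeCRValue(INT32_MAX)),
              "INT32_MAX: !EQ, GT, !LT");
```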

To take a few examples, if the result of an operation is:
  * -1 (0xFFFFFFFFFFFFFFFF) -> lower 32 bits not 0       => !EQ
                            -> (s64)val (-1) is not > 0  => !GT
                            -> bit 62 is set             =>  LT
            !EQ, !GT, LT

  *  0 (0x0000000000000000) -> lower 32 bits are 0       =>  EQ
                            -> (s64)val (0) is not > 0   => !GT
                            -> bit 62 is not set         => !LT
            EQ, !GT, !LT

  *  1 (0x0000000000000001) -> lower 32 bits not 0       => !EQ
                            -> (s64)val (1) is > 0       =>  GT
                            -> bit 62 is not set         => !LT
            !EQ, GT, !LT
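To check that the trick holds beyond these three cases, it can be compared with plain comparisons over many 32-bit results. This is a test sketch with hypothetical names; it covers the edge values plus a strided sample of the int32 range rather than all 2^32 values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoders for the 64-bit CR representation.
bool CR_EQ(std::uint64_t cr) { return static_cast<std::uint32_t>(cr) == 0; }
bool CR_GT(std::uint64_t cr) { return static_cast<std::int64_t>(cr) > 0; }
bool CR_LT(std::uint64_t cr) { return ((cr >> 62) & 1) != 0; }

// True if the trick gives the same flags as direct comparisons.
bool AgreesWithDirect(std::int32_t result)
{
  const std::uint64_t cr = static_cast<std::uint64_t>(static_cast<std::int64_t>(result));
  return CR_EQ(cr) == (result == 0) && CR_GT(cr) == (result > 0) &&
         CR_LT(cr) == (result < 0);
}

// Counts disagreements over edge values and a strided sample of int32.
int CountMismatches()
{
  int mismatches = 0;
  const std::int32_t edges[] = {INT32_MIN, INT32_MIN + 1, -2, -1, 0,
                                1,         2,             INT32_MAX - 1, INT32_MAX};
  for (std::int32_t v : edges)
    mismatches += AgreesWithDirect(v) ? 0 : 1;
  for (std::int64_t v = INT32_MIN; v <= INT32_MAX; v += 65537)
    mismatches += AgreesWithDirect(static_cast<std::int32_t>(v)) ? 0 : 1;
  return mismatches;
}

// Usage: no disagreement is expected.
void CheckTrick() { assert(CountMismatches() == 0); }
```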

Sometimes we need to convert PPC CR values to these 64 bit values. The
following convention is used in this case:
  * Bit 0 (LSB) is set iff !EQ
  * Bit 62 is set iff LT
  * Bit 63 is set iff !GT
  * Bit 32 is always set, to disambiguate between EQ and GT (without it,
    EQ, GT, !LT would encode as 0, which reads back as !GT)
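A sketch of this conversion from a 4-bit CR field, assuming the PowerPC nibble layout LT = 8, GT = 4, EQ = 2, SO = 1 (SO is ignored). PPCToCR64 is a hypothetical name; the static_asserts reproduce the two examples in this message.

```cpp
#include <cstdint>

// Hypothetical converter from a PPC CR field (LT = 8, GT = 4, EQ = 2,
// SO = 1; SO is ignored) to the 64-bit representation.
constexpr std::uint64_t PPCToCR64(std::uint32_t field)
{
  return ((field & 0x2) ? 0 : 1ull)          // bit 0 set iff !EQ
         | ((field & 0x8) ? 1ull << 62 : 0)  // bit 62 set iff LT
         | ((field & 0x4) ? 0 : 1ull << 63)  // bit 63 set iff !GT
         | (1ull << 32);                     // bit 32 always set
}

static_assert(PPCToCR64(0x8 | 0x4) == 0x4000000100000001ull, "!EQ, GT, LT");
static_assert(PPCToCR64(0x4 | 0x2) == 0x0000000100000000ull, "EQ, GT, !LT");
```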

Some more examples:
  * !EQ, GT, LT -> 0x4000000100000001 (!B63, B62, B32, B0)
                -> lower 32 bits not 0          => !EQ
                -> (s64)val is > 0              =>  GT
                -> bit 62 is set                =>  LT
  * EQ, GT, !LT -> 0x0000000100000000
                -> lower 32 bits are 0          =>  EQ
                -> (s64)val is > 0 (note: B32)  =>  GT
                -> bit 62 is not set            => !LT
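Because GT, LT and EQ each get their own bit in this convention, every combination should survive a conversion to the 64-bit form and back. The round trip can be checked with a sketch like this one. The names are hypothetical, SO is not modelled, and the int64 cast assumes two's-complement conversion (guaranteed since C++20 and provided by mainstream compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical converter: PPC CR field (LT = 8, GT = 4, EQ = 2) to 64 bits.
std::uint64_t PPCToCR64(std::uint32_t field)
{
  return ((field & 0x2) ? 0 : 1ull) | ((field & 0x8) ? 1ull << 62 : 0) |
         ((field & 0x4) ? 0 : 1ull << 63) | (1ull << 32);
}

// Hypothetical inverse: reads EQ, GT and LT back with the three tests.
std::uint32_t CR64ToPPC(std::uint64_t cr)
{
  const bool eq = static_cast<std::uint32_t>(cr) == 0;
  const bool gt = static_cast<std::int64_t>(cr) > 0;
  const bool lt = ((cr >> 62) & 1) != 0;
  return (lt ? 0x8u : 0u) | (gt ? 0x4u : 0u) | (eq ? 0x2u : 0u);
}

// Every LT/GT/EQ combination should survive the round trip (SO is dropped).
void CheckRoundTrip()
{
  for (std::uint32_t field = 0; field < 16; ++field)
    assert(CR64ToPPC(PPCToCR64(field)) == (field & 0xEu));
}
```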
2014-07-30 21:41:17 -07:00
Lioncash
b03c12764d Really get rid of the MSVC 2005 workaround completely 2014-07-29 21:20:43 -04:00
Lioncash
412196a055 Core: Remove defines used to work around an MSVC 2005 bug 2014-07-29 19:33:08 -04:00
degasus
6d3f249dcc mark all local variables as static 2014-07-11 16:10:20 +02:00
degasus
22e1aa5bb4 mark all local functions as static 2014-07-11 16:07:23 +02:00
Ryan Houdek
a40ae6883a Move CoreTiming::downcount to PowerPC::ppcState.
This isn't technically the correct place to have the downcount variable, but it is similar to what PPSSPP does to gain a bit of extra speed on ARM.
We access this variable quite a bit: every block exit subtracts from it.
On ARM this required four instructions to load and store the value, while now it only requires two.
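A simplified model of the change, for illustration only: the struct, every field name except downcount, and the ExitBlock helper are hypothetical. The saving on ARM presumably comes from the JIT already holding a base register for ppcState, so downcount can be reached with a base+offset load and store instead of first materializing the address of a global; that explanation is an assumption, not something stated in the commit.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: a cut-down CPU state struct. Apart from downcount,
// the field names are hypothetical.
struct PowerPCState
{
  std::uint32_t gpr[32];
  std::uint32_t pc;
  int downcount;  // cycles left before scheduled events must run
};

PowerPCState ppcState = {};

// Hypothetical block-exit hook: subtract the block's cycles and report
// whether the scheduler needs to run.
bool ExitBlock(int cycles)
{
  ppcState.downcount -= cycles;
  return ppcState.downcount <= 0;
}

// Usage: with 10 cycles left, a 4-cycle block keeps running and a
// following 6-cycle block triggers the scheduler.
void CheckDowncount()
{
  ppcState.downcount = 10;
  assert(!ExitBlock(4));
  assert(ExitBlock(6));
}
```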

This gives an average gain of about 1 FPS in most games.
Examples:
Crazy Taxi: 54FPS -> 55FPS
Luigi's Mansion: 20FPS -> 21FPS
Wind Waker(Save Screen): 27FPS -> 28FPS

This seems to average a 6 MHz to 16 MHz improvement in CPU core emulation speed in the few games I've tested.
2014-06-26 01:48:00 +00:00
Tony Wasserka
9f22b2378d Merge pull request #485 from magumagu/packed-fp-reciprocal
Interpreter: return single-precision results for ps_rsqrte.
2014-06-19 16:51:33 +02:00
Lioncash
ce54c1e571 Kill off replaceable usages of s[n]printf. 2014-06-18 19:53:38 -04:00
magumagu
3da52018dc Interpreter: return single-precision results for ps_rsqrte. 2014-06-11 19:50:33 -07:00
Paul Olszewski
5d793881b0 Fix the capitalization of "GameCube" throughout the project. 2014-06-08 11:24:49 +09:00
magumagu
98dd99a696 Interpreter: correctly support HLE functions.
m_EndBlock is always false at the beginning of SingleStepInner in the
normal interpreter loop.
2014-05-25 15:39:46 -07:00
magumagu
440246a190 Interpreter: use numeric_limits instead of FLT_MAX.
Minor cleanup, and fixes compilation on some systems.
2014-05-24 10:58:15 +02:00
magumagu
6955fef161 Interpreter: Code style fixes. 2014-05-23 15:06:09 -07:00
magumagu
d0ed3b8192 Jit: Use infinity and NaN from numeric_limits.
MSVC's implementation of INFINITY is unusable.
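The portable replacements look like this. The constant names are hypothetical; std::numeric_limits<T>::max(), infinity() and quiet_NaN() are standard C++.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical constant names; the values come from std::numeric_limits
// instead of FLT_MAX / INFINITY style macros.
const float kMaxFloat = std::numeric_limits<float>::max();
const double kInfinity = std::numeric_limits<double>::infinity();
const double kQuietNaN = std::numeric_limits<double>::quiet_NaN();

// Usage: the values have the expected classifications.
void CheckLimits()
{
  assert(std::isfinite(kMaxFloat));
  assert(std::isinf(kInfinity) && kInfinity > 0);
  assert(std::isnan(kQuietNaN));
}
```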
2014-05-23 14:59:03 -07:00
magumagu
a9a2d3d98d New frsqrte implementation; verified accurate.
This is similar to the old implementation, but it uses smaller tables, and
handles more edge cases correctly.  (hwtest coming soon.)
2014-05-23 14:59:02 -07:00
magumagu
129e76e60d Interpreter: refactor the rsqrte code, and use it for ps_rsqrte. 2014-05-23 14:59:00 -07:00
magumagu
2f8a147eda Interpreter: make fres match hardware.
New table-based implementation written based on actual hardware behavior.
(hwtest coming soon).
2014-05-22 19:48:48 -07:00
magumagu
ad4ad7c1ed Use accurate frsqrte in Interpreter.
The implementation of frsqrte exposed by this change isn't completely
correct; that will be fixed in a later commit.
2014-05-22 19:46:27 -07:00
booto
9892c8ea54 numCyclesMinusOne to numCycles in GekkoOPInfo 2014-04-30 19:04:02 +08:00
Ryan Houdek
94497961ac Removes unused argument in Helper_UpdateCR1.
Interpreter::Helper_UpdateCR1 doesn't use the argument passed to UpdateCR1. It pulls its value from the FPSCR register.
Also, Interpreter::Helper_UpdateCR1(float) was declared alongside Helper_UpdateCR1(double), but it was never defined. Remove that
declaration.
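As a sketch of why the argument is unnecessary: on PowerPC, CR1 receives a copy of FPSCR bits 0 to 3 (FX, FEX, VX, OX), so it can be computed from FPSCR alone. CR1FromFPSCR is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: CR1 is a copy of FPSCR bits 0-3 (FX, FEX, VX, OX).
// PowerPC numbers bits from the MSB, so these are the top four bits.
constexpr std::uint32_t CR1FromFPSCR(std::uint32_t fpscr)
{
  return fpscr >> 28;
}

static_assert(CR1FromFPSCR(0x80000000u) == 0x8u, "FX lands in CR1's first bit");
static_assert(CR1FromFPSCR(0x0FFFFFFFu) == 0x0u, "lower FPSCR bits are ignored");
```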
2014-04-24 22:00:58 -05:00
magumagu
002fb0b563 Interpreter: don't PanicAlert on write to SPR_HID2.
The alert apparently triggers on Midway Arcade Treasures 2; given that the
game otherwise works fine, it's not a high priority to accurately emulate
the bit in question.

Fixes issue 7197.
2014-04-18 20:20:42 -07:00
Tillmann Karras
2fcaca0603 More range-based loops and overrides 2014-03-17 02:55:55 +01:00
Tillmann Karras
3c46c0ede9 Interpreter: make some class members private 2014-03-17 02:55:54 +01:00
Matthew Parlane
31cfc73a09 Fixes spacing for "for", "while", "switch" and "if"
Also moved && and || to ends of lines instead of start.
Fixed miscellaneous vertical alignments and moved some opening braces onto their own lines.
2014-03-11 00:35:07 +13:00
Tillmann Karras
d802d39281 clang-modernize -use-nullptr
and s/\bNULL\b/nullptr/g for *.cpp/h/mm files not compiled on my machine
2014-03-09 21:14:26 +01:00
Tillmann Karras
16885d0f74 Interpreter: less duplicate code in float compares 2014-03-09 19:35:13 +01:00
Tillmann Karras
9ef64245fa MathUtil: fix IsQNAN()
The constants were one nibble too short and the lower 51 bits don't
actually have to be zero.
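The commit's actual constants are not shown here. Below is a sketch of the check as described, using full 16-digit masks and requiring only the all-ones exponent and the quiet bit (bit 51); the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Full-width double masks (16 hex digits each).
constexpr std::uint64_t kExponentMask = 0x7FF0000000000000ull;
constexpr std::uint64_t kQuietBit = 0x0008000000000000ull;  // fraction MSB (bit 51)

// Hypothetical name. A quiet NaN has an all-ones exponent and the quiet bit
// set; the remaining 51 fraction bits (the payload) may be anything.
bool IsQNaN(double d)
{
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return (bits & kExponentMask) == kExponentMask && (bits & kQuietBit) != 0;
}

// Builds a double from raw bits, for the checks below.
double FromBits(std::uint64_t bits)
{
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Usage: a quiet NaN with a non-zero payload is detected; infinity and a
// signaling NaN are not.
void CheckIsQNaN()
{
  assert(IsQNaN(FromBits(0x7FF8000000000001ull)));
  assert(!IsQNaN(FromBits(0x7FF0000000000000ull)));
  assert(!IsQNaN(FromBits(0x7FF0000000000001ull)));
}
```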
2014-03-09 19:34:58 +01:00
Tillmann Karras
d05e205a24 FPURoundMode: revert use of enums in bit-fields
The workaround of using fixed underlying types produces lots of warnings
in GCC because now the bit-fields are too small for the value range used
for conversion semantics.
2014-03-09 15:24:35 +01:00
Ryan Houdek
4f02132f93 Make our architecture defines less stupid.
Our defines never made it clear whether they meant 64-bit or x86_64.
This makes a clear cut between bitness and architecture.
As a side effect, this commit also brings up aarch64 compilation support.
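A sketch of the idea with hypothetical macro names (ARCH_X86, ARCH_ARM, ARCH_64BIT, ARCH_32BIT), not Dolphin's actual defines. Only the compiler-predefined macros tested in the #if lines are real.

```cpp
// Architecture and bitness are defined independently (hypothetical names).
#if defined(__x86_64__) || defined(_M_X64)
#define ARCH_X86 1
#define ARCH_64BIT 1
#elif defined(__i386__) || defined(_M_IX86)
#define ARCH_X86 1
#define ARCH_32BIT 1
#elif defined(__aarch64__) || defined(_M_ARM64)
#define ARCH_ARM 1
#define ARCH_64BIT 1
#elif defined(__arm__) || defined(_M_ARM)
#define ARCH_ARM 1
#define ARCH_32BIT 1
#endif

// Bitness can now be queried without caring which architecture it is.
constexpr int ArchBits()
{
#if defined(ARCH_64BIT)
  return 64;
#elif defined(ARCH_32BIT)
  return 32;
#else
  return 0;  // unknown target
#endif
}
```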
2014-03-04 09:36:59 -06:00
Lioncash
13a007abed Remove another clamp function lying in the codebase and replace it with the one in MathUtil.h. 2014-03-02 13:57:27 -05:00
Pierre Bourdon
311caef094 Merge pull request #25 from Tilka/ppc_fp
Fix non-IEEE mode
2014-02-23 04:15:37 +01:00
Tillmann Karras
ee21cbe2d1 Add phire's more accurate DoubleToSingle version
This method doesn't involve messing around with the quirks of the x87
FPU and should be reasonably fast. As a bonus, it does the correct thing
for out-of-range doubles.

However, it is also a little slower and only benefits programs that rely
on undefined behavior so it is disabled for now.
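One way to convert without the x87 FPU is to select bits directly, as the PowerPC manual describes for single-precision stores. This sketch follows the manual's rule for inputs whose biased exponent is above 896 (plus zero), omits the denormalization path, and is not necessarily phire's implementation; ConvertToSingleBits is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical name. Bit selection from the PowerPC manual's single-precision
// store rule (big-endian bit numbering, bit 0 = MSB):
//   WORD[0:1] = FRS[0:1], WORD[2:31] = FRS[5:34]
// Valid when the double's biased exponent is above 896 or the value is zero;
// the manual's denormalization path for smaller exponents is omitted.
std::uint32_t ConvertToSingleBits(double d)
{
  std::uint64_t x;
  std::memcpy(&x, &d, sizeof x);
  return static_cast<std::uint32_t>(((x >> 32) & 0xC0000000) |  // sign, exponent MSB
                                    ((x >> 29) & 0x3FFFFFFF));  // low exponent bits, fraction
}

void CheckConvertToSingle()
{
  assert(ConvertToSingleBits(1.0) == 0x3F800000u);   // 1.0f
  assert(ConvertToSingleBits(-2.0) == 0xC0000000u);  // -2.0f
  // 2^200 is out of single range: no rounding to infinity, the bits are
  // sliced, giving the pattern of 2^72.
  assert(ConvertToSingleBits(std::ldexp(1.0, 200)) == 0x63800000u);
}
```

The PowerPC manual notes that a value too large for single format still takes this path and stores a well-defined pattern that is not numerically equal to the source, which the 2^200 check illustrates.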
2014-02-23 04:13:47 +01:00
Lioncash
146b301a91 Fix more header sorting issues in Core/ (now check-includes clean). 2014-02-20 01:01:11 +01:00
Lioncash
2afe215271 Convert all includes to relative paths. 2014-02-18 02:19:10 -05:00
Lioncash
3fd87a7636 Second and final pass of clearing out tabs. 2014-02-17 02:19:41 -05:00
Tillmann Karras
404624bf0b Turn loops into range-based form
and some things suggested by cppcheck and compiler warnings.
2014-02-13 09:05:50 +01:00
Scott Mansell
7062cf8657 Interpreter: Fixed ConvertToDouble to match the manual.
Also added some documentation comments.
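The manual's load-single expansion can be sketched as follows. It covers normalized numbers, zeros, infinities and NaNs; single-precision denormals, which the manual normalizes first, are left out. This illustrates the documented rule and is not the code from the commit; ConvertToDouble here is a standalone hypothetical function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Expansion from the PowerPC manual's single-precision load rule
// (big-endian bit numbering, bit 0 = MSB):
//   FRT[0:1] = WORD[0:1]
//   FRT[2:4] = !WORD[1] for normalized operands, WORD[1] for zero/inf/NaN
//   FRT[5:63] = WORD[2:31] followed by 29 zero bits
// Single-precision denormals (exponent 0, non-zero fraction) are not handled.
double ConvertToDouble(std::uint32_t word)
{
  const std::uint64_t w = word;
  const std::uint32_t exponent = (word >> 23) & 0xFF;
  const std::uint64_t bit1 = (w >> 30) & 1;  // WORD[1], the exponent MSB
  const std::uint64_t fill = (exponent != 0 && exponent != 0xFF) ? (bit1 ^ 1) : bit1;
  const std::uint64_t bits = ((w & 0xC0000000) << 32) |  // FRT[0:1]
                             ((fill * 0x7) << 59) |      // FRT[2:4]
                             ((w & 0x3FFFFFFF) << 29);   // FRT[5:63]
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

void CheckConvertToDouble()
{
  assert(ConvertToDouble(0x3F800000u) == 1.0);   // 1.0f
  assert(ConvertToDouble(0xC0000000u) == -2.0);  // -2.0f
  assert(ConvertToDouble(0x7F800000u) == std::numeric_limits<double>::infinity());
}
```

For normalized inputs this matches an exact float-to-double widening, since the replicated bit pattern adds 896 to the biased exponent.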
2014-02-12 23:12:17 +01:00
Tillmann Karras
f6897039c7 Interpreter: fix float conversions
Can't use simple casting, otherwise we get the same problems as in Jit64.
2014-02-12 23:12:15 +01:00
lioncash
d2038049f5 Replace all include guard ifdefs with "#pragma once" 2014-02-10 18:07:16 -05:00
Lioncash
ebb48d019e Clean up some struct indentations
Also cleaned up the indentations of some variable declarations.
2014-02-09 19:40:11 -05:00
Jasper St. Pierre
34692ab826 Remove unnecessary Src/ folders 2013-12-31 14:03:19 -05:00