Commit Graph

97 Commits

Author SHA1 Message Date
Fiora
8237004448 JIT: optimize for the common case of unquantized psq_l/st
Optimistically assume used GQRs are 0 in blocks that only use one GQR, and
bail at the start of the block and recompile if that assumption fails.

Many games use almost entirely unquantized stores (e.g. Rebel Strike, Sonic
Colors), so this will likely be a big performance improvement across the board
for games with heavy use of paired singles.
2015-01-10 14:14:43 -08:00
Fiora
55b509b739 Interpreter: fix slwx 2015-01-09 11:39:41 -08:00
Ryan Houdek
08660c89ad Fix register usage detection in PPCAnalyst.
lmw/stmw weren't properly setting input and output registers since they use multiple registers.
dcbz was just missing a flag in the instruction tables.
2014-12-02 16:12:33 -06:00
Fiora
4e0591cdf1 JIT: float instruction attribute fixes, fix binding mistakes
These instructions modify only the bottom halves of the output register,
so the output register needs to be treated as an input too.
2014-11-29 11:30:52 -08:00
Stevoisiak
b25e1a2eb4 Various formatting and consistency fixes 2014-11-13 22:42:18 -05:00
Lioncash
a105a9a557 Interpreter: Remove dead patches() function 2014-11-04 20:44:57 -05:00
Ryan Houdek
204598a082 Merge pull request #1350 from FioraAeterna/integeropts
Various smallish JIT optimizations
2014-11-02 20:13:20 -06:00
Lioncash
475bb40364 Interpreter: Remove a redundant macro 2014-10-31 10:55:25 -04:00
Fiora
fb0960f0ee JIT: flush unused registers during branch merges
Also correct some flags in interpreter tables.
2014-10-29 00:32:59 -07:00
skidau
96a2b74c02 Merge pull request #846 from lioncash/fpscr-enum
Core: Move FPSCR exception flags to a typed enum
2014-10-14 12:33:57 +11:00
Fiora
75a1310ba9 CPU: improve instruction table flags for RC bits 2014-10-08 11:44:37 -07:00
Fiora
85547d94be JIT: properly remove FIFO write addresses when code is invalidated
Fixes a bug caused by interaction with carry optimizations; might fix other
issues too.
2014-09-30 01:00:23 -07:00
Lioncash
843a3f6c15 Core: Move FPSCR exception flags to a typed enum 2014-09-29 00:46:15 -04:00
skidau
afccf2276d Merge pull request #1012 from skidau/aram-dma-exceptions
Compile the ARAM DMA exception checks into the JIT block
2014-09-28 14:48:38 +10:00
Fiora
3878187721 Interpreter: remove debug printf in psq_l 2014-09-27 20:44:45 -07:00
skidau
7184019090 Increased the savestate internal version.
Added a small note for instant dma.
2014-09-28 11:51:14 +10:00
skidau
86b6dfe4b3 Added a instant ARAM DMA mode which is enabled automatically when required.
Detects a situation where the game is writing to the dcache at the address being DMA'd. As we do not have dcache emulation, invalid data is being DMA'd causing audio glitches. The following code detects this and enables the DMA to complete instantly before the invalid data is written.
Added accurate ARAM DMA transfer timing.
Removed the addition of DSP exception checking.
2014-09-27 20:47:29 +10:00
skidau
945d431171 Added OPTYPE_LOADPS and OPTYPE_STOREPS instruction types to the PPC table.
Updated ARAM DMA and FIFO write exception checking to uses these types.

Conflicts:
	Source/Core/Core/PowerPC/Interpreter/Interpreter_Tables.cpp
	Source/Core/Core/PowerPC/PPCTables.h
2014-09-27 20:16:26 +10:00
skidau
30d77b38c5 Merge pull request #1127 from Sonicadvance1/QGR-BitField
Change the QGR union over to a BitField union.
2014-09-26 14:53:24 +10:00
Fiora
5fce109ce1 Reorganize carry to store flags separately instead of part of XER
Also correct behavior with regards to which bits in XER are treated as zero
based on a hwtest (probably doesn't affect any real games, but might as well
be correct).
2014-09-24 12:27:47 -07:00
Fiora
f103234e2b JIT: flush a register if it won't be used for the rest of the block
This should dramatically reduce code size in the case of blocks with
lots of branches, and certainly doesn't hurt elsewhere either.

This can probably be improved a good bit through smarter tracking of register
usage, e.g. discarding registers that are going to be overwritten, but this
is a good start and should help reduce code size and register pressure.
Unlike that sort of change, this is a "safe" patch; it only flushes registers,
which can't affect correctness, unlike actually discarding data.

As part of this, refactor PPCAnalyst to support distinguishing between
float and integer registers (to properly handle instructions that access
both, like floating-point loads and stores).

Also update every instruction in the interpreter flags table I could find
that didn't have all the correct flags.
2014-09-22 16:00:25 -07:00
Ryan Houdek
9d7598266f Change the QGR union over to a BitField union.
Makes it easier to generate a QGR in my unit test, cleaner overall of course.
2014-09-20 13:15:44 -05:00
Ryan Houdek
0294b344e2 Merge pull request #1086 from FioraAeterna/fixsrawint
Interpreter: fix carry calculation in srawx
2014-09-18 06:41:37 -05:00
Fiora
9b8cfcdc29 Interpreter: fix carry calculation in srawx
I don't know anything this affected, but it didn't match the manual (or JIT).
2014-09-14 15:08:57 -07:00
Fiora
54129a8ca5 PPCAnalyst: refactor, add carry op reordering and non-cmp reordering
Tries as hard as possible to push carry-using operations (like addc and adde)
next to each other. Refactor the instruction reordering to be more flexible
and allow multiple passes.

353 -> 192 x86 instructions on a carry-heavy code block in Pokemon Puzzle.
12% faster overall in Pokemon Puzzle; probably less in typical games (Virtual
Console games seem to be carry-heavy for some reason; maybe a different
compiler?)
2014-09-13 13:48:23 -07:00
Fiora
bea2504a51 JIT64: optimize carry calculations
Omit carry calculations that get overwritten later in the block before they're
used. Very common in the case of srawix and friends.
2014-09-13 13:47:43 -07:00
Ryan Houdek
71cb09f1ca Merge pull request #1027 from rohit-n/change-include
Include CommonTypes.h instead of Common.h.
2014-09-10 00:35:16 -05:00
skidau
d1439bc1db Merge pull request #1041 from RachelBryk/kill-g_CoreStartupParameter
Kill Core::g_CoreStartupParameter.
2014-09-10 11:00:42 +10:00
Rachel Bryk
f93aa7087c Kill Core::g_CoreStartupParameter. 2014-09-09 00:24:49 -04:00
Rohit Nirmal
fbc64984ca Include CommonTypes.h instead of Common.h. 2014-09-08 15:39:58 -04:00
Zhuowei Zhang
e63f7c01a3 Fix twi/tw instructions being switched in Jit64 and JitArm; downgrade the ERROR_LOG printed when tw is ran in the interpreter to DEBUG 2014-09-07 13:35:18 -04:00
Zhuowei Zhang
2ac2cbbcf6 Downgrade the ERROR log printed when twi is executed in interpreter to DEBUG 2014-09-06 22:36:17 -04:00
Scott Mansell
50657548b1 Make Invalid instruction debug assert a non-debug assert.
Users need to be able to see this error message. Otherwise they can't
report bugs.
2014-09-06 19:04:34 +12:00
Fiora
07e0c917c6 Revert "JIT64: optimize CA calculations" 2014-09-05 10:26:30 -07:00
comex
97420c6ec6 Merge pull request #852 from FioraAeterna/optimizeca
JIT64: optimize CA calculations
2014-09-05 11:52:02 -04:00
comex
aa1df21bb6 Merge pull request #947 from FioraAeterna/rsqrte
JIT: implement frsqte
2014-09-05 11:48:00 -04:00
Jasper St. Pierre
a5297f6da8 PixelEngine: Remove unused AllowIdleSkipping and all references to it 2014-09-04 17:25:59 -07:00
Fiora
1b50f9df14 JIT: implement fres
Mostly a straightforward translation of the interpreter code, with a few
tricksy optimizations and fallbacks for rare paths.
2014-09-03 12:15:30 -07:00
Fiora
c72a133206 JIT: implement frsqrte
Mostly a straightforward translation of the interpreter code, with a few
tricksy optimizations and fallbacks for rare paths.
2014-09-03 11:21:04 -07:00
Ryan Houdek
1ad1a9062a Remove PowerPCState::DebugCount.
This value was "helpful" for debugging when the stack got corrupted.
Helpful that if gpr[1](Which is the stack pointer with PPC ABI) is zero then the interpreter would spam huge amounts of annoy text saying that we
managed to get in to a "corrupted" state.
This is incremented every instruction on the interpreter, or every block run on the JIT64....Only if debugging is enabled(JIT64 it is a const
variable)
The message is only outputted when interpreter is used and debugging is enabled.
2014-09-03 00:26:57 -05:00
Pierre Bourdon
ddb2aefedf Merge pull request #904 from FioraAeterna/dcbz
JIT64: try enabling dcbz again
2014-09-02 15:41:40 +02:00
Fiora
3aa40dab00 JIT64: optimize carry calculations
Omit carry calculations that get overwritten later in the block before they're
used. Very common in the case of srawix and friends.
2014-09-01 20:41:48 -07:00
Lioncash
1d706b2311 Get rid of C-style empty function parameter indicators 2014-08-30 15:23:48 -04:00
Fiora
6f617c4175 JIT64: try enabling dcbz again
This time, check the address carefully beforehand, since apparently some games
do horrible things like running it on non-RAM addresses, or at the very least
virtual addresses.
2014-08-29 12:19:58 -07:00
Ryan Houdek
0217fb2008 Merge pull request #843 from FioraAeterna/fprf
JIT: Initial FPRF support
2014-08-28 13:15:50 -05:00
Fiora
7e07acbf3f Fix another absent-minded typo in the fmul interpreter patch 2014-08-26 23:00:11 -07:00
Fiora
1a0a33518b Bugfixes for fmul rounding
Fix the places I forgot to add Force25Bit, and fix an incredibly silly typo bug
2014-08-26 21:37:45 -07:00
Fiora
7dbc623dc0 JIT: Initial FPRF support
Doesn't support all the FPSCR flags, just the FPRF ones.
Add PPCAnalyzer support to remove unnecessary FPRF calculations.

POV-ray benchmark with enableFPRF forced on for an extreme comparison:
Before: 1500s
After, fmul/fmadd only: 728s
After, all float: 753s

In real games that use FPRF, like F-Zero GX, FPRF previously cost a few percent
of total runtime.

Since FPRF is so much faster now, if enableFPRF is set, just do it for every
float instruction, not just fmul/fmadd like before. I don't know if this will
fix any games, but there's little good reason not to.
2014-08-26 10:57:03 -07:00
comex
a7752f49be Merge pull request #861 from comex/warnings
Fix warnings for OS X
2014-08-24 16:15:58 -04:00
comex
cf01f47b52 Fix bloody printf specifiers.
In particular, even in code that only runs on x86-64, you can't use
PRIx64 for size_t because, on OS X, one is unsigned long and the other
is unsigned long long and clang whines about the difference.  I guess
you could make a size_t specifier macro, but those are horribly ugly, so
I just used casting.

Anyone want to make a nice (and slow) template-based printf?

Now without bare 'unsigned'.
2014-08-24 15:56:41 -04:00