JIT: Initial FPRF support

Doesn't support all the FPSCR flags, just the FPRF ones.
Add PPCAnalyzer support to remove unnecessary FPRF calculations.

POV-Ray benchmark with enableFPRF forced on for an extreme comparison:
Before: 1500s
After, fmul/fmadd only: 728s
After, all float: 753s

In real games that use FPRF, like F-Zero GX, FPRF previously cost a few percent
of total runtime.

Since FPRF is so much cheaper now, compute it for every float instruction when
enableFPRF is set, rather than only for fmul/fmadd as before. I don't know
whether this will fix any games, but there's little reason not to.
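
For context, FPRF is the five-bit result-class field of the FPSCR (bits C, FL, FG, FE, FU), which records the class of the last floating-point result. The sketch below is not the JIT code from this commit; it is a plain C++ illustration of the value being computed, using the result codes from the PowerPC architecture's FPRF table. The names `FPRFClass` and `ClassifyFPRF` are hypothetical, and every NaN is reported as a quiet NaN, which is an assumption of this sketch.

```cpp
#include <cmath>
#include <cstdint>

// FPRF result codes (C, FL, FG, FE, FU) as listed in the PowerPC
// architecture's floating-point result flags table.
// Enumerator names are hypothetical, chosen for this sketch.
enum FPRFClass : std::uint32_t
{
    FPRF_QNAN    = 0x11,
    FPRF_NINF    = 0x09,
    FPRF_NNORM   = 0x08,
    FPRF_NDENORM = 0x18,
    FPRF_NZERO   = 0x12,
    FPRF_PZERO   = 0x02,
    FPRF_PDENORM = 0x14,
    FPRF_PNORM   = 0x04,
    FPRF_PINF    = 0x05,
};

// Hypothetical helper: classify a double into its FPRF code.
// Assumption: any NaN is treated as a quiet NaN.
inline std::uint32_t ClassifyFPRF(double value)
{
    const bool negative = std::signbit(value);
    switch (std::fpclassify(value))
    {
    case FP_NAN:
        return FPRF_QNAN;
    case FP_INFINITE:
        return negative ? FPRF_NINF : FPRF_PINF;
    case FP_ZERO:
        return negative ? FPRF_NZERO : FPRF_PZERO;
    case FP_SUBNORMAL:
        return negative ? FPRF_NDENORM : FPRF_PDENORM;
    default: // FP_NORMAL
        return negative ? FPRF_NNORM : FPRF_PNORM;
    }
}
```

Doing this classification after every float instruction is the work the analyzer pass tries to skip whenever the result would be overwritten before anything reads it.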
Fiora
2014-08-20 02:22:07 -07:00
parent f52888d3ec
commit 7dbc623dc0
13 changed files with 222 additions and 91 deletions

@@ -453,6 +453,10 @@ void PPCAnalyzer::SetInstructionStats(CodeBlock *block, CodeOp *code, GekkoOPInf
     else
         code->outputCR1 = (opinfo->flags & FL_SET_CR1) ? true : false;
+    code->wantsFPRF = (opinfo->flags & FL_READ_FPRF) ? true : false;
+    code->outputFPRF = (opinfo->flags & FL_SET_FPRF) ? true : false;
+    code->canEndBlock = (opinfo->flags & FL_ENDBLOCK) ? true : false;
     int numOut = 0;
     int numIn = 0;
     if (opinfo->flags & FL_OUT_A)
@@ -710,24 +714,25 @@ u32 PPCAnalyzer::Analyze(u32 address, CodeBlock *block, CodeBuffer *buffer, u32
     }
     // Scan for CR0 dependency
-    // assume next block wants CR0 to be safe
+    // assume next block wants flags to be safe
     bool wantsCR0 = true;
     bool wantsCR1 = true;
     bool wantsPS1 = true;
+    bool wantsFPRF = true;
     for (int i = block->m_num_instructions - 1; i >= 0; i--)
     {
-        if (code[i].outputCR0)
-            wantsCR0 = false;
-        if (code[i].outputCR1)
-            wantsCR1 = false;
-        if (code[i].outputPS1)
-            wantsPS1 = false;
-        wantsCR0 |= code[i].wantsCR0;
-        wantsCR1 |= code[i].wantsCR1;
-        wantsPS1 |= code[i].wantsPS1;
+        wantsCR0 |= code[i].wantsCR0 || code[i].canEndBlock;
+        wantsCR1 |= code[i].wantsCR1 || code[i].canEndBlock;
+        wantsPS1 |= code[i].wantsPS1 || code[i].canEndBlock;
+        wantsFPRF |= code[i].wantsFPRF || code[i].canEndBlock;
         code[i].wantsCR0 = wantsCR0;
         code[i].wantsCR1 = wantsCR1;
         code[i].wantsPS1 = wantsPS1;
+        code[i].wantsFPRF = wantsFPRF;
+        wantsCR0 &= !code[i].outputCR0;
+        wantsCR1 &= !code[i].outputCR1;
+        wantsPS1 &= !code[i].outputPS1;
+        wantsFPRF &= !code[i].outputFPRF;
     }
     return address;
 }
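
The backward scan in the hunk above can be tried in isolation. The following is a minimal sketch, assuming a cut-down instruction record; `Op` and `MarkFPRFLiveness` are hypothetical names, and only the FPRF flag is tracked, though the same three steps apply to CR0, CR1 and PS1.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, cut-down stand-in for the analyzer's per-instruction record.
struct Op
{
    bool wantsFPRF = false;   // on input: this instruction reads FPRF
    bool outputFPRF = false;  // this instruction overwrites FPRF
    bool canEndBlock = false; // execution may leave the block here
};

// Walk the block from last instruction to first. After the call,
// ops[i].wantsFPRF is true when the FPRF value at instruction i may still
// be observed (by i itself, a later reader, a block exit, or the unknown
// successor block), so an FPRF-writing instruction with the flag cleared
// can skip the computation.
void MarkFPRFLiveness(std::vector<Op>& ops)
{
    bool wanted = true; // conservatively assume the next block reads FPRF
    for (std::size_t i = ops.size(); i-- > 0;)
    {
        // A reader or a possible block exit makes the flag live here.
        wanted |= ops[i].wantsFPRF || ops[i].canEndBlock;
        ops[i].wantsFPRF = wanted;
        // A writer kills liveness for everything earlier in the block.
        wanted &= !ops[i].outputFPRF;
    }
}
```

With two consecutive FPRF-writing instructions and no exit between them, only the second keeps `wantsFPRF` set. Placing an instruction with `canEndBlock` between the two makes the first one live again, which is why the commit ORs `canEndBlock` into every flag.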