Aleksey, please see the attachments: the source code, the HIR before
HIR2LIR, and the final LIR.
On further thought, I plan to do the optimization in the HIR simplifier
first, because it looks much easier there than in the LIR peephole pass.
When I finish that step, I'll try the LIR peephole.
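For illustration, an HIR simplifier rule for this case could look roughly like the following. It is a Python sketch over a made-up tuple IR, not Jitrino's actual simplifier API. The `umul32` opcode ("32 x 32 -> 64-bit unsigned multiply of the low halves") and the helpers `high_half_is_zero`, `simplify` and `evaluate` are all hypothetical. The rule rewrites `mul x, y` only when both operands provably have zero high 32 bits, which is what the `and ..., #4294967295` instructions in the HIR establish.

```python
# Toy HIR: ("var", name), ("ldc", value), ("and", x, y), ("mul", x, y), ("add", x, y).
# "umul32" is a HYPOTHETICAL opcode: 32 x 32 -> 64-bit unsigned multiply of the
# operands' low halves. It is not an existing Jitrino opcode.
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def high_half_is_zero(node):
    """Conservatively prove that the upper 32 bits of a 64-bit value are 0."""
    op = node[0]
    if op == "ldc":
        return (node[1] & MASK64) >> 32 == 0
    if op == "and":
        # x & y has a 0 bit wherever either operand has a 0 bit.
        return high_half_is_zero(node[1]) or high_half_is_zero(node[2])
    return False


def simplify(node):
    """Bottom-up rewrite: mul(x, y) -> umul32(x, y) when both high halves are 0."""
    op = node[0]
    if op in ("var", "ldc"):
        return node
    args = [simplify(arg) for arg in node[1:]]
    if op == "mul" and all(high_half_is_zero(arg) for arg in args):
        return ("umul32", *args)
    return (op, *args)


def evaluate(node, env):
    """Reference semantics with Java-style wrap-around on 64-bit values."""
    op = node[0]
    if op == "var":
        return env[node[1]] & MASK64
    if op == "ldc":
        return node[1] & MASK64
    x, y = (evaluate(arg, env) for arg in node[1:])
    if op == "and":
        return x & y
    if op == "add":
        return (x + y) & MASK64
    if op == "mul":
        return (x * y) & MASK64
    if op == "umul32":
        return (x & MASK32) * (y & MASK32)  # always < 2**64, no wrap needed
    raise ValueError(f"unknown opcode {op}")


M = ("ldc", 0xFFFFFFFF)
# result = (a & 0xffffffffL) * (b & 0xffffffffL) + (c & 0xffffffffL) + (d & 0xffffffffL)
expr = ("add",
        ("add",
         ("mul", ("and", ("var", "a"), M), ("and", ("var", "b"), M)),
         ("and", ("var", "c"), M)),
        ("and", ("var", "d"), M))
```

`evaluate` is only there to check that the rewrite preserves the 64-bit result. On a 64-bit target, the code generator could lower `umul32` back to an ordinary 64-bit multiply, which would sidestep the overhead concern raised in the quoted message.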
Xiaoming
On Mon, Oct 13, 2008 at 4:02 AM, Aleksey Shipilev <aleksey.shipilev@gmail.com> wrote:
> Hi, Xiaoming!
>
> I see you're deeply interested in Jitrino :) I'm actually surprised
> that the LIR is so clean... Can you show the full LIR dump for
> unsignedMultAddAdd? But for now, let's look at a refined version of your
> LIR dump:
>
> mov [esp100], eax
> mov [esp104], eax
> xor edx, edx
> mov eax, [esp100]
> mul edx, eax, edx
> mov ebx, eax
> xor eax, eax
> mov edi, [esp24]
> mul edx, eax, edi
> add ebx, eax
> mov eax, [esp104]
> mul edx, eax, edi
> mov [esp100], eax
> add edx, ebx
>
> ...after noticing that "xor reg,reg" === "reg = 0" and propagating zero:
>
> mov [esp100], eax
> mov [esp104], eax
> mov eax, [esp100]
> mov edi, [esp24]
> mov eax, [esp104]
> mul edx, eax, edi
> mov [esp100], eax
>
> ...and after some more sweeping:
>
> mov edi, [esp24]
> mul edx, eax, edi
> mov [esp100], eax
>
> Looks pretty good to me.
>
> Thanks,
> Aleksey.
>
> On Sat, Oct 11, 2008 at 8:44 AM, xiaoming gu <xiaoming.gu@gmail.com> wrote:
> > Hi, all. I find that the benefit of this patch comes from changing a
> > 64-bit MUL (three 32-bit MULs and two 32-bit ADDs) into a single 32-bit
> > MUL. At present, the optimization is done as a magic replacement, which
> > is not a general way to generate code.
> > (https://issues.apache.org/jira/browse/HARMONY-5826)
> >
> > Assume A and B are both 64-bit operands and we are computing A*B. On a
> > 32-bit machine, the MUL is usually translated into (High 32 bits of
> > A)*(Low 32 bits of B) + (Low 32 bits of A)*(High 32 bits of B) + (Low 32
> > bits of A)*(Low 32 bits of B), where the two cross products only
> > contribute to the high 32 bits of the result. But when we know that the
> > high 32 bits of A and B are both 0, only (Low 32 bits of A)*(Low 32 bits
> > of B) is needed.
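A quick way to check this decomposition is to model it in Python, treating the operands as 64-bit bit patterns. This is a sketch under that assumption, and the function names are hypothetical: `mul32_wide` stands in for the x86 `MUL` that produces `EDX:EAX`, `mul64_on_32bit` is the three-MUL, two-ADD sequence, and `mul64_high_halves_zero` is the single-MUL special case.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def mul32_wide(x, y):
    """Model of x86 MUL r/m32: EDX:EAX = EAX * src, a full 64-bit product."""
    assert 0 <= x <= MASK32 and 0 <= y <= MASK32
    return x * y


def mul64_on_32bit(a, b):
    """Low 64 bits of a * b (two's-complement bit patterns) from 32-bit pieces."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    low = mul32_wide(a_lo, b_lo)                     # MUL 1: both halves used
    cross1 = mul32_wide(a_hi, b_lo) & MASK32         # MUL 2: only low half used
    cross2 = mul32_wide(a_lo, b_hi) & MASK32         # MUL 3: only low half used
    high = ((low >> 32) + cross1 + cross2) & MASK32  # ADD 1 and ADD 2
    return (high << 32) | (low & MASK32)


def mul64_high_halves_zero(a, b):
    """Same result, valid only when a >> 32 == 0 and b >> 32 == 0."""
    assert a >> 32 == 0 and b >> 32 == 0
    return mul32_wide(a, b)  # a single MUL; cross1 and cross2 would both be 0
```

Because the cross products only affect the high word, both of them vanish when the high halves are 0, which is why one `MUL` is enough.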
> >
> > Following are the HIR and LIR, without this patch, for
> > result = (a & 0xffffffffL) * (b & 0xffffffffL) + (c & 0xffffffffL) + (d & 0xffffffffL).
> > We can do the optimization in the HIR simplifier or in the LIR peephole
> > pass. I'm not sure whether changing an int64 operation into an int32
> > operation will add overhead on a 64-bit machine, so I think the peephole
> > may be a better place. I find there is no peephole optimization for XOR.
> > If the JIT could find out that the result of an XOR is 0, it could
> > propagate the 0 into the MUL and eliminate the related MUL. The
> > remaining problem is whether LIR has sufficient dataflow analysis to do
> > the propagation and elimination.
> > =====HIR=====
> > I42:ldci8 #4294967295 -) t38:int64
> > I43:and t37, t38 -) t39:int64
> > I44:convi8 g23 -) t40:int64
> > I45:and t40, t38 -) t41:int64
> > I46:mul t39, t41 -) t42:int64
> > =====LIR=====
> > 238B02A6 I329: MOV s286[v208(ESP)+t285(100)]:I_32,v19(EAX):I_32
> > 238B02AA I328: MOV t291[v208(ESP)+t290(104)]:I_32,v19(EAX):I_32
> > 238B02AE I327: (ID:s8(EFLGS):U_32) =XOR t206(EDX):I_32,t206(EDX):I_32
> > 238B02B0 I326: MOV s292(EAX):I_32,s286[v208(ESP)+t285(100)]:I_32
> > 238B02B4 I70: (ID:s8(EFLGS):U_32) =MUL s139(EDX):I_32,s292(EAX):I_32,t206(EDX):I_32
> > 238B02B6 I325: MOV s140(EBX):I_32,s292(EAX):I_32
> > 238B02B8 I324: (ID:s8(EFLGS):U_32) =XOR s292(EAX):I_32,s292(EAX):I_32
> > 238B02BA I323: MOV t289(EDI):I_32,v217[v208(ESP)+t216(24)]:I_32
> > 238B02BE I73: (ID:s8(EFLGS):U_32) =MUL s139(EDX):I_32,s292(EAX):I_32,t289(EDI):I_32
> > 238B02C0 I74: (ID:s8(EFLGS):U_32) =ADD s140(EBX):I_32,s292(EAX):I_32
> > 238B02C2 I322: MOV v19(EAX):I_32,t291[v208(ESP)+t290(104)]:I_32
> > DEADBEEF I75: (AD:s293(EAX):I_32) =CopyPseudoInst/MOV (AU:v19(EAX):I_32)
> > 238B02C6 I76: (ID:s8(EFLGS):U_32) =MUL s139(EDX):I_32,s293(EAX):I_32,t289(EDI):I_32
> > 238B02C8 I321: MOV s286[v208(ESP)+t285(100)]:I_32,s293(EAX):I_32
> > 238B02CC I77: (ID:s8(EFLGS):U_32) =ADD s139(EDX):I_32,s140(EBX):I_32
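The XOR-based zero propagation proposed in this message can be sketched on a toy version of this LIR. This is a Python model, not Jitrino code. The tuple encoding, the helper names and the choice of `live_out = {eax, edx}` are assumptions; EFLAGS are treated as dead, and stack slots are never tracked as zero. Folded zeros are rematerialized with `xor r, r` only where a surviving instruction (or the live-out set) still needs them.

```python
# Toy LIR (hypothetical encoding): ("mov", dst, src), ("xor", dst, src),
# ("add", dst, src), and ("mul", hi, lo, src) meaning hi:lo = lo * src (x86 MUL).
# Operands starting with "[" are stack slots; everything else is a register.
# EFLAGS are ignored, i.e. assumed dead.


def is_reg(operand):
    return not operand.startswith("[")


def reads_writes(ins):
    op, *args = ins
    if op == "mov":
        return [args[1]], [args[0]]
    if op in ("add", "xor"):
        return [args[0], args[1]], [args[0]]
    if op == "mul":
        hi, lo, src = args
        return [lo, src], [hi, lo]
    raise ValueError(f"unknown opcode {op}")


def propagate_zero(block, live_out):
    """Fold instructions whose inputs are known to be 0; rematerialize 0 lazily."""
    zero = set()     # registers known to hold 0
    pending = set()  # known-zero registers whose defining instruction was dropped
    out = []

    def set_zero(reg):
        zero.add(reg)
        pending.add(reg)

    for ins in block:
        op, *args = ins
        if op == "xor" and args[0] == args[1] and is_reg(args[0]):
            set_zero(args[0])                  # xor r, r  ==  r = 0
            continue
        if op == "mov" and is_reg(args[0]) and args[1] in zero:
            set_zero(args[0])                  # a copy of 0 is 0
            continue
        if op == "mul" and (args[1] in zero or args[2] in zero):
            set_zero(args[0])                  # x * 0 == 0 in both halves
            set_zero(args[1])
            continue
        if op == "add" and args[1] in zero:
            continue                           # x + 0 == x
        if op == "add" and args[0] in zero:
            ins = ("mov", args[0], args[1])    # 0 + y == y
        reads, writes = reads_writes(ins)
        for reg in reads:
            if reg in pending:                 # value really needed: materialize it
                out.append(("xor", reg, reg))
                pending.discard(reg)
        for loc in writes:
            zero.discard(loc)
            pending.discard(loc)
        out.append(ins)
    for reg in sorted(pending & live_out):
        out.append(("xor", reg, reg))
    return out


LIR = [
    ("mov", "[esp100]", "eax"),
    ("mov", "[esp104]", "eax"),
    ("xor", "edx", "edx"),
    ("mov", "eax", "[esp100]"),
    ("mul", "edx", "eax", "edx"),
    ("mov", "ebx", "eax"),
    ("xor", "eax", "eax"),
    ("mov", "edi", "[esp24]"),
    ("mul", "edx", "eax", "edi"),
    ("add", "ebx", "eax"),
    ("mov", "eax", "[esp104]"),
    ("mul", "edx", "eax", "edi"),
    ("mov", "[esp100]", "eax"),
    ("add", "edx", "ebx"),
]

for ins in propagate_zero(LIR, live_out={"eax", "edx"}):
    print(ins[0], ", ".join(ins[1:]))
```

With `live_out = {eax, edx}` this prints the same seven instructions as Aleksey's second listing. The leftover dead loads and stores would still need a separate dead-code and redundant-load pass, which corresponds to his final "sweep" step.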
> >
> > Any comments? Thanks.
> >
> > Xiaoming
> >
>
