harmony-dev mailing list archives

From "Aleksey Shipilev" <aleksey.shipi...@gmail.com>
Subject Re: discussion for H5826
Date Sun, 12 Oct 2008 20:02:49 GMT
Hi, Xiaoming!

I see you're deeply interested in Jitrino :) I'm actually surprised
that the LIR is so clean... Can you show the full LIR dump for
unsignedMultAddAdd? For now, let's look at a cleaned-up version of
your LIR dump:

mov [esp-100], eax
mov [esp-104], eax
xor edx, edx
mov eax, [esp-100]
mul edx, eax, edx
mov ebx, eax
xor eax, eax
mov edi, [esp-24]
mul edx, eax, edi
add ebx, eax
mov eax, [esp-104]
mul edx, eax, edi
mov [esp-100], eax
add edx, ebx

...after noticing that "xor reg,reg" === "reg = 0" and propagating zero:

mov [esp-100], eax
mov [esp-104], eax
mov eax, [esp-100]
mov edi, [esp-24]
mov eax, [esp-104]
mul edx, eax, edi
mov [esp-100], eax

...and after a bit more cleanup:

mov edi, [esp-24]
mul edx, eax, edi
mov [esp-100], eax

Looks pretty good to me.
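
To make the zero-propagation step above concrete, here is a rough Python sketch of the analysis. It is only an illustration: the tuple encoding of instructions and the name find_zero_foldable are made up for this mail and are not Jitrino code. It ignores EFLAGS and memory aliasing, and it only reports candidates instead of rewriting anything.

```python
# Hypothetical encoding of the cleaned-up listing: (opcode, operands...).
# "mul hi, lo, src" stands for the x86 one-operand MUL: hi:lo = lo * src.
LISTING = [
    ("mov", "[esp-100]", "eax"),
    ("mov", "[esp-104]", "eax"),
    ("xor", "edx", "edx"),
    ("mov", "eax", "[esp-100]"),
    ("mul", "edx", "eax", "edx"),
    ("mov", "ebx", "eax"),
    ("xor", "eax", "eax"),
    ("mov", "edi", "[esp-24]"),
    ("mul", "edx", "eax", "edi"),
    ("add", "ebx", "eax"),
    ("mov", "eax", "[esp-104]"),
    ("mul", "edx", "eax", "edi"),
    ("mov", "[esp-100]", "eax"),
    ("add", "edx", "ebx"),
]


def is_mem(operand):
    """Memory operands are written in brackets in this toy encoding."""
    return operand.startswith("[")


def find_zero_foldable(instrs):
    """Return indices of MULs that must yield 0 and ADDs that add a known 0."""
    zero = set()      # registers currently known to hold 0
    foldable = []
    for i, (op, *args) in enumerate(instrs):
        if op == "xor":
            dst, src = args
            if dst == src:
                zero.add(dst)           # xor reg, reg  ==>  reg = 0
            else:
                zero.discard(dst)       # be conservative otherwise
        elif op == "mov":
            dst, src = args
            if is_mem(dst):
                continue                # a store changes no register
            if src in zero:
                zero.add(dst)           # copy of a known zero
            else:
                zero.discard(dst)       # loads and other copies are unknown
        elif op == "mul":
            hi, lo, src = args
            if lo in zero or src in zero:
                foldable.append(i)      # 0 * x == 0, so hi:lo becomes 0
                zero.update((hi, lo))
            else:
                zero.difference_update((hi, lo))
        elif op == "add":
            dst, src = args
            if src in zero:
                foldable.append(i)      # adding 0 leaves dst unchanged
            else:
                zero.discard(dst)
    return foldable


print(find_zero_foldable(LISTING))
```

On the listing above this flags the first two MULs and both ADDs, which leaves the third MUL as the only real arithmetic, the same conclusion as the hand-simplified version.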

Thanks,
Aleksey.

On Sat, Oct 11, 2008 at 8:44 AM, xiaoming gu <xiaoming.gu@gmail.com> wrote:
> Hi, all. I find the benefits of this patch come from changing a 64-bit MUL
> (three 32-bit MULs and two 32-bit ADDs) into a single 32-bit MUL. At
> present, the optimization is done as a magic replacement, which is not a
> common way to generate code.
> (https://issues.apache.org/jira/browse/HARMONY-5826)
>
> Assume A and B are both 64-bit operands and we are doing A*B. On a 32-bit
> machine, the MUL operation is usually translated to (high 32 bits of
> A)*(low 32 bits of B) + (low 32 bits of A)*(high 32 bits of B) + (low 32
> bits of A)*(low 32 bits of B). But when we know the high 32 bits of A and B
> are both 0, only (low 32 bits of A)*(low 32 bits of B) is needed.
>
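
The decomposition described just above can be checked with a short Python sketch. The function names and masks are illustrative assumptions, not Harmony code. Python integers are unbounded, so the masks stand in for 32-bit and 64-bit registers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mul64_via_halves(a, b):
    """Low 64 bits of a*b built from three 32-bit multiplies and two adds."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    low = a_lo * b_lo                      # full 32x32 -> 64 product
    # The cross terms land in the upper half, so only their low 32 bits count.
    cross = (a_hi * b_lo + a_lo * b_hi) & MASK32
    return (low + (cross << 32)) & MASK64


def mul64_zero_high(a, b):
    """Shortcut when both high halves are known to be 0: one multiply."""
    return (a & MASK32) * (b & MASK32)


a, b = 0x123456789ABCDEF0, 0x0FEDCBA987654321
print(mul64_via_halves(a, b) == (a * b) & MASK64)
print(mul64_via_halves(MASK32, MASK32) == mul64_zero_high(MASK32, MASK32))
```

The a_hi*b_hi term never appears because it only contributes to bits 64 and above, which a 64-bit result discards.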
> Following are the HIR and LIR for result = (a & 0xffffffffL) * (b &
> 0xffffffffL) + (c & 0xffffffffL) + (d & 0xffffffffL) without this patch. We
> could do the optimization in the HIR simplifier or in the LIR peephole. I'm
> not sure whether changing an int64 operation to an int32 operation would
> bring overhead on a 64-bit machine, so I think the peephole may be the
> better place. I find there is no peephole optimization for XOR. If the JIT
> could find out that the result of an XOR is 0, it could propagate the 0 to
> the MUL, and the related MUL could be eliminated. The remaining problem is
> whether there is sufficient data-flow analysis in LIR to do the propagation
> and elimination.
> =====HIR=====
>  I42:ldci8     #4294967295 -) t38:int64
>  I43:and       t37, t38 -) t39:int64
>  I44:convi8  g23 -) t40:int64
>  I45:and       t40, t38 -) t41:int64
>  I46:mul   t39, t41 -) t42:int64
> =====LIR=====
>    238B02A6 I329: MOV s286[v208(ESP)+t285(-100)]:I_32,v19(EAX):I_32
>    238B02AA I328: MOV t291[v208(ESP)+t290(-104)]:I_32,v19(EAX):I_32
>    238B02AE I327: (ID:s8(EFLGS):U_32) =XOR t206(EDX):I_32,t206(EDX):I_32
>    238B02B0 I326: MOV s292(EAX):I_32,s286[v208(ESP)+t285(-100)]:I_32
>    238B02B4 I70: (ID:s8(EFLGS):U_32) =MUL s139(EDX):I_32,s292(EAX):I_32,t206(EDX):I_32
>    238B02B6 I325: MOV s140(EBX):I_32,s292(EAX):I_32
>    238B02B8 I324: (ID:s8(EFLGS):U_32) =XOR s292(EAX):I_32,s292(EAX):I_32
>    238B02BA I323: MOV t289(EDI):I_32,v217[v208(ESP)+t216(-24)]:I_32
>    238B02BE I73: (ID:s8(EFLGS):U_32) =MUL s139(EDX):I_32,s292(EAX):I_32,t289(EDI):I_32
>    238B02C0 I74: (ID:s8(EFLGS):U_32) =ADD s140(EBX):I_32,s292(EAX):I_32
>    238B02C2 I322: MOV v19(EAX):I_32,t291[v208(ESP)+t290(-104)]:I_32
>    DEADBEEF I75: (AD:s293(EAX):I_32) =CopyPseudoInst/MOV (AU:v19(EAX):I_32)
>
>    238B02C6 I76: (ID:s8(EFLGS):U_32) =MUL s139(EDX):I_32,s293(EAX):I_32,t289(EDI):I_32
>    238B02C8 I321: MOV s286[v208(ESP)+t285(-100)]:I_32,s293(EAX):I_32
>    238B02CC I77: (ID:s8(EFLGS):U_32) =ADD s139(EDX):I_32,s140(EBX):I_32
>
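
For the HIR-simplifier option, the check on the HIR fragment quoted above could look roughly like this in Python. The dictionary encoding and both function names are assumptions made up for illustration. They are not the Jitrino simplifier API, and a real pass would also need to handle other opcodes such as zero-extending conversions.

```python
# Hypothetical encoding of the HIR above: temp -> (opcode, operands).
# t37 and g23 have no entry because their definitions are not in the dump.
HIR = {
    "t38": ("ldci8", (4294967295,)),
    "t39": ("and", ("t37", "t38")),
    "t40": ("convi8", ("g23",)),
    "t41": ("and", ("t40", "t38")),
    "t42": ("mul", ("t39", "t41")),
}


def high_half_is_zero(defs, temp):
    """True if the upper 32 bits of this int64 temp are provably 0."""
    entry = defs.get(temp)
    if entry is None:
        return False                      # unknown definition: assume nothing
    op, operands = entry
    if op == "ldci8":
        return 0 <= operands[0] <= 0xFFFFFFFF
    if op == "and":
        # AND can only clear bits, so one suitable operand is enough.
        return any(high_half_is_zero(defs, o) for o in operands)
    return False


def mul_needs_one_32bit_mul(defs, temp):
    """True if temp is a mul whose operands both have a zero high half."""
    op, operands = defs[temp]
    return op == "mul" and all(high_half_is_zero(defs, o) for o in operands)


print(mul_needs_one_32bit_mul(HIR, "t42"))
```

The sketch only answers whether narrowing the multiply is legal; whether it pays off on a 64-bit target is the open question raised in the paragraph above.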
> Any comments? Thanks.
>
> Xiaoming
>
