harmony-dev mailing list archives

From "xiaoming gu" <xiaoming...@gmail.com>
Subject Re: [jira] Updated: (HARMONY-5965) [drlvm][jit]generate Mnemonic_LEA LIR for Op_Shladd HIR in IA32
Date Wed, 17 Sep 2008 07:35:47 GMT
The 7.9% improvement comes from LEA doing two operations (shift left and
add) in a single instruction, and from its quick execution (1 cycle) with
special hardware optimizations. On IA32, LEA was originally designed for
address computation, but it is not limited to that purpose. So we can
generate LEA LIR from shladd HIR for ordinary arithmetic calculations too.
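
As a rough illustration of why the two forms are interchangeable, here is a
minimal Java sketch (not code from the patch). LeaModel, lea() and shladd()
are hypothetical names made up for this example. It assumes LEA computes
base + index*scale + disp with scale limited to 1, 2, 4 or 8, and that
shladd means (value << shift) + addend as described in the issue.

```java
final class LeaModel {
    // Hypothetical model of the 32-bit result of LEA dst, [base + index*scale + disp].
    // IA32 addressing only allows a scale of 1, 2, 4 or 8.
    static int lea(int base, int index, int scale, int disp) {
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
            throw new IllegalArgumentException("scale must be 1, 2, 4 or 8");
        }
        return base + index * scale + disp; // Java int arithmetic wraps at 32 bits
    }

    // Hypothetical model of shladd as described in the thread: shift left, then add.
    static int shladd(int value, int shift, int addend) {
        return (value << shift) + addend;
    }

    public static void main(String[] args) {
        int x = 12345;
        // Shift amounts 1, 2 and 3 map to LEA scales 2, 4 and 8.
        for (int shift = 1; shift <= 3; shift++) {
            int viaShiftAdd = shladd(x, shift, x);
            int viaLea = lea(x, x, 1 << shift, 0);
            System.out.println("shift=" + shift + " equal=" + (viaShiftAdd == viaLea));
        }
    }
}
```

The scale restriction in this model is why only shift amounts that fit an
IA32 scale factor can be folded into one LEA.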

Also, the existing MUL strength reduction (multiplybyconstant.cpp) already
contains code which implies that shladd HIR should become LEA LIR. But in
the later HIR2LIR pass, shladd HIR is translated to SAL and ADD LIRs, so
MUL strength reduction never gives any improvement.
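
To make the decomposition concrete, here is a small Java sketch of the
x*10 case from the issue, mirroring the three code shapes (IMUL, SAL+ADD,
and two LEAs). It only checks that the results agree and says nothing
about timing. MulBy10Sketch and its method names are hypothetical, and the
comments map each step to the instruction it is assumed to stand for.

```java
final class MulBy10Sketch {
    // CASE 1 shape: a plain multiply.
    static int viaImul(int x) {
        return x * 10;
    }

    // CASE 2 shape: (((x<<2)+x)<<1)+0 done with separate shifts and adds.
    static int viaShiftAdd(int x) {
        int t = x << 2;   // SAL by 2
        t = t + x;        // ADD: t == x*5
        t = t << 1;       // SAL by 1
        return t + 0;     // ADD of the constant 0
    }

    // CASE 3 shape: each shift+add pair folded into one base + index*scale.
    static int viaLea(int x) {
        int t = x + x * 4;    // first LEA: [x + x*4], t == x*5
        int zero = 0;         // XOR reg,reg producing the 0 base
        return zero + t * 2;  // second LEA: [0 + t*2]
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 7, -3, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            int expected = viaImul(x);
            if (viaShiftAdd(x) != expected || viaLea(x) != expected) {
                throw new AssertionError("mismatch for x=" + x);
            }
        }
        System.out.println("all decompositions agree");
    }
}
```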

Thanks. -Xiaoming

On Wed, Sep 17, 2008 at 11:16 AM, Xiao-Feng Li <xiaofeng.li@gmail.com> wrote:

> On Wed, Sep 17, 2008 at 10:29 AM, Xiaoming Gu (JIRA) <jira@apache.org>
> wrote:
> >
> >     [
> https://issues.apache.org/jira/browse/HARMONY-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
> >
> > Xiaoming Gu updated HARMONY-5965:
> > ---------------------------------
> >
> >    Attachment: H5965-V1.patch
> >
> > With this patch, shladd HIR can generate LEA LIR when the data type is
> > I4 and the shift amount is 1, 2, or 3.
> >
> > Note: A new MemOpndKind, "MemOpndKind_LEA", is created because the
> > memory operand in this LEA LIR is used only for ordinary arithmetic,
> > not for real memory address computation. If we keep using
> > MemOpndKind_Heap, some verifications fail in the debug build.
> >
> > Then I turned on MUL strength reduction and got the following
> > improvement with a synthetic example.
> >
> > hotspot of source code:
> >    for(int i=0;i<times;i++) // times=2,000,000,000
> >        result = result*multiplier; // multiplier=10; x*10 is transformed to (((x<<2)+x)<<1)+0
> >
> > Following is the binary code generated for "result = result*multiplier;".
> >
> > CASE 1: No MUL strength reduction - using IMUL
> > I868: MOV s47(EDI):I_32,v426(ESI):I_32 \l\
> > I867: MOV t351(EBP):I_32,t46(10):I_32 \l\
> > I22: (ID:s16(EFLGS):U_32) =IMUL s47(EDI):I_32,t351(EBP):I_32  bcOff: 42 \l\
> > I866: MOV v527[v513(ESP)+t526(-28)]:I_32,s47(EDI):I_32  bcOff: 43 \l\
> > I865: MOV v529[v513(ESP)+t528(-32)]:I_32,t48(1):I_32  bcOff: 45 \l\
> > I25: EmptyPseudoInst  bcOff: 48 \l\
> >
> > CASE 2: MUL strength reduction - using SAL and ADD
> > I884: MOV s47(EBP):I_32,v438(ESI):I_32 \l\
> > I23: (ID:s16(EFLGS):U_32) =SAL s47(EBP):I_32,t46(2):U_8  bcOff: 42 \l\
> > I883: MOV s54(EDI):I_32,v438(ESI):I_32 \l\
> > I24: (ID:s16(EFLGS):U_32) =ADD s54(EDI):I_32,s47(EBP):I_32  bcOff: 42 \l\
> > I116: (AD:s54(EDI):I_32) =CopyPseudoInst/MOV (AU:s54(EDI):I_32) \l\
> > I26: (ID:s16(EFLGS):U_32) =SAL s54(EDI):I_32,t51(1):U_8  bcOff: 42 \l\
> > I117: (AD:s54(EDI):I_32) =CopyPseudoInst/MOV (AU:s54(EDI):I_32) \l\
> > I27: (ID:s16(EFLGS):U_32) =ADD s54(EDI):I_32,t50(0):I_32  bcOff: 42 \l\
> > I882: MOV v539[v525(ESP)+t538(-28)]:I_32,s54(EDI):I_32  bcOff: 43 \l\
> > I881: MOV v541[v525(ESP)+t540(-32)]:I_32,t55(1):I_32  bcOff: 45 \l\
> > I30: EmptyPseudoInst  bcOff: 48 \l\
> >
> > CASE 3: MUL strength reduction - using LEA
> > I22: LEA t48(EBP):I_32,t47[v436(ESI)+v436(ESI)*t46(4)]:I_32  bcOff: 42 \l\
> > I868: (ID:s16(EFLGS):U_32) =XOR t361(EDI):I_32,t361(EDI):I_32 \l\
> > I23: LEA t52(EDI):I_32,t51[t361(EDI)+t48(EBP)*t50(2)]:I_32  bcOff: 42 \l\
> > I867: MOV v537[v523(ESP)+t536(-28)]:I_32,t52(EDI):I_32  bcOff: 43 \l\
> > I866: MOV v539[v523(ESP)+t538(-32)]:I_32,t53(1):I_32  bcOff: 45 \l\
> > I26: EmptyPseudoInst  bcOff: 48 \l\
> >
> >                    CASE 1    CASE 2    CASE 3
> > Time (msec)        6234      7688      5734
>
> Good job!  The improvement looks good. It is about 7.9%. Thanks.
>
> Thanks,
> xiaofeng
>
> > I'm going to spend more time on H5901 to adjust MUL strength reduction.
> >
> >> [drlvm][jit]generate Mnemonic_LEA LIR for Op_Shladd HIR in IA32
> >> ---------------------------------------------------------------
> >>
> >>                 Key: HARMONY-5965
> >>                 URL: https://issues.apache.org/jira/browse/HARMONY-5965
> >>             Project: Harmony
> >>          Issue Type: Improvement
> >>          Components: DRLVM
> >>            Reporter: Xiaoming Gu
> >>         Attachments: H5965-V1.patch
> >>
> >>
> >> In IA32 there is a quick (1 cycle) LEA instruction for loading an
> >> effective address. LEA combines a shift-left and an addition. For
> >> example, LEA dst, src, 2, 4 computes dst = (src<<2) + 4. It is usually
> >> used for, but not limited to, computing array element addresses.
> >> In the current Ia32InstCodeSelector.cpp, the function that translates
> >> Op_Shladd HIR generates shl and add. Since LEA has the same semantics,
> >> we could use it to improve performance.
> >
> > --
> > This message is automatically generated by JIRA.
> > -
> > You can reply to this email to add a comment to the issue online.
> >
> >
>
>
>
> --
> http://xiao-feng.blogspot.com
>
