Thanks for making steady progress, Xiaoming!
Do you have composite scores for Stefan Krause's benchmark set with your
patch included?
On Fri, Sep 19, 2008 at 10:45 AM, xiaoming gu <xiaoming.gu@gmail.com> wrote:
> Hi, all. I did some more work on shladd=>LEA today. With the available MUL
> strength reduction, X*10 is reduced to (((X<<2)+X)<<1)+0, and the 0 is
> generated by a self-XOR instruction (CASE 3).
> This XOR is not actually necessary and can be eliminated in the HIR2LIR pass.
> Below are the better instructions generated with the improved patch.
> Comparing with the previous CASE 3, you can see that the XOR is gone.
>
>
> CASE 4: MUL strength reduction  using LEA and taking care of 0
>
> I22: LEA t48(EDI):I_32,t47[v434(EBP)+v434(EBP)*t46(4)]:I_32 bcOff: 42 \l\
> I23: LEA t52(EDI):I_32,t51[t48(EDI)*t50(2)+t49(0)]:I_32 bcOff: 42 \l\
> I861: MOV v533[v521(ESP)+t532(24)]:I_32,t52(EDI):I_32 bcOff: 43 \l\
> I860: MOV v535[v521(ESP)+t534(28)]:I_32,t53(1):I_32 bcOff: 45 \l\
> I26: EmptyPseudoInst bcOff: 48 \l\
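The two LEA instructions in the CASE 4 listing compute x*10 in two steps. A minimal Java sketch of the same arithmetic follows; the class and method names are made up for illustration and the code is not taken from the patch. Java `int` arithmetic wraps modulo 2^32, which is assumed here to match the I_32 operands in the listing.

```java
// Illustrative only: mirrors the arithmetic of I22 and I23 in CASE 4.
class MulBy10 {
    static int mulBy10(int x) {
        int t48 = x + x * 4;   // like I22: base + index*4, i.e. x * 5
        int t52 = t48 * 2 + 0; // like I23: index*2 + immediate 0, i.e. x * 10
        return t52;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -7, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            // Both sides wrap the same way on overflow, so they must agree.
            if (mulBy10(x) != x * 10) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("all samples match x * 10");
    }
}
```

Because the trailing 0 is an immediate displacement in the second LEA, no register has to be zeroed first, which is why the XOR from CASE 3 is no longer needed.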
>
>              CASE1   CASE2   CASE3   CASE4
> Time (msec)  6234    7688    5734    5704
> Normalized   1       1.233   0.920   0.915
>
>
> I'm going to submit the patch, though it only brings a small performance
> improvement (0.5%). Any comments are welcome. Thanks.
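For reference, the Normalized row is each time divided by the CASE1 time, and the 0.5% figure is the gap between the CASE3 and CASE4 normalized values. A small Java sketch of that arithmetic, using only the times from the table (the class and method names are hypothetical):

```java
// Recomputes the "Normalized" row from the measured times in the table.
class NormalizedTimes {
    // Times in msec for CASE1..CASE4, copied from the message.
    static final int[] TIMES_MSEC = {6234, 7688, 5734, 5704};

    // Time of case i (0-based) relative to CASE1.
    static double normalized(int i) {
        return (double) TIMES_MSEC[i] / TIMES_MSEC[0];
    }

    public static void main(String[] args) {
        for (int i = 0; i < TIMES_MSEC.length; i++) {
            System.out.printf("CASE%d %.3f%n", i + 1, normalized(i));
        }
        // Gap between CASE3 and CASE4, as a fraction of the CASE1 time.
        System.out.printf("CASE3 - CASE4: %.3f%n", normalized(2) - normalized(3));
    }
}
```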
>
Xiaoming
>
>
On Wed, Sep 17, 2008 at 4:13 PM, XiaoFeng Li <xiaofeng.li@gmail.com> wrote:
>
Xiaoming, Thanks for the explanation.
>>
>> Thanks,
>> xiaofeng
>>
On Wed, Sep 17, 2008 at 3:35 PM, xiaoming gu <xiaoming.gu@gmail.com>
wrote:
>> > The 7.9% improvement comes from LEA's complex function (shift left + add)
>> > and its quick execution (1 cycle) with special hardware optimizations.
>> > In IA32, LEA was originally designed for computing addresses but is not
>> > limited to that purpose. So we may use LEA LIR for shladd HIR in common
>> > arithmetic calculations.
>> >
>> > Also, in the available MUL strength reduction (multiplybyconstant.cpp),
>> > some of the code implies using LEA LIR for shladd HIR. But in the later
>> > HIR2LIR pass, shladd HIR is transformed to SAL and ADD LIRs, so MUL
>> > strength reduction never brings an improvement.
>> >
Thanks. Xiaoming
>> >
On Wed, Sep 17, 2008 at 11:16 AM, XiaoFeng Li <xiaofeng.li@gmail.com>
wrote:
>> >
On Wed, Sep 17, 2008 at 10:29 AM, Xiaoming Gu (JIRA) <jira@apache.org>
wrote:
>> >> >
>> >> > [
>> >>
>> https://issues.apache.org/jira/browse/HARMONY5965?page=com.atlassian.jira.plugin.system.issuetabpanels:alltabpanel
>> ]
>> >> >
>> >> > Xiaoming Gu updated HARMONY5965:
>> >> > 
>> >> >
>> >> > Attachment: H5965V1.patch
>> >> >
>> >> > With this patch, shladd HIR can generate LEA LIR when the data is of
>> >> > I4 type and the shift-left amount is 1, 2 or 3.
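The condition above can be sketched as follows. This is an illustrative Java model, not the C++ code in Ia32InstCodeSelector.cpp or the patch itself: the enum, the method names and the `isInt32` flag are all hypothetical, and the SAL+ADD fallback is assumed from the description of the existing HIR2LIR behaviour elsewhere in this thread.

```java
// Hypothetical model of the selection rule; names are not from DRLVM.
class ShladdSelection {
    enum Lowering { LEA, SAL_ADD }

    // LEA is chosen only for 32-bit integer data with a shift of 1, 2 or 3,
    // since those shifts correspond to the LEA scale factors 2, 4 and 8.
    static Lowering choose(boolean isInt32, int shift) {
        boolean scaleFits = shift >= 1 && shift <= 3;
        return (isInt32 && scaleFits) ? Lowering.LEA : Lowering.SAL_ADD;
    }

    // Scale factor used in the LEA operand for a given shift amount.
    static int scaleFor(int shift) {
        return 1 << shift;
    }

    public static void main(String[] args) {
        for (int shift = 0; shift <= 4; shift++) {
            System.out.println("shift " + shift + " -> " + choose(true, shift));
        }
    }
}
```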
>> >> >
>> >> > Note: A new MemOpndKind "MemOpndKind_LEA" is created because we use
>> >> > the memory operand in LEA LIR only for common arithmetic calculation,
>> >> > not for real memory address computation. If we still use
>> >> > MemOpndKind_Heap, some verifications fail in the debug version.
>> >> >
>> >> > Then I turned on MUL strength reduction and got the following
>> >> > improvement with a synthetic example.
>> >> >
>> >> > hotspot of source code:
>> >> > for(int i=0;i<times;i++) // times=2,000,000,000
>> >> >     result = result*multiplier; // multiplier=10; x*10 is transformed to (((x<<2)+x)<<1)+0
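A self-contained version of this hot loop might look like the sketch below. Only the loop body, the multiplier and the iteration count come from the message; the class name, the initial value of `result` and the timing code are assumptions, since the original test program is not shown in the thread.

```java
// Assumed harness around the hot loop; not the original benchmark source.
class MulLoop {
    static final int MULTIPLIER = 10; // constant, so the JIT can strength-reduce

    static int run(int times) {
        int result = 1; // assumed starting value
        for (int i = 0; i < times; i++) {
            result = result * MULTIPLIER;
        }
        return result;
    }

    public static void main(String[] args) {
        // Default iteration count taken from the message; override via args[0].
        int times = args.length > 0 ? Integer.parseInt(args[0]) : 2_000_000_000;
        long start = System.nanoTime();
        int result = run(times);
        long elapsedMsec = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result=" + result + " time(msec)=" + elapsedMsec);
    }
}
```

With a starting value of 1 the product wraps to 0 after 32 iterations, because 10^32 is divisible by 2^32. The loop still performs every multiplication, but a harness that cares about the final value would need a different setup.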
>> >> >
>> >> > Following is the binary code generated for "result =
>> result*multiplier;".
>> >> >
>> >> > CASE 1: No MUL strength reduction  using IMUL
>> >> > I868: MOV s47(EDI):I_32,v426(ESI):I_32 \l\
>> >> > I867: MOV t351(EBP):I_32,t46(10):I_32 \l\
>> >> > I22: (ID:s16(EFLGS):U_32) =IMUL s47(EDI):I_32,t351(EBP):I_32 bcOff: 42 \l\
>> >> > I866: MOV v527[v513(ESP)+t526(28)]:I_32,s47(EDI):I_32 bcOff: 43 \l\
>> >> > I865: MOV v529[v513(ESP)+t528(32)]:I_32,t48(1):I_32 bcOff: 45 \l\
>> >> > I25: EmptyPseudoInst bcOff: 48 \l\
>> >> >
>> >> > CASE 2: MUL strength reduction  using SAL and ADD
>> >> > I884: MOV s47(EBP):I_32,v438(ESI):I_32 \l\
>> >> > I23: (ID:s16(EFLGS):U_32) =SAL s47(EBP):I_32,t46(2):U_8 bcOff: 42 \l\
>> >> > I883: MOV s54(EDI):I_32,v438(ESI):I_32 \l\
>> >> > I24: (ID:s16(EFLGS):U_32) =ADD s54(EDI):I_32,s47(EBP):I_32 bcOff: 42 \l\
>> >> > I116: (AD:s54(EDI):I_32) =CopyPseudoInst/MOV (AU:s54(EDI):I_32) \l\
>> >> > I26: (ID:s16(EFLGS):U_32) =SAL s54(EDI):I_32,t51(1):U_8 bcOff: 42 \l\
>> >> > I117: (AD:s54(EDI):I_32) =CopyPseudoInst/MOV (AU:s54(EDI):I_32) \l\
>> >> > I27: (ID:s16(EFLGS):U_32) =ADD s54(EDI):I_32,t50(0):I_32 bcOff: 42 \l\
>> >> > I882: MOV v539[v525(ESP)+t538(28)]:I_32,s54(EDI):I_32 bcOff: 43 \l\
>> >> > I881: MOV v541[v525(ESP)+t540(32)]:I_32,t55(1):I_32 bcOff: 45 \l\
>> >> > I30: EmptyPseudoInst bcOff: 48 \l\
>> >> >
>> >> > CASE 3: MUL strength reduction  using LEA
>> >> > I22: LEA t48(EBP):I_32,t47[v436(ESI)+v436(ESI)*t46(4)]:I_32 bcOff: 42 \l\
>> >> > I868: (ID:s16(EFLGS):U_32) =XOR t361(EDI):I_32,t361(EDI):I_32 \l\
>> >> > I23: LEA t52(EDI):I_32,t51[t361(EDI)+t48(EBP)*t50(2)]:I_32 bcOff: 42 \l\
>> >> > I867: MOV v537[v523(ESP)+t536(28)]:I_32,t52(EDI):I_32 bcOff: 43 \l\
>> >> > I866: MOV v539[v523(ESP)+t538(32)]:I_32,t53(1):I_32 bcOff: 45 \l\
>> >> > I26: EmptyPseudoInst bcOff: 48 \l\
>> >> >
>> >> >              CASE1   CASE2   CASE3
>> >> > Time (msec)  6234    7688    5734
>> >>
Good job! The improvement looks good. It is about 7.9%. Thanks.

>> >>
>> >> Thanks,
>> >> xiaofeng
>> >>
>> >> > I'm going to spend more time on H5901 to adjust MUL strength
>> >> > reduction.
>> >> >
>> >> >> [drlvm][jit]generate Mnemonic_LEA LIR for Op_Shladd HIR in IA32
>> >> >> 
>> >> >>
>> >> >> Key: HARMONY5965
>> >> >> URL:
>> https://issues.apache.org/jira/browse/HARMONY5965
>> >> >> Project: Harmony
>> >> >> Issue Type: Improvement
>> >> >> Components: DRLVM
>> >> >> Reporter: Xiaoming Gu
>> >> >> Attachments: H5965V1.patch
>> >> >>
>> >> >>
>> >> >> In IA32 there is a quick (1 cycle) LEA instruction for loading an
>> >> >> effective address. The function of LEA is a combination of shift-left
>> >> >> and addition. For example, LEA dst, src, 2, 4 computes
>> >> >> dst=(src<<2)+4. It is usually used for, but not limited to, element
>> >> >> address calculation for arrays.
>> >> >> In the current Ia32InstCodeSelector.cpp, the function for translating
>> >> >> Op_Shladd HIR generates shl and add. Since LEA has the same semantics,
>> >> >> we could use it to improve performance.
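A minimal Java model of what LEA computes, assuming the usual IA-32 addressing form base + index*scale + displacement with scale limited to 1, 2, 4 or 8. The class and method names are made up for illustration and are not part of DRLVM.

```java
// Hypothetical model of LEA's arithmetic; not DRLVM code.
class LeaModel {
    // dst = base + index * scale + disp, with 32-bit wraparound (Java int).
    static int lea(int base, int index, int scale, int disp) {
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
            throw new IllegalArgumentException("scale must be 1, 2, 4 or 8");
        }
        return base + index * scale + disp;
    }

    public static void main(String[] args) {
        int src = 5;
        // The example from the issue text: a shift by 2 is a scale of 4.
        int viaLea = lea(0, src, 4, 4);
        int viaShiftAdd = (src << 2) + 4;
        System.out.println(viaLea == viaShiftAdd);
    }
}
```

Mapping a shladd with shift n onto this form means using scale 2^n, which is why only small shift amounts fit into a single LEA.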
>> >> >
>> >>
>> >>
>> >>
>
