harmony-dev mailing list archives

From Ian Rogers <rogers.em...@gmail.com>
Subject Re: [jira] Updated: (HARMONY-5965) [drlvm][jit]generate Mnemonic_LEA LIR for Op_Shladd HIR in IA32
Date Fri, 19 Sep 2008 12:24:00 GMT
xiaoming gu wrote:
> Hi, all. I did something more for shladd=>LEA today. With the available MUL
> strength reduction,
> X*10 is reduced to (X<<2+X) <<1+0 and 0 is generated by a self XOR
> instruction (CASE 3).
> Actually this XOR is not necessary and could be eliminated in HIR2LIR pass.
> Following is the
> better instructions generated with the improve patch. Comparing with
> previous CASE 3, you may
> find XOR gone.
>
>
> CASE 4: MUL strength reduction - using LEA and taking care of 0
>
> I22: LEA t48(EDI):I_32,t47[v434(EBP)+v434(EBP)*t46(4)]:I_32 bcOff: 42 \l\
> I23: LEA t52(EDI):I_32,t51[t48(EDI)*t50(2)+t49(0)]:I_32 bcOff: 42 \l\
> I861: MOV v533[v521(ESP)+t532(-24)]:I_32,t52(EDI):I_32 bcOff: 43 \l\
> I860: MOV v535[v521(ESP)+t534(-28)]:I_32,t53(1):I_32 bcOff: 45 \l\
> I26: EmptyPseudoInst bcOff: 48 \l\
>
>                CASE1    CASE2    CASE3    CASE4
> Time (msec)    6234     7688     5734     5704
> Normalized     1        1.233    0.920    0.915
>
>
> I'm going to submit the patch though it only brings small performance
> improvement (0.5%). Any
> comment is welcome. Thanks.
>
> Xiaoming
>   
Hi Xiaoming,

Inspired by all this discussion I rehashed some of this code in Jikes
RVM; the results are here [1]. We have a BURS instruction selector, so we
don't generate ShiftAdd operations. Instead, the tree pattern matcher
combines a shift and an add into an address [2] that we then turn into an
LEA [3]. Anyway, for a multiply by 10 we now produce:

3           int_move                t3si(I) = 0
3           int_shl                 t4si(I) = l0si(I,d), 1
3           int_add                 t5si(I) = t3si(I), t4si(I)
3           int_shl                 t6si(I) = l0si(I,d), 3
3           int_add                 t7si(I) = t5si(I), t6si(I)
3           int_move                t2si(I) = t7si(I)

that becomes:

3           ia32_lea                t6si(I) = <0+[l0si(I,d)*8]>DW
3           ia32_lea                t7si(I) = <[t6si(I)]+[l0si(I,d)*2]>DW
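
As a quick sanity check on the arithmetic, here is a minimal Java sketch
comparing the nested form from the quoted message with the flat form the
two LEAs above compute. The class and method names are hypothetical and
are not taken from Jikes RVM or Harmony; the sketch only shows that both
forms equal x*10 under 32-bit wrap-around arithmetic, and that the two
shifts in the flat form do not depend on each other.

```java
// Illustrative only: hypothetical names, not Jikes RVM or Harmony code.
final class MulByTen {
    // Nested form: ((x << 2) + x) << 1. The final shift depends on the sum.
    static int nested(int x) {
        int t = (x << 2) + x; // x * 5
        return t << 1;        // (x * 5) * 2
    }

    // Flat form: (x << 3) + (x << 1). Neither shift depends on the other.
    static int flat(int x) {
        int a = x << 3;       // x * 8
        int b = x << 1;       // x * 2
        return a + b;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            // Java int arithmetic wraps modulo 2^32, so overflow cases agree too.
            if (nested(x) != x * 10 || flat(x) != x * 10) {
                throw new AssertionError("mismatch for x = " + x);
            }
        }
        System.out.println("both forms match x * 10");
    }
}
```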

Rather than ((x<<2)+x)<<1 we produce (x<<3)+(x<<1), as I believe this
approach gives better opportunity for ILP. A better example is 100, for
which we generate:

3           ia32_lea                EDX(I) = <0+[EAX(I,d)*4]>DW
3           ia32_lea                EDX(I) = <[EDX(I)]+[EDX(I)*8]>DW
3           ia32_shl                EAX(I) AF CF OF PF SF ZF <-- 6
3           ia32_add                EAX(I) AF CF OF PF SF ZF <-- EDX(I)

That is ((x*4)*(1+8))+x*64. If we instead derived the shift and adds for
the *64 from the preceding shift, we would end up losing the add in the
2nd LEA and having to generate an extra add to compensate, i.e.:

3           ia32_lea                t1 = l0*4
3           ia32_lea                t2 = t1*8
3           ia32_lea                t3 = t2*2 + t2
3           ia32_add                t1 = t3 + t1
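
To make the two multiply-by-100 sequences easier to compare, the
following Java sketch mirrors each instruction as one statement. The
class and method names are hypothetical; only the arithmetic is taken
from the listings above. It is a check that both sequences yield x*100,
not a model of how either JIT schedules them.

```java
// Illustrative only: hypothetical names, arithmetic copied from the listings.
final class MulByHundred {
    // Two LEAs plus an independent shift by 6 and an add.
    static int leaPlusShift(int x) {
        int edx = x * 4;         // lea edx = 0 + eax*4
        edx = edx + edx * 8;     // lea edx = edx + edx*8, i.e. x * 36
        int eax = x << 6;        // shl eax, 6, i.e. x * 64; does not use edx
        return eax + edx;        // add eax, edx, i.e. x * 100
    }

    // Alternative where every step feeds the next one.
    static int chained(int x) {
        int t1 = x * 4;          // lea t1 = l0*4
        int t2 = t1 * 8;         // lea t2 = t1*8, i.e. x * 32
        int t3 = t2 * 2 + t2;    // lea t3 = t2*2 + t2, i.e. x * 96
        return t3 + t1;          // add t1 = t3 + t1, i.e. x * 100
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 9, 65535, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (leaPlusShift(x) != x * 100 || chained(x) != x * 100) {
                throw new AssertionError("mismatch for x = " + x);
            }
        }
        System.out.println("both sequences match x * 100");
    }
}
```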

I believe we might as well do the shift by 6 and the add, as at least
that can be done in parallel with the LEAs. I mention this as I think
this may be what your multiply by 100 should look like. It'd be nice if
Jikes RVM were doing simplification of constant division, and of course
for multiplication we're not considering subtractions to create the
desired result (help welcome :-) ).
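
On the point about subtractions, one standard way to use them is to
recode the constant in non-adjacent form (NAF), where each digit is -1,
0 or +1 and no two adjacent digits are non-zero. The Java sketch below
is an assumption about how that could look, not code from Jikes RVM or
Harmony; the names are hypothetical and it only handles positive
constants. For example, x*15 needs four shifted terms with adds only,
but two terms as (x<<4)-x.

```java
// Illustrative only: hypothetical names; NAF recoding is a swapped-in
// textbook technique, not what Jikes RVM or Harmony implement.
final class NafMultiply {
    // Number of shifted terms needed when subtraction is allowed (c > 0).
    static int nafTerms(int c) {
        int terms = 0;
        long n = c;
        while (n != 0) {
            if ((n & 1) != 0) {
                long digit = 2 - (n & 3); // +1 if n mod 4 == 1, -1 if n mod 4 == 3
                n -= digit;
                terms++;
            }
            n >>= 1;
        }
        return terms;
    }

    // Multiply x by a positive constant c using shifts, adds and subtracts.
    static int multiply(int x, int c) {
        int result = 0;
        int shift = 0;
        long n = c;
        while (n != 0) {
            if ((n & 1) != 0) {
                long digit = 2 - (n & 3);
                if (digit == 1) {
                    result += x << shift;
                } else {
                    result -= x << shift;
                }
                n -= digit;
            }
            n >>= 1;
            shift++;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] constants = {7, 10, 15, 100, 255};
        for (int c : constants) {
            if (multiply(12345, c) != 12345 * c) {
                throw new AssertionError("mismatch for c = " + c);
            }
            System.out.println(c + ": add-only terms = " + Integer.bitCount(c)
                    + ", terms with subtraction = " + nafTerms(c));
        }
    }
}
```

Whether fewer terms actually wins depends on how the terms map onto LEA
addressing modes, which this sketch does not model.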

Regards,
Ian

[1] http://jikesrvm.svn.sourceforge.net/viewvc/jikesrvm/rvmroot/trunk/rvm/src/org/jikesrvm/compilers/opt/Simplifier.java?revision=15001&view=markup#l_1379
[2] http://jikesrvm.svn.sourceforge.net/viewvc/jikesrvm/rvmroot/trunk/rvm/src-generated/opt-burs/ia32/IA32.rules?view=markup#l_229
[3] http://jikesrvm.svn.sourceforge.net/viewvc/jikesrvm/rvmroot/trunk/rvm/src-generated/opt-burs/ia32/IA32.rules?view=markup#l_297
