Message-ID: <255079590809182345s6966bd13gc2692a3a0c785c68@mail.gmail.com>
Date: Fri, 19 Sep 2008 14:45:08 +0800
From: "xiaoming gu"
To: dev@harmony.apache.org
Subject: Re: [jira] Updated: (HARMONY-5965) [drlvm][jit]generate Mnemonic_LEA LIR for Op_Shladd HIR in IA32
In-Reply-To: <9623c9a50809170113i225180b1j7d414223c1b6e536@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_40925_17380661.1221806708271"
References: <896435302.1220326724346.JavaMail.jira@brutus>
 <2073219440.1221618584345.JavaMail.jira@brutus>
 <9623c9a50809162016mdae87fas902dd1f308105453@mail.gmail.com>
 <255079590809170035k6e1e98b4g82adb3999a942481@mail.gmail.com>
 <9623c9a50809170113i225180b1j7d414223c1b6e536@mail.gmail.com>

------=_Part_40925_17380661.1221806708271
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Hi, all.

I did some more work on shladd=>LEA today. With the available MUL strength
reduction, X*10 is reduced to (((X<<2)+X)<<1)+0, and the 0 is generated by a
self-XOR instruction (CASE 3). This XOR is not necessary and can be
eliminated in the HIR2LIR pass. Below are the improved instructions generated
with the improved patch. Compared with the previous CASE 3, the XOR is gone.

CASE 4: MUL strength reduction - using LEA and taking care of 0
I22: LEA t48(EDI):I_32,t47[v434(EBP)+v434(EBP)*t46(4)]:I_32 bcOff: 42 \l\
I23: LEA t52(EDI):I_32,t51[t48(EDI)*t50(2)+t49(0)]:I_32 bcOff: 42 \l\
I861: MOV v533[v521(ESP)+t532(-24)]:I_32,t52(EDI):I_32 bcOff: 43 \l\
I860: MOV v535[v521(ESP)+t534(-28)]:I_32,t53(1):I_32 bcOff: 45 \l\
I26: EmptyPseudoInst bcOff: 48 \l\

             CASE1   CASE2   CASE3   CASE4
Time (msec)  6234    7688    5734    5704
Normalized   1       1.233   0.920   0.915

I'm going to submit the patch, though it brings only a small performance
improvement (0.5%). Any comments are welcome.

Thanks.
Xiaoming

On Wed, Sep 17, 2008 at 4:13 PM, Xiao-Feng Li wrote:

> Xiaoming, Thanks for the explanation.
>
> Thanks,
> xiaofeng
>
> On Wed, Sep 17, 2008 at 3:35 PM, xiaoming gu wrote:
>
> > The 7.9% improvement comes from the complex function (shift left+add) and
> > quick execution (1 cycle) of LEA with special hardware optimizations. In
> > IA32, LEA is designed for computing address originally but not limited to
> > that purpose. So we may use LEA LIR for shladd HIR for common arithmetic
> > calculations.
> >
> > And in the available MUL strength reduction (multiplybyconstant.cpp), there
> > is some part of code implying to use LEA LIR for shladd HIR. But in later
> > HIR2LIR pass, shladd HIR is transformed to SAL and ADD LIRs which makes
> > MUL strength reduction always with no improvement.
> >
> > Thanks.
> > -Xiaoming
> >
> > On Wed, Sep 17, 2008 at 11:16 AM, Xiao-Feng Li wrote:
> >
> >> On Wed, Sep 17, 2008 at 10:29 AM, Xiaoming Gu (JIRA) wrote:
> >> >
> >> > [ https://issues.apache.org/jira/browse/HARMONY-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> >> >
> >> > Xiaoming Gu updated HARMONY-5965:
> >> > ---------------------------------
> >> >
> >> >     Attachment: H5965-V1.patch
> >> >
> >> > With this patch, shladd HIR could generate LEA LIR when the data is I4
> >> > type and shift-left bit is 1/2/3.
> >> >
> >> > Note: A new MemOpndKind "MemOpndKind_LEA" is created because we just use
> >> > the memory operand in LEA LIR to do common arithmetic calculation not for
> >> > real memory address computation. If we still use MemOpndKind_Heap, there
> >> > are some failed verifications in debug version.
> >> >
> >> > Then I turned on MUL strength reduction and get the following improvement
> >> > with a synthetic example.
> >> >
> >> > hotspot of source code:
> >> > for(int i=0;i
> >> > result = result*multiplier; //multiplier=10, which is transformed from x*10 to (((x<<2)+x)<<1)+0
> >> >
> >> > Following is the binary code generated for "result = result*multiplier;".
> >> >
> >> > CASE 1: No MUL strength reduction - using IMUL
> >> > I868: MOV s47(EDI):I_32,v426(ESI):I_32 \l\
> >> > I867: MOV t351(EBP):I_32,t46(10):I_32 \l\
> >> > I22: (ID:s16(EFLGS):U_32) =IMUL s47(EDI):I_32,t351(EBP):I_32 bcOff: 42 \l\
> >> > I866: MOV v527[v513(ESP)+t526(-28)]:I_32,s47(EDI):I_32 bcOff: 43 \l\
> >> > I865: MOV v529[v513(ESP)+t528(-32)]:I_32,t48(1):I_32 bcOff: 45 \l\
> >> > I25: EmptyPseudoInst bcOff: 48 \l\
> >> >
> >> > CASE 2: MUL strength reduction - using SAL and ADD
> >> > I884: MOV s47(EBP):I_32,v438(ESI):I_32 \l\
> >> > I23: (ID:s16(EFLGS):U_32) =SAL s47(EBP):I_32,t46(2):U_8 bcOff: 42 \l\
> >> > I883: MOV s54(EDI):I_32,v438(ESI):I_32 \l\
> >> > I24: (ID:s16(EFLGS):U_32) =ADD s54(EDI):I_32,s47(EBP):I_32 bcOff: 42 \l\
> >> > I116: (AD:s54(EDI):I_32) =CopyPseudoInst/MOV (AU:s54(EDI):I_32) \l\
> >> > I26: (ID:s16(EFLGS):U_32) =SAL s54(EDI):I_32,t51(1):U_8 bcOff: 42 \l\
> >> > I117: (AD:s54(EDI):I_32) =CopyPseudoInst/MOV (AU:s54(EDI):I_32) \l\
> >> > I27: (ID:s16(EFLGS):U_32) =ADD s54(EDI):I_32,t50(0):I_32 bcOff: 42 \l\
> >> > I882: MOV v539[v525(ESP)+t538(-28)]:I_32,s54(EDI):I_32 bcOff: 43 \l\
> >> > I881: MOV v541[v525(ESP)+t540(-32)]:I_32,t55(1):I_32 bcOff: 45 \l\
> >> > I30: EmptyPseudoInst bcOff: 48 \l\
> >> >
> >> > CASE 3: MUL strength reduction - using LEA
> >> > I22: LEA t48(EBP):I_32,t47[v436(ESI)+v436(ESI)*t46(4)]:I_32 bcOff: 42 \l\
> >> > I868: (ID:s16(EFLGS):U_32) =XOR t361(EDI):I_32,t361(EDI):I_32 \l\
> >> > I23: LEA t52(EDI):I_32,t51[t361(EDI)+t48(EBP)*t50(2)]:I_32 bcOff: 42 \l\
> >> > I867: MOV v537[v523(ESP)+t536(-28)]:I_32,t52(EDI):I_32 bcOff: 43 \l\
> >> > I866: MOV v539[v523(ESP)+t538(-32)]:I_32,t53(1):I_32 bcOff: 45 \l\
> >> > I26: EmptyPseudoInst bcOff: 48 \l\
> >> >
> >> >              CASE1   CASE2   CASE3
> >> > Time (msec)  6234    7688    5734
> >>
> >> Good job! The improvement looks good. It is about 7.9%. Thanks.
> >>
> >> Thanks,
> >> xiaofeng
> >>
> >> > I'm going to spend more time for H5901 to adjust MUL strength reduction.
> >> >
> >> >> [drlvm][jit]generate Mnemonic_LEA LIR for Op_Shladd HIR in IA32
> >> >> ---------------------------------------------------------------
> >> >>
> >> >>                 Key: HARMONY-5965
> >> >>                 URL: https://issues.apache.org/jira/browse/HARMONY-5965
> >> >>             Project: Harmony
> >> >>          Issue Type: Improvement
> >> >>          Components: DRLVM
> >> >>            Reporter: Xiaoming Gu
> >> >>         Attachments: H5965-V1.patch
> >> >>
> >> >>
> >> >> In IA32 there is a quick (1 cycle) LEA instruction for loading effective
> >> >> address. The function of LEA is a combination of shift-left and addition.
> >> >> For example LEA dst, src, 2, 4 does dst=src<<2+4. It's usually used but not
> >> >> limited in element address calculation for array.
> >> >> In current Ia32InstCodeSelector.cpp, the function for translating
> >> >> Op_Shladd HIR generates shl and add. Since LEA has the same semantic, we
> >> >> could deploy it to improve performance.
> >> >
> >> > --
> >> > This message is automatically generated by JIRA.
> >> > -
> >> > You can reply to this email to add a comment to the issue online.
> >> >
> >>
> >> --
> >> http://xiao-feng.blogspot.com
> >>
> >
>
> --
> http://xiao-feng.blogspot.com
>
------=_Part_40925_17380661.1221806708271--
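
The decomposition discussed in this thread, x*10 = (((x<<2)+x)<<1)+0, can be
checked with a small Java sketch. The class and method names (ShladdDemo,
shladd, mulBy10) are hypothetical and only model the arithmetic of the shladd
operation; they are not taken from Harmony's JIT sources.

```java
public class ShladdDemo {
    // Assumed model of the shladd operation: (value << shift) + addend.
    // The thread restricts the LEA mapping to shift amounts 1/2/3,
    // which correspond to the x86 address scale factors 2/4/8.
    public static int shladd(int value, int shift, int addend) {
        return (value << shift) + addend;
    }

    // x*10 written as two shladd steps: (((x<<2)+x)<<1)+0.
    public static int mulBy10(int x) {
        int times5 = shladd(x, 2, x);  // x*4 + x
        return shladd(times5, 1, 0);   // (x*5)*2 + 0
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -7, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            // Java int arithmetic wraps modulo 2^32, so the shift/add form
            // agrees with x*10 even when the product overflows.
            if (mulBy10(x) != x * 10) {
                throw new AssertionError("mismatch for x=" + x);
            }
        }
        System.out.println("shift/add form matches x*10 for all sampled inputs");
    }
}
```

The second step adds a constant 0, which is why the XOR that materializes
that zero in CASE 3 contributes nothing to the result.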