harmony-commits mailing list archives

From "Mikhail Fursov (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HARMONY-4620) [drlvm][jit] Long return path for floating point values in calling convention
Date Wed, 12 Mar 2008 12:00:47 GMT

     [ https://issues.apache.org/jira/browse/HARMONY-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Fursov updated HARMONY-4620:
------------------------------------

    Attachment: return_xmm_4.patch

return_xmm_4.patch: catching latest changes in file structure.

I am starting to test this patch and will commit it in a day if no problems are found.

> [drlvm][jit] Long return path for floating point values in calling convention
> -----------------------------------------------------------------------------
>
>                 Key: HARMONY-4620
>                 URL: https://issues.apache.org/jira/browse/HARMONY-4620
>             Project: Harmony
>          Issue Type: Improvement
>          Components: DRLVM
>         Environment: appropriate for Intel architecture
>            Reporter: Naumova Natalya 
>            Assignee: Mikhail Fursov
>         Attachments: return_xmm.patch, return_xmm_2.patch, return_xmm_3.patch, return_xmm_4.patch
>
>
> DRLVM has an overly long return path when the return value is floating point. The reason is
> that the calling convention mixes FPU usage with SSE instructions, giving "SSE -> mem ->
> FPU -> (return) mem -> SSE": the returned (double) value is first calculated in xmm* registers,
> then copied to memory, then pushed onto the FPU stack, then popped from that stack (in the
> calling procedure) to memory again, and then the calculation continues in xmm* registers
> (SSE instructions). This issue cancels out the improvement from loop unrolling: the overhead
> of parameter passing with this calling convention outweighs the speed-up from doubling the
> loop body. When you increase the "arg.optimizer.unroll.medium_loop_unroll_count" option for
> code in which a method returning a double sits inside a loop, you get a degradation
> (example: the MonteCarlo benchmark in SciMark).
> Can we avoid using FPU with SSE in this case?
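
For illustration only, here is a minimal sketch of the code shape the description refers to: a method returning a double that is called inside a hot loop. It is not taken from SciMark or from the patch. The class name, method names, and the simple generator are hypothetical stand-ins, and whether a given JIT inlines the call (and so avoids the return path entirely) is not something this sketch demonstrates.

```java
public class ReturnPathSketch {
    // Hypothetical generator state; a plain 48-bit linear congruential generator.
    private static long seed = 113L;

    // Returns a double in [0, 1). Under the calling convention described in the
    // issue, each non-inlined return of this kind would travel
    // SSE -> mem -> FPU -> (return) mem -> SSE.
    static double nextDouble() {
        seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
        // Keep the top 26 bits of the state and scale them into [0, 1).
        return (double) (seed >>> 22) / (double) (1L << 26);
    }

    // Monte Carlo style loop: two double-returning calls per iteration,
    // so the cost of the return path is paid on every pass.
    static double estimatePi(int samples) {
        int insideCircle = 0;
        for (int i = 0; i < samples; i++) {
            double x = nextDouble();
            double y = nextDouble();
            if (x * x + y * y <= 1.0) {
                insideCircle++;
            }
        }
        return 4.0 * insideCircle / samples;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(1_000_000));
    }
}
```

In a loop like this, unrolling the body does not remove the per-call return overhead, which is consistent with the degradation reported above.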

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

