hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3402) AMScalability test of Sleep job with 100K 1-sec maps regressed into running very slowly
Date Thu, 08 Dec 2011 01:16:40 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164900#comment-13164900 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-3402:

[~karams] has been extremely helpful in running various tests to hunt this down, and we finally
got some results after a couple of weeks of hard work.

It turns out that most of the issues stem from our switch from 32-bit JVMs to 64-bit.
Using compressed references (compressed oops) dramatically increased the AM's speed, and the job
finishes in around 30-35 mins. That is still a regression, but at least the job finishes with the
compressed-oops setting and/or after changing the JVM back to 32-bit.
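
To confirm which mode a given AM JVM is actually running in, something like the following could be
run inside it. This is only a sketch, assuming a HotSpot JVM on Java 7 or later: the class name
OopsCheck is made up for illustration, and sun.arch.data.model is a HotSpot-specific property that
other JVMs may not set.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Hadoop: reports the JVM data model and
// whether compressed references (UseCompressedOops) are in effect.
public class OopsCheck {

    /** Returns "true" or "false" for UseCompressedOops, or "unavailable" if the JVM has no such option. */
    static String compressedOops() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption("UseCompressedOops").getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the option does not exist, e.g. on a 32-bit JVM.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        // "32" or "64" on HotSpot; may be null elsewhere.
        System.out.println("sun.arch.data.model = " + System.getProperty("sun.arch.data.model"));
        System.out.println("UseCompressedOops   = " + compressedOops());
    }
}
```

On a 64-bit HotSpot JVM the flag can be turned on explicitly with -XX:+UseCompressedOops.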

Giving the 32-bit JVM more heap, around 3GB, lets the job finish in around 7-8 mins.
But that isn't something we want to do for all jobs. The fact that extra heap alone brings back
the original speed clearly means that the AM is wasting its time in GCs. Some of the observations
Sid made above may hint at the root culprit.
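
One way to put a number on the GC suspicion is to read the JVM's own collector counters. The
snippet below is a rough sketch using the standard java.lang.management API; the class name GcShare
is made up for illustration, and for concurrent collectors the reported time is not purely pause
time, so the percentage is only approximate.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Hadoop: estimates the share of JVM uptime spent in GC.
public class GcShare {

    /** Total milliseconds spent in GC so far, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of uptime spent in GC; both arguments are in milliseconds. */
    static double gcFraction(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.printf("GC: %d ms of %d ms uptime (%.1f%%)%n",
                gc, uptime, 100 * gcFraction(gc, uptime));
    }
}
```

Comparing this figure between the small-heap and 3GB-heap runs would show whether the slowdown
tracks GC time.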

Will file separate tickets to fix the inefficiencies.
> AMScalability test of Sleep job with 100K 1-sec maps regressed into running very slowly
> ---------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-3402
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3402
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
> The world was rosier before October 19-25, [~karams] says.
> The 100K 1 second sleep job used to take around 800 secs or 13-14 mins. It now runs for
> 45 mins and still manages to complete only about 45K tasks.
> One/more of the flurry of commits for 0.23.0 deserve(s) the blame.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

