hadoop-mapreduce-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4946) Type conversion of map completion events leads to performance problems with large jobs
Date Tue, 22 Jan 2013 19:10:13 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated MAPREDUCE-4946:
--------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.23.7
                   2.0.3-alpha
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2 and branch-0.23. Thanks, Jason.
                
> Type conversion of map completion events leads to performance problems with large jobs
> --------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4946
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4946
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>             Fix For: 2.0.3-alpha, 0.23.7
>
>         Attachments: MAPREDUCE-4946-branch-0.23.patch, MAPREDUCE-4946.patch
>
>
> We've seen issues with large jobs (e.g., 13,000 maps and 3,500 reduces) where reducers
> fail to connect back to the AM after being launched due to connection timeouts.  Looking at
> stack traces of the AM during this time, we see a lot of IPC servers stuck waiting for a lock
> to get the application ID while type-converting the map completion events.  What's odd is
> that getting the application ID should normally be very cheap, but in this case we're
> type-converting thousands of map completion events for *each* reducer connecting.  That means
> we end up type-converting the map completion events over 45 million times during the lifetime
> of the example job (13,000 * 3,500).
> We either need to make the type conversion much cheaper (e.g., lockless or at least read-write
> locked) or, even better, store the completion events in a form that does not require type
> conversion when serving them up to reducers.
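
The second remedy in the description above (storing the events in a form that needs no conversion on read) can be illustrated with a minimal Java sketch. This is not the committed patch: InternalEvent, WireEvent, EventStore, ConvertOnRead, ConvertOnWrite and the conversion counter are hypothetical stand-ins, not Hadoop classes. The sketch also assumes each reducer fetches every event in a single call, which only serves to reproduce the 13,000 * 3,500 arithmetic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class CompletionEventSketch {

    /** Hypothetical internal record kept by the AM (not a Hadoop class). */
    static final class InternalEvent {
        final int mapId;
        InternalEvent(int mapId) { this.mapId = mapId; }
    }

    /** Hypothetical record in the form served to reducers (not a Hadoop class). */
    static final class WireEvent {
        final int mapId;
        WireEvent(int mapId) { this.mapId = mapId; }
    }

    /** Counts conversions so the two strategies can be compared. */
    static final AtomicLong conversions = new AtomicLong();

    /** Stand-in for the type conversion the report describes as costly. */
    static WireEvent convert(InternalEvent e) {
        conversions.incrementAndGet();
        return new WireEvent(e.mapId);
    }

    interface EventStore {
        void add(InternalEvent e);
        List<WireEvent> get(int from, int max);
    }

    /** Converts on every request: cost grows with maps * reducers. */
    static final class ConvertOnRead implements EventStore {
        private final List<InternalEvent> events = new ArrayList<>();
        public synchronized void add(InternalEvent e) { events.add(e); }
        public synchronized List<WireEvent> get(int from, int max) {
            int end = Math.min(events.size(), from + max);
            List<WireEvent> out = new ArrayList<>();
            for (int i = from; i < end; i++) {
                out.add(convert(events.get(i)));
            }
            return out;
        }
    }

    /** Converts once when the event is recorded: cost grows with maps only. */
    static final class ConvertOnWrite implements EventStore {
        private final List<WireEvent> events = new ArrayList<>();
        public synchronized void add(InternalEvent e) { events.add(convert(e)); }
        public synchronized List<WireEvent> get(int from, int max) {
            int end = Math.min(events.size(), from + max);
            List<WireEvent> out = new ArrayList<>();
            if (from < end) {
                out.addAll(events.subList(from, end)); // copy only, no conversion
            }
            return out;
        }
    }

    /** Records `maps` events, then lets `reducers` clients fetch all of them. */
    static long countConversions(EventStore store, int maps, int reducers) {
        conversions.set(0);
        for (int m = 0; m < maps; m++) {
            store.add(new InternalEvent(m));
        }
        for (int r = 0; r < reducers; r++) {
            store.get(0, maps);
        }
        return conversions.get();
    }

    public static void main(String[] args) {
        int maps = 13_000, reducers = 3_500; // job size from the report
        System.out.println("convert on read:  "
            + countConversions(new ConvertOnRead(), maps, reducers));  // 45500000
        System.out.println("convert on write: "
            + countConversions(new ConvertOnWrite(), maps, reducers)); // 13000
    }
}
```

In this sketch the convert-on-write store performs one conversion per map, no matter how many reducers connect, so the lock held during conversion is no longer taken on the read path.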

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
