hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events
Date Fri, 19 Oct 2012 05:14:11 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4730:
----------------------------------

    Attachment: MAPREDUCE-4730.patch

New patch that attempts to scale the maximum number of events a reducer will ask for per RPC
based on some rough estimates.  It still clamps maxEventsToFetch to between 100 and 10000 to
avoid extremes.

Since events appear to be just a little under 100 bytes each, the patch currently targets
around 300MB of memory on the AM for RPC response processing.  This can still be exceeded
given enough reducers, but at that point the user should be able to bump up the AM memory size
and support considerably more reducers.
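
To make the arithmetic concrete, here is a rough sketch of how such a cap could be derived
from the figures quoted above.  It is not the code in the attached patch: the class name, the
method name, the division formula, and the worst-case assumption that every reducer has one
full response in flight at the same time are all assumptions made for illustration.

```java
public final class EventFetchSizing {
    // Figures taken from the message; treated as rough constants here.
    static final long TARGET_AM_BYTES = 300L * 1024 * 1024; // "around 300MB"
    static final long BYTES_PER_EVENT = 100L;               // "a little under 100 bytes"
    static final int MIN_EVENTS = 100;                      // lower clamp from the message
    static final int MAX_EVENTS = 10000;                    // upper clamp from the message

    /**
     * Hypothetical sizing rule: split the event budget evenly across reducers,
     * then clamp the result to [MIN_EVENTS, MAX_EVENTS].
     */
    static int maxEventsToFetch(int numReducers) {
        if (numReducers <= 0) {
            throw new IllegalArgumentException("numReducers must be positive");
        }
        long budgetInEvents = TARGET_AM_BYTES / BYTES_PER_EVENT;
        long perReducer = budgetInEvents / numReducers;
        return (int) Math.max(MIN_EVENTS, Math.min(MAX_EVENTS, perReducer));
    }

    public static void main(String[] args) {
        // The job in the issue description has around 3000 reducers.
        System.out.println(maxEventsToFetch(3000));
    }
}
```

Under these assumptions a very large reducer count falls to the lower clamp of 100, which is
where the memory target can still be exceeded.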

This patch also implements the do-not-wait-if-we-got-a-full-response logic to avoid wasting
time while trying to fetch all the completion events.
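
The do-not-wait idea can be sketched as the loop below.  This is an illustration only, not the
patch itself: the EventSource interface, the class and method names, and the use of strings as
events are all hypothetical stand-ins for the real RPC and event types.

```java
import java.util.ArrayList;
import java.util.List;

public final class CompletionEventPoller {
    /** Hypothetical stand-in for the RPC that returns map completion events. */
    interface EventSource {
        List<String> fetch(int fromEventId, int maxEvents);
    }

    /**
     * Fetches events starting at fromEventId, asking again immediately while
     * each response comes back full.  Stops at the first partial (or empty)
     * response, which is the point where the caller would wait before polling
     * again.  Returns the number of RPCs made.
     */
    static int fetchUntilPartial(EventSource source, int fromEventId,
                                 int maxEvents, List<String> out) {
        int rpcs = 0;
        int next = fromEventId;
        List<String> batch;
        do {
            batch = source.fetch(next, maxEvents);
            rpcs++;
            out.addAll(batch);
            next += batch.size();
        } while (batch.size() == maxEvents); // full response: more are likely pending
        return rpcs;
    }

    /** Test helper: an in-memory source holding a fixed number of events. */
    static EventSource fixedSource(int totalEvents) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < totalEvents; i++) {
            all.add("event-" + i);
        }
        return (from, max) -> {
            int start = Math.min(from, all.size());
            int end = Math.min(start + max, all.size());
            return new ArrayList<>(all.subList(start, end));
        };
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        int rpcs = fetchUntilPartial(fixedSource(250), 0, 100, received);
        System.out.println(rpcs + " RPCs, " + received.size() + " events");
    }
}
```

A full response suggests that more events are already available, so sleeping before the next
request would only delay the reducer; a partial response means the reducer has caught up and
can wait for the normal polling interval.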

I still need to do some testing at scale, but quick touch-testing on a single-node cluster
seems to work, so I'm putting it out there for comment and Jenkins.
                
> AM crashes due to OOM while serving up map task completion events
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-4730
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4730
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.3
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: MAPREDUCE-4730.patch, MAPREDUCE-4730.patch
>
>
> We're seeing a repeatable OOM crash in the AM for a task with around 30000 maps and 3000
> reducers.  Details to follow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
