hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4730) AM crashes due to OOM while serving up map task completion events
Date Thu, 18 Oct 2012 19:08:05 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479257#comment-13479257 ]

Jason Lowe commented on MAPREDUCE-4730:
---------------------------------------

After a little more digging, I'm more confident that this is a flow-control problem in the IPC layer.
I think the scenario goes like this:

# Thousands of reducers start asking for map completion events at about the same time.
# An IPC Server.Handler thread takes a call off the call queue, executes it, and gets back 900K of response data.
# The Handler thread queues the response data on the connection, likely sees that it's the only thing in that queue, and tries to write the data out immediately.
# The response is too big to send in full without blocking, so the handler pushes the remainder back onto the response queue for the Responder thread to deal with and moves on to another call from the call queue.
# Many reducers are waiting in the call queue for their 900K of data, and the handler threads process those calls and push the data onto the response queues as fast as they can.
# The Responder thread and/or socket I/O can't keep pace with the rate at which the handlers generate 900K responses, so queued response data keeps growing until we eventually exhaust memory.
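The imbalance described above can be illustrated with a small tick-based model. This is a hypothetical sketch, not Hadoop's IPC code: `simulate`, the handler count, and the per-tick network budget are all assumptions, and the model lumps the handler's first write attempt and the Responder's later writes into one drain budget. It compares an unbounded response queue with a variant that caps queued bytes (backpressure), which is one generic remedy for a producer/consumer imbalance and not necessarily the change made for MAPREDUCE-4730.

```python
from collections import deque

# Size of one map-completion-events response, taken from the comment ("900K");
# reading it as 900 KiB is an assumption.
RESPONSE_BYTES = 900 * 1024


def simulate(num_calls, handlers, drain_bytes_per_tick, max_pending_bytes=None):
    """Return the peak number of response bytes waiting to be written.

    Hypothetical tick-based model, not Hadoop's IPC code:
    - Handler phase: each of `handlers` threads takes one call and queues a
      RESPONSE_BYTES response. If `max_pending_bytes` is set and queueing would
      exceed it, the handlers wait for this tick (backpressure).
    - Responder phase: up to `drain_bytes_per_tick` bytes are written out.
    """
    if drain_bytes_per_tick <= 0:
        raise ValueError("drain_bytes_per_tick must be positive")
    if max_pending_bytes is not None and max_pending_bytes < RESPONSE_BYTES:
        raise ValueError("max_pending_bytes must fit at least one response")

    calls_left = num_calls
    pending = deque()  # unsent bytes of each queued response, oldest first
    pending_bytes = 0
    peak = 0
    while calls_left > 0 or pending:
        # Handler phase: turn calls from the call queue into queued responses.
        for _ in range(handlers):
            if calls_left == 0:
                break
            if (max_pending_bytes is not None
                    and pending_bytes + RESPONSE_BYTES > max_pending_bytes):
                break  # backpressure: stop taking calls until data drains
            pending.append(RESPONSE_BYTES)
            pending_bytes += RESPONSE_BYTES
            calls_left -= 1
        peak = max(peak, pending_bytes)

        # Responder phase: write queued data to the (simulated) sockets.
        budget = drain_bytes_per_tick
        while pending and budget > 0:
            sent = min(pending[0], budget)
            pending[0] -= sent
            pending_bytes -= sent
            budget -= sent
            if pending[0] == 0:
                pending.popleft()
    return peak


# 3000 calls (one per reducer), 10 handler threads, 1 MiB/tick of socket
# throughput: hypothetical numbers chosen so handlers outpace the I/O.
unbounded = simulate(3000, 10, 1024 * 1024)
bounded = simulate(3000, 10, 1024 * 1024, max_pending_bytes=8 * RESPONSE_BYTES)
print(f"unbounded peak: {unbounded / 2**30:.2f} GiB")  # 2.28 GiB
print(f"bounded peak:   {bounded / 2**20:.1f} MiB")    # 7.0 MiB
```

In the unbounded run, queued data grows by roughly (10 × 900 KiB − 1 MiB) per tick until every call has been handled; with the cap, handlers wait instead, trading response latency for bounded memory.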


                
> AM crashes due to OOM while serving up map task completion events
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-4730
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4730
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.3
>            Reporter: Jason Lowe
>            Priority: Blocker
>
> We're seeing a repeatable OOM crash in the AM for a job with around 30000 maps and 3000 reducers.  Details to follow.
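For scale, a back-of-the-envelope estimate combining the 3000 reducers in the description with the ~900K response size from the comment. It assumes a worst case in which every reducer's response is queued at the same moment (the comment does not say how many are buffered simultaneously) and reads 900K as 900 KiB; `worst_case_pending_bytes` is a hypothetical helper.

```python
# Figures from the issue: 3000 reducers, ~900K per completion-events response.
# Treating 900K as 900 KiB is an assumption.
REDUCERS = 3000
RESPONSE_KIB = 900


def worst_case_pending_bytes(reducers, response_kib):
    """Bytes buffered if every reducer's response is queued at the same time."""
    return reducers * response_kib * 1024


total = worst_case_pending_bytes(REDUCERS, RESPONSE_KIB)
print(f"{total:,} bytes = {total / 2**30:.2f} GiB")  # 2,764,800,000 bytes = 2.57 GiB
```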

--
This message is automatically generated by JIRA.
