hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5168) Reducer can OOM during shuffle because on-disk output stream not released
Date Mon, 06 May 2013 18:28:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649954#comment-13649954 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-5168:

+1, harmless patch and looks good.

I'm surprised this is the only 'leak'; there may be others we can find by digging through
the code. It's been a while since this code was touched.

I can commit it in a while, or you can go ahead too. Thanks.
> Reducer can OOM during shuffle because on-disk output stream not released
> -------------------------------------------------------------------------
>                 Key: MAPREDUCE-5168
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5168
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.7, 2.0.5-beta
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-5168-branch-0.23.patch, MAPREDUCE-5168.patch
> If a reducer needs to shuffle a map output to disk, it opens an output stream and writes
> the data to disk.  However, it does not release the reference to the output stream within the
> MapOutput, and the output stream can have a 128K buffer attached to it.  If enough of these
> on-disk outputs are queued up waiting to be merged, this can cause the reducer to OOM during
> the shuffle phase.  In one case I saw, there were 1200 on-disk outputs queued up to be merged,
> leading to an extra 150MB of heap pressure from output stream buffers that were
> no longer needed.
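The leak pattern described above can be illustrated with a minimal Java sketch, assuming a simplified stand-in for an on-disk map output. The names `ShuffleLeakSketch`, `OnDiskOutput`, `commit`, `holdsStream` and `retainedBufferBytes` are hypothetical and are not Hadoop's `MapOutput` API or the attached patch. The idea it shows: once the data is written, close the stream and null out the field, so an output sitting in the merge queue no longer pins the stream's buffer. It also works through the arithmetic from the report: 1200 outputs × 128 KiB = 157,286,400 bytes, i.e. 150 MiB.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

public class ShuffleLeakSketch {
    // Per-stream buffer size cited in the issue description (128K).
    static final int STREAM_BUFFER_BYTES = 128 * 1024;

    /** Hypothetical stand-in for an on-disk map output awaiting merge. */
    static final class OnDiskOutput {
        // Non-final so the reference can be dropped after the write completes.
        private OutputStream disk;

        OnDiskOutput(OutputStream sink) {
            // The buffered stream holds a byte[] buffer of up to STREAM_BUFFER_BYTES.
            this.disk = new BufferedOutputStream(sink, STREAM_BUFFER_BYTES);
        }

        void write(byte[] data) throws IOException {
            disk.write(data);
        }

        /** Close the stream, then release the reference so its buffer can be GC'd. */
        void commit() throws IOException {
            disk.close();
            disk = null;
        }

        boolean holdsStream() {
            return disk != null;
        }
    }

    /** Heap pinned by stream buffers if no queued output released its stream. */
    static long retainedBufferBytes(int queuedOutputs) {
        return (long) queuedOutputs * STREAM_BUFFER_BYTES;
    }

    public static void main(String[] args) throws IOException {
        List<OnDiskOutput> mergeQueue = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            OnDiskOutput out = new OnDiskOutput(new ByteArrayOutputStream());
            out.write(new byte[] {1, 2, 3});
            out.commit(); // without this, each queued output keeps its buffer reachable
            mergeQueue.add(out);
        }
        // 1200 * 131072 bytes = 157286400 bytes = 150 MiB
        System.out.println(retainedBufferBytes(1200) / (1024 * 1024) + " MiB");
    }
}
```

Nulling the field (rather than relying on `close()` alone) matters because closing a `BufferedOutputStream` does not by itself make its buffer unreachable while the owning object is still referenced from the merge queue.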

This message is automatically generated by JIRA.
