hadoop-mapreduce-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5168) Reducer can OOM during shuffle because on-disk output stream not released
Date Tue, 14 May 2013 00:53:18 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5168:
-----------------------------------------------

    Target Version/s: 0.23.8  (was: 2.0.5-beta, 0.23.8)
    
> Reducer can OOM during shuffle because on-disk output stream not released
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5168
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5168
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.7
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: MAPREDUCE-5168-branch-0.23.patch, MAPREDUCE-5168.patch
>
>
> If a reducer needs to shuffle a map output to disk, it opens an output stream and writes
> the data to disk.  However, it does not release the reference to the output stream within the
> MapOutput, and the output stream can have a 128 KB buffer attached to it.  If enough of these
> on-disk outputs are queued up waiting to be merged, the reducer can OOM during
> the shuffle phase.  In one case I saw, there were 1200 on-disk outputs queued up to be merged,
> leading to an extra 150 MB of pressure on the heap from output stream buffers that were
> no longer necessary.
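
The retention pattern described above can be illustrated with a minimal Java sketch. This is not the attached patch and not Hadoop's actual code: the class `OnDiskOutputSketch`, the nested `OnDiskOutput`, the field `disk`, and the helper `retainedBufferBytes` are hypothetical names chosen for illustration. It assumes the stream is a `BufferedOutputStream` with a 128 KB buffer, and shows one way to drop the reference once the write is finished so that a queued output no longer pins the buffer.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class OnDiskOutputSketch {
    // Assumed buffer size, taken from the 128 KB figure in the report.
    static final int BUFFER_SIZE = 128 * 1024;

    /** Hypothetical stand-in for a map output that is shuffled to disk. */
    static final class OnDiskOutput {
        // While this field is non-null, the stream's buffer stays reachable.
        private OutputStream disk;

        OnDiskOutput(OutputStream sink) {
            this.disk = new BufferedOutputStream(sink, BUFFER_SIZE);
        }

        /** Writes the data, then closes the stream and drops the reference. */
        void shuffle(byte[] data) {
            OutputStream stream = disk;
            try {
                stream.write(data);
                stream.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                // Release the reference so the buffer can be garbage collected
                // even while this object waits in the merge queue.
                disk = null;
            }
        }

        boolean holdsStream() {
            return disk != null;
        }
    }

    /** Heap retained by buffers if every queued output keeps its stream. */
    static long retainedBufferBytes(int queuedOutputs, int bufferSize) {
        return (long) queuedOutputs * bufferSize;
    }

    public static void main(String[] args) {
        OnDiskOutput output = new OnDiskOutput(new ByteArrayOutputStream());
        output.shuffle(new byte[] {1, 2, 3});
        System.out.println(output.holdsStream()); // prints false

        // 1200 queued outputs * 128 KB each, expressed in MB.
        long mb = retainedBufferBytes(1200, BUFFER_SIZE) / (1024 * 1024);
        System.out.println(mb); // prints 150
    }
}
```

The arithmetic in `main` reproduces the figure from the report: 1200 queued outputs, each holding a 128 KB buffer, account for 150 MB of heap.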
