hadoop-mapreduce-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6169) MergeQueue should release reference to the current item from key and value at the end of the iteration to save memory.
Date Thu, 20 Nov 2014 20:53:34 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219977#comment-14219977 ]

Todd Lipcon commented on MAPREDUCE-6169:
----------------------------------------

At first I wasn't sure why this was necessary, since by the time the Merger completes, the task
should have consumed all of its input and should itself be finished.

It turns out that, in the reducer, when the input iterator (which may be a Merger) runs out
of values, we have the following code:
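{code}
  public boolean nextKeyValue() throws IOException, InterruptedException {
    if (!hasMore) {
      // Input is exhausted: drop the references to the last key/value pair.
      key = null;
      value = null;
      return false;
    }
    // ... (rest of the method not shown)
  }
{code}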

but we don't null out the 'input' member (the wrapped Merger).

So, if the Reducer implementation has a close() method that needs a lot of memory, it makes
sense to ensure that we drop the reference to the last K/V pair before running close(). One
option is to add 'input = null' in the code above, and the other option is to do what Zhihai
has done here. I don't think there's any particular reason that one is better than the other,
and given there's already a patch here, +1 on it.
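
To make the two options concrete, here is a minimal, self-contained sketch. It is not the Hadoop code and not the attached patch: Source, Context, releaseAtEnd, dropInputAtEnd and retainedAfterDrain are hypothetical names standing in for the wrapped Merger, the reducer-side context, and the two fixes. It only shows that, once either fix is applied, the last buffer is no longer reachable through the context after the input is drained.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReleaseAtEndSketch {

    /** Hypothetical stand-in for the wrapped Merger; it remembers the last item it returned. */
    static final class Source {
        private final Iterator<byte[]> items;
        private final boolean releaseAtEnd;
        private byte[] current; // last buffer handed out; may be large in practice

        Source(List<byte[]> items, boolean releaseAtEnd) {
            this.items = items.iterator();
            this.releaseAtEnd = releaseAtEnd;
        }

        boolean next() {
            if (items.hasNext()) {
                current = items.next();
                return true;
            }
            if (releaseAtEnd) {
                current = null; // option 2 (assumed shape): the source drops its own reference
            }
            return false;
        }

        byte[] current() {
            return current;
        }
    }

    /** Hypothetical stand-in for the reducer-side context that wraps the source. */
    static final class Context {
        private Source input;
        private final boolean dropInputAtEnd;
        private byte[] value;

        Context(Source input, boolean dropInputAtEnd) {
            this.input = input;
            this.dropInputAtEnd = dropInputAtEnd;
        }

        boolean nextValue() {
            if (input == null || !input.next()) {
                value = null;
                if (dropInputAtEnd) {
                    input = null; // option 1: the context drops the wrapped source entirely
                }
                return false;
            }
            value = input.current();
            return true;
        }

        /** True if the last buffer is still reachable through this context. */
        boolean retainsLastBuffer() {
            return value != null || (input != null && input.current() != null);
        }
    }

    /** Drains a two-item input, then reports whether the last buffer is still referenced. */
    static boolean retainedAfterDrain(boolean releaseAtEnd, boolean dropInputAtEnd) {
        List<byte[]> data = Arrays.asList(new byte[8], new byte[8]);
        Context ctx = new Context(new Source(data, releaseAtEnd), dropInputAtEnd);
        while (ctx.nextValue()) {
            // consume every value, as a reducer would
        }
        return ctx.retainsLastBuffer();
    }

    public static void main(String[] args) {
        System.out.println("no fix:             " + retainedAfterDrain(false, false));
        System.out.println("source releases:    " + retainedAfterDrain(true, false));
        System.out.println("context drops input: " + retainedAfterDrain(false, true));
    }
}
```

In this sketch both fixes leave nothing pinning the last buffer before any end-of-task work runs, which matches the point above that neither option is obviously better than the other.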

> MergeQueue should release reference to the current item from key and value at the end of the iteration to save memory.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6169
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6169
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Minor
>         Attachments: MAPREDUCE-6169.000.patch
>
>
> MergeQueue should release reference to the current item from key and value at the end of the iteration to save memory.
> The buffers referenced by key and value can be large, which may cause an OOM error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
