hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-2782) Earlier key-value buffer from MapTask.java is still referenced even though it's not required anymore.
Date Tue, 05 Feb 2008 06:05:07 GMT
Earlier key-value buffer from MapTask.java is still referenced even though it's not required anymore.
----------------------------------------------------------------------------------------------------

                 Key: HADOOP-2782
                 URL: https://issues.apache.org/jira/browse/HADOOP-2782
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
            Reporter: Amar Kamat
            Priority: Critical


Consider the following sequence of events for a map task:
Before HADOOP-1965:
|| Stage || Description || Buffers used || Memory used ||
|Stage-1 | MapOutputBuffer simply collects | KeyVal1 (by collect) | io.sort.mb |
|Stage-2 | KeyVal1 buffer is full and needs spilling, so Sort-Spill starts | KeyVal1 (by Sort-Spill) | io.sort.mb |
|Stage-3 | Sort-Spill finished | KeyVal1 (referenced by comparator) | io.sort.mb |
|Stage-4 | MapOutputBuffer starts collecting | KeyVal2 (by collect) + KeyVal1 (referenced by comparator) | 2*io.sort.mb |
|Stage-5 | KeyVal2 buffer is full and needs spilling, so Sort-Spill starts | KeyVal2 (by Sort-Spill) | io.sort.mb |
So between Stage-4 and Stage-5 the memory used becomes {{2 * io.sort.mb}}. This can be avoided
entirely by removing the comparator's reference to the earlier key-value buffer, which would cap
the maximum memory usage at {{io.sort.mb}}.
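The retention problem can be illustrated with a minimal Java sketch. This is not Hadoop's MapTask code: `BufferComparatorSketch`, `KvIndexComparator`, `sortAndSpill`, and `release` are hypothetical names, and fixed-length keys are a simplification. It only shows the pattern under discussion: a comparator that reads keys out of the key-value buffer keeps that buffer reachable until its field is cleared, and clearing the field once the spill is done is what lets the GC reclaim it (Java 9+ for `Arrays.compareUnsigned`).

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch, not Hadoop code: a comparator that pins the kv buffer. */
public class BufferComparatorSketch {

    /** Hypothetical comparator ordering record indices by keys stored in a shared buffer. */
    static final class KvIndexComparator implements Comparator<Integer> {
        private byte[] kvBuffer;         // strong reference: keeps the buffer reachable
        private final int[] keyOffsets;  // start offset of each record's key in kvBuffer
        private final int keyLength;     // fixed-length keys keep the sketch short

        KvIndexComparator(byte[] kvBuffer, int[] keyOffsets, int keyLength) {
            this.kvBuffer = kvBuffer;
            this.keyOffsets = keyOffsets;
            this.keyLength = keyLength;
        }

        @Override
        public int compare(Integer a, Integer b) {
            if (kvBuffer == null) {
                throw new IllegalStateException("kv buffer already released");
            }
            int ia = keyOffsets[a];
            int ib = keyOffsets[b];
            // Unsigned lexicographic comparison of the two raw keys.
            return Arrays.compareUnsigned(kvBuffer, ia, ia + keyLength,
                                          kvBuffer, ib, ib + keyLength);
        }

        /** Drops the buffer reference so the GC can reclaim it. */
        void release() {
            kvBuffer = null;
        }

        boolean holdsBuffer() {
            return kvBuffer != null;
        }
    }

    /** Sorts record indices, "spills" them, then releases the comparator's buffer reference. */
    static Integer[] sortAndSpill(KvIndexComparator cmp, int records) {
        Integer[] order = new Integer[records];
        for (int r = 0; r < records; r++) {
            order[r] = r;
        }
        try {
            Arrays.sort(order, cmp);
            // A real spill would now write the records to disk in this order.
            return order;
        } finally {
            // Without this, the comparator keeps the whole buffer alive after the spill.
            cmp.release();
        }
    }

    public static void main(String[] args) {
        // Three records, each a 1-byte key followed by a 1-byte value.
        byte[] buf = {'c', 'x', 'a', 'y', 'b', 'z'};
        KvIndexComparator cmp = new KvIndexComparator(buf, new int[] {0, 2, 4}, 1);
        Integer[] order = sortAndSpill(cmp, 3);
        System.out.println(Arrays.toString(order)); // [1, 2, 0]
        System.out.println(cmp.holdsBuffer());      // false
    }
}
```

Releasing in a `finally` block is one reasonable choice here: the reference is dropped even if the spill fails, so a failed spill cannot pin the buffer either.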

After HADOOP-1965:
|| Stage || Description || Buffers used || Memory used ||
|Stage-1 | MapOutputBuffer simply collects | KeyVal1 (by collect) | io.sort.mb/2 |
|Stage-2 | KeyVal1 buffer is full and needs spilling, so Sort-Spill starts in parallel | KeyVal1 (by Sort-Spill) | io.sort.mb/2 |
|Stage-3 | MapOutputBuffer simply collects + Sort-Spill | KeyVal2 (by collect) + KeyVal1 (by Sort-Spill) | io.sort.mb |
|Stage-4 | MapOutputBuffer simply collects + Sort-Spill finishes; the Sort-Impls are closed but the comparators still hold a reference to the KeyVal1 buffer | KeyVal2 (by collect) + KeyVal1 (referenced by comparator) | io.sort.mb |
|Stage-5 | KeyVal2 buffer is full and needs spilling, so Sort-Spill starts in parallel | KeyVal2 (by Sort-Spill) | io.sort.mb/2 |
So between Stage-4 and Stage-5 there is an unwanted reference to the KeyVal1 buffer, which
prevents the GC from reclaiming it. However, the maximum memory usage remains {{io.sort.mb}}.
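To compare the two tables side by side, the following hypothetical Java model (Java 16+ for the record type) sums the sizes of the distinct buffers listed in each stage's "Buffers used" column, optionally dropping references held only by the comparator. It is plain accounting over the tables above, not a measurement of a real JVM heap; `SpillMemoryModel`, `Ref`, `liveMemory`, and `peak` are illustrative names, and sizes are expressed in units of {{io.sort.mb}}.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical accounting model of the stage tables; sizes in units of io.sort.mb. */
public class SpillMemoryModel {

    /** One entry of the "Buffers used" column: which buffer, and what holds it. */
    record Ref(String buffer, String holder) {}

    static final List<List<Ref>> BEFORE_1965 = List.of(
        List.of(new Ref("KeyVal1", "collect")),
        List.of(new Ref("KeyVal1", "sort-spill")),
        List.of(new Ref("KeyVal1", "comparator")),
        List.of(new Ref("KeyVal2", "collect"), new Ref("KeyVal1", "comparator")),
        List.of(new Ref("KeyVal2", "sort-spill")));

    static final List<List<Ref>> AFTER_1965 = List.of(
        List.of(new Ref("KeyVal1", "collect")),
        List.of(new Ref("KeyVal1", "sort-spill")),
        List.of(new Ref("KeyVal2", "collect"), new Ref("KeyVal1", "sort-spill")),
        List.of(new Ref("KeyVal2", "collect"), new Ref("KeyVal1", "comparator")),
        List.of(new Ref("KeyVal2", "sort-spill")));

    /**
     * Live memory per stage: the number of distinct reachable buffers times bufferSize.
     * If releaseAfterSpill is true, references held only by the comparator are ignored,
     * modelling the proposed fix (a buffer with no remaining holder counts as collectable).
     */
    static double[] liveMemory(List<List<Ref>> stages, double bufferSize, boolean releaseAfterSpill) {
        double[] live = new double[stages.size()];
        for (int s = 0; s < stages.size(); s++) {
            long reachable = stages.get(s).stream()
                .filter(r -> !(releaseAfterSpill && r.holder().equals("comparator")))
                .map(Ref::buffer)
                .distinct()
                .count();
            live[s] = reachable * bufferSize;
        }
        return live;
    }

    static double peak(double[] live) {
        return Arrays.stream(live).max().orElse(0.0);
    }

    public static void main(String[] args) {
        for (boolean release : new boolean[] {false, true}) {
            double[] before = liveMemory(BEFORE_1965, 1.0, release); // one buffer = io.sort.mb
            double[] after = liveMemory(AFTER_1965, 0.5, release);   // one buffer = io.sort.mb/2
            System.out.println("release=" + release
                + " before=" + Arrays.toString(before) + " peak=" + peak(before)
                + " after=" + Arrays.toString(after) + " peak=" + peak(after));
        }
    }
}
```

Under this model, the stale reference raises the pre-HADOOP-1965 peak from {{io.sort.mb}} to {{2 * io.sort.mb}}. After HADOOP-1965 the peak is {{io.sort.mb}} either way, but releasing the reference lowers Stage-4 from {{io.sort.mb}} to {{io.sort.mb/2}}.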


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

