hadoop-common-user mailing list archives

From Travis Woodruff <apa...@yahoo.com>
Subject Re: Possible memory "leak" in MapTask$MapOutputBuffer
Date Tue, 05 Feb 2008 17:27:36 GMT
I think it's possible to retain even more than two copies of the keyval buffer.

This can happen because the comparator's buffer is set only when a comparison is performed,
so if no data exists for a partition in a spill cycle, the partition's comparator will retain
the buffer from a previous spill.

Take this scenario:

We have 3 partitions (A, B, C), a keyval buffer that can hold 3 keys, and an input data set
that generates output keys in the following sequence: AAABBBCCC

A
A
A
-- Spill occurs here. A's comparator has keyval buffer 1.
B
B
B
-- Spill occurs here. B's comparator has keyval buffer 2. A's comparator has keyval buffer 1.
C
C
C
-- Spill occurs here. C's comparator has keyval buffer 3. B's comparator has keyval
   buffer 2. A's comparator has keyval buffer 1.
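
The scenario above can be modeled with a small standalone Java sketch. It is a toy, not Hadoop
code: ToyComparator, retainedBuffers, spill, and the one-comparator-per-partition layout are
hypothetical names and assumptions chosen only to mirror the description. A comparator's buffer
reference changes only when it performs a comparison, so a partition with fewer than two keys
in a spill cycle keeps whatever buffer it saw last.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of the AAABBBCCC scenario; hypothetical names, not Hadoop classes. */
public class ComparatorRetentionDemo {

    /** Stand-in for a comparator that keeps a reference to the last buffer it compared. */
    static final class ToyComparator {
        byte[] buffer = new byte[0];

        int compare(byte[] keyValBuffer, int i, int j) {
            buffer = keyValBuffer; // the reference is updated only when a comparison runs
            return Byte.compare(keyValBuffer[i], keyValBuffer[j]);
        }

        void clearBuffer() {
            buffer = new byte[0]; // drop the reference so the old buffer can be collected
        }
    }

    /** Counts the distinct non-empty buffers still reachable through comparators at the end. */
    public static int retainedBuffers(String keys, int capacity, boolean clearAfterSpill) {
        Map<Character, ToyComparator> comparators = new LinkedHashMap<>();
        for (char partition : keys.toCharArray()) {
            comparators.putIfAbsent(partition, new ToyComparator());
        }
        byte[] keyValBuffer = new byte[capacity];
        int used = 0;
        for (char key : keys.toCharArray()) {
            keyValBuffer[used++] = (byte) key;
            if (used == capacity) {
                spill(keyValBuffer, used, comparators, clearAfterSpill);
                keyValBuffer = new byte[capacity]; // a fresh buffer for the next spill cycle
                used = 0;
            }
        }
        if (used > 0) {
            spill(keyValBuffer, used, comparators, clearAfterSpill);
        }
        Set<byte[]> distinct = Collections.newSetFromMap(new IdentityHashMap<byte[], Boolean>());
        for (ToyComparator c : comparators.values()) {
            if (c.buffer.length > 0) {
                distinct.add(c.buffer);
            }
        }
        return distinct.size();
    }

    /** Compares adjacent keys within each partition; fewer than two keys means no comparison. */
    static void spill(byte[] buf, int used, Map<Character, ToyComparator> comparators,
                      boolean clear) {
        for (Map.Entry<Character, ToyComparator> e : comparators.entrySet()) {
            char partition = e.getKey();
            int prev = -1;
            for (int i = 0; i < used; i++) {
                if (buf[i] == (byte) partition) {
                    if (prev >= 0) {
                        e.getValue().compare(buf, prev, i);
                    }
                    prev = i;
                }
            }
        }
        if (clear) {
            for (ToyComparator c : comparators.values()) {
                c.clearBuffer();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(retainedBuffers("AAABBBCCC", 3, false)); // prints 3
        System.out.println(retainedBuffers("AAABBBCCC", 3, true));  // prints 0
    }
}
```

In this model, three old buffers stay reachable after the third spill unless every comparator
is cleared at the end of each spill, in which case none do.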


This is fairly unlikely, but I think it is happening to me occasionally because I'm seeing OOMEs
I can't explain any other way. My heap is more than large enough to support two 100M buffers.
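
As a rough sanity check on that heap reasoning, here is a back-of-the-envelope bound written
as Java. It assumes one comparator per partition, each holding at most one old buffer, and at
most one buffer created per spill; the class and method names are hypothetical, and the bound
ignores the buffer currently being filled and everything else on the heap.

```java
/** Rough upper bound on stale keyval buffer memory; a sketch under the stated assumptions. */
public class RetainedBufferBound {

    /**
     * Each comparator can pin at most one buffer and each spill creates at most one,
     * so at most min(partitions, spills) old buffers can remain reachable.
     */
    public static long staleBufferBoundMb(int partitions, int spills, long bufferMb) {
        return (long) Math.min(partitions, spills) * bufferMb;
    }

    public static void main(String[] args) {
        // Three partitions, three spills, 100M buffers, as in the scenario above.
        System.out.println(staleBufferBoundMb(3, 3, 100)); // prints 300
    }
}
```

So with 100M buffers, the three-partition scenario could pin up to 300M through comparators
alone, which is more than the two buffers I had budgeted for.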

FYI, I added code to clear the comparator's buffer (see patch below), and a job that was failing
with 650M heaps now succeeds with 512M.



Travis


Index: src/java/org/apache/hadoop/io/WritableComparator.java
===================================================================
--- src/java/org/apache/hadoop/io/WritableComparator.java    (revision 618649)
+++ src/java/org/apache/hadoop/io/WritableComparator.java    (working copy)
@@ -78,6 +78,11 @@
     }
   }
 
+  /** Free up memory used by the internal DataInputBuffer. */
+  public void clearBuffer() {
+      buffer.reset(new byte[] {}, 0, 0);
+  }
+  
   /** Optimization hook.  Override this to make SequenceFile.Sorter's scream.
    *
    * <p>The default implementation reads the data into two {@link
Index: src/java/org/apache/hadoop/mapred/BasicTypeSorterBase.java
===================================================================
--- src/java/org/apache/hadoop/mapred/BasicTypeSorterBase.java    (revision 618649)
+++ src/java/org/apache/hadoop/mapred/BasicTypeSorterBase.java    (working copy)
@@ -124,6 +124,7 @@
     //release the large key-value buffer so that the GC, if necessary,
     //can collect it away
     keyValBuffer = null;
+    comparator.clearBuffer();
   }
   //A compare method that references the keyValBuffer through the indirect
   //pointers




----- Original Message ----
From: Amar Kamat <amarrk@yahoo-inc.com>
To: core-user@hadoop.apache.org
Sent: Tuesday, February 5, 2008 12:08:48 AM
Subject: Re: Possible memory "leak" in MapTask$MapOutputBuffer


Hi,
Yes, you are correct. The references to the old keyval buffers are still there even after the
buffers are re-initialized, but the references exist only between consecutive spills. The
scenario before HADOOP-1965 was that the memory used for one sort-spill phase is io.sort.mb,
causing the max memory usage to be (2 * io.sort.mb). Post HADOOP-1965, the total memory used
for one sort-spill phase is io.sort.mb/2, the max memory usage is io.sort.mb, and the time
between two consecutive spills is also reduced since they happen in parallel. Thanks for
pointing it out. I have opened HADOOP-2782 addressing the same.
Amar

Travis Woodruff wrote:
> Well, this is what I get for not doing my homework first.
>
> I pulled down the latest code from trunk, and it looks like the updates for HADOOP-1965
> have changed this code significantly. From what I can tell, these changes have removed
> the issue; however, the problem still exists in the 0.15 branch.
>
>
> Travis
>
> ----- Original Message ----
> From: Travis Woodruff <apalwe@yahoo.com>
> To: core-user@hadoop.apache.org
> Sent: Monday, February 4, 2008 6:41:31 PM
> Subject: Possible memory "leak" in MapTask$MapOutputBuffer
>

<snip for goofy formatting>


