hadoop-common-user mailing list archives

From Travis Woodruff <apa...@yahoo.com>
Subject Possible memory "leak" in MapTask$MapOutputBuffer
Date Tue, 05 Feb 2008 00:41:31 GMT
I have been using Hadoop for a couple of months now, and I recently moved to an x86_64 platform.
When I ran some jobs that I had previously run on the 32-bit cluster, I got OutOfMemoryErrors
on a large number of map tasks. I initially chalked this up to the somewhat higher object
overhead on 64-bit JVMs and increased my task process heap size from 512M to 650M. After
increasing it, the OOMEs have decreased, but I'm still seeing them occasionally, so I did some
poking around in a heap snapshot, and I think I've found a potential problem with the way the
sort buffer is being cleaned up.

After MapOutputBuffer calls sortAndSpillToDisk(), it iterates over all the sortImpls and
calls close() on each. This close() nulls the keyValBuffer member of BasicTypeSorterBase;
however, it does not clear the reference held by the sorter's comparator
(WritableComparator.buffer). Because of this, I think it's possible for the old buffer (or
even multiple old buffers) to not be GC'd. If one or more partitions' sorters are used for
sorting a buffer's contents but not for the next, the comparators for the sorters for the
first set of partitions will still hold a reference to the first buffer even after the new
buffer is created.
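
Here's a minimal sketch of the pattern I'm describing. The class is a made-up stand-in for
BasicTypeSorterBase and its comparator (only DataInputBuffer and its reset() behavior are the
real API), and the commented-out line in close() is just my guess at a possible fix:

import org.apache.hadoop.io.DataInputBuffer;

// Illustrative stand-in for BasicTypeSorterBase plus its comparator; the
// names here are hypothetical, not the actual Hadoop source.
public class SketchSorter {

  // A WritableComparator holds a DataInputBuffer internally; reset() points
  // it at whatever byte[] it was last asked to compare records from.
  private final DataInputBuffer comparatorBuffer = new DataInputBuffer();

  // The sort buffer holding serialized key/value bytes.
  private byte[] keyValBuffer = new byte[100 * 1024 * 1024];

  void sort() {
    // Comparing records points comparatorBuffer at keyValBuffer...
    comparatorBuffer.reset(keyValBuffer, 0, keyValBuffer.length);
  }

  void close() {
    // ...so nulling only this field does not make the array collectible:
    // comparatorBuffer still pins it until this sorter happens to compare
    // something from a newer buffer.
    keyValBuffer = null;

    // A possible fix (my guess, not committed code): drop the comparator's
    // reference too, e.g. by resetting it onto an empty array:
    // comparatorBuffer.reset(new byte[0], 0, 0);
  }
}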

Please let me know if you agree with this assessment. If this is indeed a problem, it could
(at least partially) explain some of the mysterious memory usage discussed in HADOOP-2751.


Thanks,
Travis