hadoop-common-user mailing list archives

From Ravi Prakash <ravihad...@gmail.com>
Subject Re: Regarding WholeInputFileFormat Java Heap Size error
Date Thu, 12 May 2016 17:00:11 GMT
Shubh! You can perhaps introduce an artificial delay in your map task and
then take a Java heap dump of the MapTask JVM to analyze where the memory
is going. It's hard to speculate otherwise.
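One low-friction way to get such a dump is to have the task JVM write one automatically when it hits the OutOfMemoryError. The fragment below is an illustrative sketch, not from the thread: it reuses the `mapred.child.java.opts` property the original question mentions (newer Hadoop releases split this into `mapreduce.map.java.opts` / `mapreduce.reduce.java.opts`), and the dump path is a placeholder you would point at a directory writable by the task.

```xml
<!-- Hypothetical mapred-site.xml fragment: keep the existing -Xmx200m so the
     failure still reproduces, and add HotSpot flags that dump the heap on OOM. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/maptask.hprof</value>
</property>
```

The resulting `.hprof` file can then be opened in a heap analyzer to see which objects dominate the 200 MB.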

On Wed, May 11, 2016 at 10:15 PM, Shubh hadoopExp <shubhhadoopexp@gmail.com>
wrote:

> Hi All,
>
> While recursively reading input from a directory of 30 MB files, using
> WholeFileInputFormat and WholeFileRecordReader, I am running into a Java
> heap size error even for a file as small as 30 MB. By default
> *mapred.child.java.opts* is set to *-Xmx200m*, which should be sufficient
> to process at least the 30 MB files present in the directory.
>
> The input files contain ordinary random words. Each map is given a single
> 30 MB file, I read the value as the content of the whole file, and then I
> run a normal word count.
>
> If I increase *mapred.child.java.opts* to a higher value, the application
> runs successfully. But it would be great if anyone could explain why the
> default of 200 MB per task is not sufficient for a 30 MB file. Does this
> mean Hadoop MapReduce consumes so much heap that, out of 200 MB, less
> than 30 MB is left to process the task? Also, is there any other way to
> read a large whole file as input to a single map, i.e. so that every map
> gets a whole file to process?
>
> -Shubh
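The memory cost of the whole-file pattern described above can be sketched without any Hadoop dependency. The class and method names below are hypothetical; a typical WholeFileRecordReader pulls the entire split into one byte[] (e.g. via IOUtils.readFully), so the map value alone costs at least file-size bytes of heap before the framework's own buffers and any Writable copies are counted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Standalone sketch (hypothetical names, no Hadoop dependency) of what a
// whole-file record reader effectively does per map task.
public class WholeFileReadSketch {

    // Load the entire file into a single in-memory buffer, as a
    // whole-file record reader does when producing the map value.
    static byte[] readWholeFile(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    // Rough lower bound on heap needed just to hold the map value, given how
    // many in-memory copies exist at once (raw bytes, a Writable wrapper,
    // etc.). Purely illustrative arithmetic.
    static long minimumValueHeapBytes(long fileSizeBytes, int copies) {
        return fileSizeBytes * copies;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("whole-file-demo", ".txt");
        byte[] payload = new byte[1 << 20]; // 1 MB stand-in for a 30 MB input
        Arrays.fill(payload, (byte) 'a');
        Files.write(tmp, payload);

        byte[] value = readWholeFile(tmp); // whole file now lives on the heap
        System.out.println(value.length);

        // Two simultaneous copies of a 30 MB value already need ~60 MB,
        // a large slice of a 200 MB heap.
        System.out.println(minimumValueHeapBytes(30L << 20, 2));

        Files.deleteIfExists(tmp);
    }
}
```

This is why the per-value cost scales with file size rather than with the number of records: with line-oriented input only one small line is in memory at a time, while the whole-file approach holds the full 30 MB (plus copies) at once.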
