hadoop-mapreduce-user mailing list archives

From Shubh hadoopExp <shubhhadoop...@gmail.com>
Subject Regarding WholeFileInputFormat Java Heap Size error
Date Tue, 10 May 2016 22:07:30 GMT

Hi All,

While recursively reading input from a directory of 30 MB files, using WholeFileInputFormat
and WholeFileRecordReader, I am running into a Java heap size error even for a single small file
of 30 MB. By default mapred.child.java.opts is set to -Xmx200m, which should be sufficient
to process at least the 30 MB files present in the directory.
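
For reference, this is roughly how the default can be overridden per job (a minimal sketch,
assuming the classic mapred.child.java.opts property name; Hadoop 2 also exposes this as
mapreduce.map.java.opts / mapreduce.reduce.java.opts, and -Xmx512m below is only an example value):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class HeapOverrideExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Per-job override of the child task JVM heap; -Xmx512m is only an example value.
        conf.set("mapred.child.java.opts", "-Xmx512m");
        Job job = Job.getInstance(conf, "whole-file word count");
        // ... the rest of the job setup is omitted here.
    }
}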

The input files are plain text containing random words. Each map task is given a single 30 MB file,
I read the value as the content of the whole file, and then run a normal word count.
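
For context, the input format and record reader follow the well-known whole-file pattern sketched
below (a sketch only, not my exact code; the class names and the BytesWritable value type are
assumptions, and my actual classes may differ slightly). Note that nextKeyValue() buffers the whole
file in a byte[] and copies it again into the writable before the map ever sees it:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // one split, and therefore one map, per file
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }

    static class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {
        private FileSplit fileSplit;
        private TaskAttemptContext context;
        private final BytesWritable value = new BytesWritable();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.fileSplit = (FileSplit) split;
            this.context = context;
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (!processed) {
                // The entire file is buffered in memory here: one byte[] of the file's
                // length, then copied again into the BytesWritable.
                byte[] contents = new byte[(int) fileSplit.getLength()];
                Path file = fileSplit.getPath();
                FileSystem fs = file.getFileSystem(context.getConfiguration());
                FSDataInputStream in = null;
                try {
                    in = fs.open(file);
                    IOUtils.readFully(in, contents, 0, contents.length);
                    value.set(contents, 0, contents.length);
                } finally {
                    IOUtils.closeStream(in);
                }
                processed = true;
                return true;
            }
            return false;
        }

        @Override
        public NullWritable getCurrentKey() { return NullWritable.get(); }

        @Override
        public BytesWritable getCurrentValue() { return value; }

        @Override
        public float getProgress() { return processed ? 1.0f : 0.0f; }

        @Override
        public void close() { }
    }
}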

If I increase mapred.child.java.opts to a higher value, the application runs successfully.
But it would be great if anyone could explain why mapred.child.java.opts, which defaults to
200 MB per task, is not sufficient for a 30 MB file. Does this mean Hadoop MapReduce itself
consumes most of the heap, so that out of 200 MB there isn't even 30 MB left to process the task?
Also, is there any other way to read a large whole file as input to a single map, meaning
every map task gets a whole file to process?
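
In case it helps, this is roughly how the job is wired so that every map task receives one whole
file (a sketch assuming the WholeFileInputFormat above and recursive input listing; the driver and
mapper class names here are examples, not my exact code):

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WholeFileWordCountDriver {

    // Simple word count over the whole-file value; note the value is materialized
    // as one large String here, i.e. yet another full copy of the file in memory.
    public static class WholeFileWordCountMapper
            extends Mapper<NullWritable, BytesWritable, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(NullWritable key, BytesWritable value, Context context)
                throws IOException, InterruptedException {
            String contents = new String(value.getBytes(), 0, value.getLength(), StandardCharsets.UTF_8);
            for (String token : contents.split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "whole-file word count");
        job.setJarByClass(WholeFileWordCountDriver.class);

        job.setInputFormatClass(WholeFileInputFormat.class);   // one whole file per map
        FileInputFormat.setInputDirRecursive(job, true);       // walk the input directory recursively
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(WholeFileWordCountMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}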

-Shubh 