hadoop-common-user mailing list archives

From Shubh hadoopExp <shubhhadoop...@gmail.com>
Subject Regarding WholeInputFileFormat Java Heap Size error
Date Mon, 16 May 2016 20:55:33 GMT
Hi All,

While recursively reading input from a directory of files around 30 MB each, using WholeFileInputFormat and WholeFileRecordReader, I am running into a Java heap size error even for a single small 30 MB file. By default mapred.child.java.opts is set to -Xmx200m, which should be sufficient to process at least a 30 MB file from the directory.
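For what it's worth (this is an assumption about the default settings, not something confirmed in this thread), the map-side sort buffer, io.sort.mb, defaults to 100 MB and is allocated inside the same task JVM heap, so -Xmx200m leaves only about 100 MB for everything else. A sketch of the relevant mapred-site.xml entries, with illustrative values:

```xml
<!-- mapred-site.xml: illustrative values, adjust to your cluster -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <!-- sort buffer carved out of the same task heap -->
  <name>io.sort.mb</name>
  <value>100</value>
</property>
```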
The input is ordinary random words in a file. Each map task is given a single 30 MB file, I read the value as the content of the whole file, and then run a normal word count.

If I increase mapred.child.java.opts to a higher value, the application runs successfully. But it would be great if anyone could explain why the current 200 MB default per task is not sufficient for a 30 MB file. It suggests Hadoop MapReduce is consuming a lot of heap, since out of 200 MB it cannot even spare 30 MB to process the task. Also, is there any other way to read a large whole file as the input to a single map, meaning every map gets one whole file to process?
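To make the arithmetic concrete, here is a rough back-of-envelope sketch (plain Java, not Hadoop code) of why holding a 30 MB file in a map task can need far more than 30 MB of heap. The per-object overheads, and the assumption that the record reader buffers the file as bytes and word count converts it to a String, are guesses for illustration, not measurements of any Hadoop version:

```java
public class HeapEstimate {
    // Back-of-envelope estimate of transient heap needed to hold one
    // whole file in a map task and tokenize it for word count.
    // All overhead numbers below are illustrative assumptions.
    static long estimateBytes(long fileBytes, long avgWordLen) {
        long raw = fileBytes;        // byte[] holding the file contents
        long growth = fileBytes;     // a growing buffer (e.g. BytesWritable) may over-allocate ~2x
        long utf16 = 2 * fileBytes;  // new String(bytes) copies into UTF-16 chars
        long nWords = fileBytes / (avgWordLen + 1);  // tokens after splitting on spaces
        long perWordOverhead = 48;   // assumed String + char[] header cost per token
        long tokens = nWords * (perWordOverhead + 2 * avgWordLen);
        return raw + growth + utf16 + tokens;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long file = 30 * mb;      // the 30 MB input file
        long heap = 200 * mb;     // -Xmx200m default
        long sortBuf = 100 * mb;  // io.sort.mb default, carved from the same heap
        System.out.println("estimated MB needed: " + estimateBytes(file, 5) / mb);  // prints 410
        System.out.println("MB left after sort buffer: " + (heap - sortBuf) / mb);  // prints 100
    }
}
```

Under these assumptions the tokenized form alone dwarfs the roughly 100 MB that remains after the sort buffer, which would match the observed heap error.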

     If anyone has used an MR job to read multiple files from a directory, please let me know whether you encountered the same issue and what configuration changes you used.
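For anyone trying to reproduce this, a per-job override avoids editing mapred-site.xml. This is a command-line fragment only; the jar and class names are placeholders, and the -D generic option requires the job driver to use ToolRunner:

```shell
# Placeholder jar/class names; -D works only with a ToolRunner-based driver
hadoop jar wholefile-wordcount.jar WordCount \
  -D mapred.child.java.opts=-Xmx512m \
  /input/dir /output/dir
```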

-Shubh
