hadoop-mapreduce-user mailing list archives

From Mohamed Riadh Trad <Mohamed.t...@inria.fr>
Subject MapReduce and Heap Size
Date Tue, 11 May 2010 14:37:38 GMT

Hi,

I am running Hadoop over 100 million files stored on a local file system.

Each split contains a subset of the file collection, and the RecordReader provides each file
as a record.
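For reference, the reader is essentially a whole-file RecordReader along the following lines
(a simplified sketch against the 0.20 mapred API; in the real job each split groups several
files, here a single file per split is assumed):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.RecordReader;

// One record per file: the key is empty, the value is the raw file contents.
class WholeFileRecordReader implements RecordReader<NullWritable, BytesWritable> {
  private final FileSplit split;
  private final Configuration conf;
  private boolean processed = false;

  WholeFileRecordReader(FileSplit split, Configuration conf) {
    this.split = split;
    this.conf = conf;
  }

  public NullWritable createKey() { return NullWritable.get(); }
  public BytesWritable createValue() { return new BytesWritable(); }
  public long getPos() { return processed ? split.getLength() : 0; }
  public float getProgress() { return processed ? 1.0f : 0.0f; }
  public void close() { }

  public boolean next(NullWritable key, BytesWritable value) throws IOException {
    if (processed) return false;
    // The whole file is read into memory at once, so each record costs
    // roughly the file size in task heap.
    byte[] contents = new byte[(int) split.getLength()];
    Path file = split.getPath();
    FileSystem fs = file.getFileSystem(conf);
    FSDataInputStream in = null;
    try {
      in = fs.open(file);
      IOUtils.readFully(in, contents, 0, contents.length);
      value.set(contents, 0, contents.length);
    } finally {
      IOUtils.closeStream(in);
    }
    processed = true;
    return true;
  }
}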

The problem is that I get java.lang.OutOfMemoryError: Java heap space.

I tried increasing the heap size to 16 GB, which is enough to handle 20 million files.
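(For completeness, the heap was raised roughly as follows; mapred.child.java.opts is the
standard 0.20 property, while the wrapper class and the exact value below are only
illustrative:)

import org.apache.hadoop.mapred.JobConf;

public class HeapConf {
  public static JobConf bigHeapJob() {
    JobConf job = new JobConf();
    // Heap for every map/reduce task JVM; the same property can live in mapred-site.xml.
    job.set("mapred.child.java.opts", "-Xmx16384m");
    return job;
  }
}
// Note: the client JVM that submits the job and computes the splits takes its heap from
// HADOOP_HEAPSIZE in conf/hadoop-env.sh, not from this property.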

With 100 million files I get the same problem and the job fails. Moreover, the job loading
time already exceeds 40 minutes when handling 20 million files.

Any suggestions for avoiding this delay and the heap space related problems?

Regards,

On 10 May 2010, at 20:39, Marcin Sieniek wrote:

> Hi Jyothish,
> 
> I had exactly the same problem and I solved it. To answer your question: in my experience,
> HDFS and NFS are totally incompatible ;) However, you can configure MapReduce to run on NFS
> only, without HDFS. See the second-to-last post here:
> http://old.nabble.com/Hadoop-over-Lustre--td19092864.html
> I did it and it works very well for NFS too (note that the old hadoop-site.xml was split
> into core-site.xml, mapred-site.xml and hdfs-site.xml in newer releases). Let me know if
> you have any problems with this configuration.
> 
> Marcin
> 
> On 2010-05-10, at 20:16, alex kamil wrote:
>> 
>> Jyothish, 
>> 
>> as far as I know it is not recommended to run Hadoop on NFS; you are supposed to use
>> local volumes for all mapred and dfs directories.
>> 
>> Alex
>> 
>> On Mon, May 10, 2010 at 2:00 PM, Jyothish Soman <jyothish.soman@gmail.com> wrote:
>> I have a distributed system on NFS and wanted to use MapReduce on it, but the system
>> keeps throwing errors about being unable to allocate temporary space, even though
>> sufficient space is available. Hence my question:
>> Are HDFS and NFS compatible?
>> 
> 
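
P.S. If I read Marcin's suggestion correctly, the HDFS-less setup boils down to something
like the following (only a sketch: the property names are the standard 0.20 ones, but the
host name and paths are placeholders and the linked thread has the actual configuration;
normally these values go into core-site.xml and mapred-site.xml on every node):

import org.apache.hadoop.mapred.JobConf;

public class LocalFsSetup {
  public static JobConf create() {
    JobConf conf = new JobConf();
    // No HDFS at all: the default filesystem is the local one, so paths resolve
    // against the shared NFS (or Lustre) mount visible on every node.
    conf.set("fs.default.name", "file:///");
    // JobTracker and TaskTrackers run as usual; only the storage layer changes.
    conf.set("mapred.job.tracker", "master:9001");
    // Shared MapReduce system directory, reachable at the same path from all nodes.
    conf.set("mapred.system.dir", "/nfs/hadoop/mapred/system");
    return conf;
  }
}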

