hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nguyen Kien Trung <trung....@gmail.com>
Subject Re: How much RAMs needed...
Date Tue, 17 Jul 2007 03:13:13 GMT
Thanks Peter and Ted, your explanations do make some sense to me.

The out of memory error is as follows:
        at sun.misc.Unsafe.allocateMemory(Native Method)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:99)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
        at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57)
        at sun.nio.ch.IOUtil.read(IOUtil.java:205)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:207)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:350)
        at org.apache.hadoop.ipc.Server$Listener.run(Server.java:267)
in which I think the problem is not due to block count.

I decided to give another try.
I formated a new namenode and wrote a program which has 100 threads 
writing individual files consistently in to HDFS with 2 datanodes (A and 
B) and replication factor is 2. In a first few hours, blocks are equally 
replicated correctly among 2 datanodes (i used browser to see number of 
blocks in 2 datanodes), but after 8 hours, HDFS started having strange 
1) Blocks were not equally replicated
2) Timed out for RPC (on datanodes)
3) Out of Memory Error (on namenode)

I'm going to look deeper into Hadoop code and see. Any other thoughts?

Ted Dunning wrote:
> Peter is pointing out that he was able to process the equivalent of many
> small files using very modest hardware (smaller than your hardware).
> This is confirmation that you need to combine your inputs into larger
> chunks.
> On 7/15/07 7:07 PM, "Nguyen Kien Trung" <trung.n.k@gmail.com> wrote:
>> Hi Peter,
>> I appreciate for the info. I'm afraid I'm not getting what you mean.
>> The issue I've encountered is i'm not able to start up the namenode due
>> to out of memory error. Given that there are huge number of tiny files
>> in datanodes.
>> Cheers,
>> Trung
>> Peter W. wrote:
>>> Trung,
>>> Using one machine (with 2GB RAM) and 300 input files
>>> I was able to successfully run:
>>> INFO mapred.JobClient:
>>> Map input records=10785802
>>> Map output records=10785802
>>> Map input bytes=1302175673
>>> Map output bytes=1101864522
>>> Reduce input groups=1244034
>>> Reduce input records=10785802
>>> Reduce output records=1050704
>>> Consolidating the files in your input
>>> directories might help.
>>> Peter W.
>>> On Jul 15, 2007, at 5:40 PM, Ted Dunning wrote:
>>>> Are these really tiny files, or are you really storing 2M x 100MB =
>>>> 200TB of
>>>> data? Or do you have more like 2M x 10KB = 20GB of data?
>>>> Map-reduce and HDFS will generally work much better if you can
>>>> arrange to
>>>> have relatively larger files.
>>>> On 7/15/07 8:04 AM, "erolagnab" <trung.n.k@gmail.com> wrote:
>>>>> I have a HDFS with 2 datanodes and 1 namenode in 3 different
>>>>> machines, 2G ram
>>>>> each.
>>>>> Datanode A contains around 700,000 blocks and Datanode B contains
>>>>> 1,200,000+
>>>>> blocks, the namenode fails to start due to out of memory when trying
>>>>> to add
>>>>> Datanode B into its rack. I have adjusted the java heap memory to
>>>>> 1600MB
>>>>> which is the maxinum. But it still runs out of memory.
>>>>> AFAIK, namenode loads all blocks information into the memory. If so,
>>>>> then is
>>>>> there anyway to estimate how much ram needed for a HDFS with given
>>>>> number of
>>>>> blocks in each datanode?

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message