hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: How much RAMs needed...
Date Mon, 16 Jul 2007 03:22:12 GMT

HDFS won't combine small files into larger ones for you, but if you can do
that yourself, it will help quite a bit.

You might need a custom InputFormat or split to make it all sing, but you
should be much better off with fewer large input files.

One of the biggest advantages will be that your disk reading will be much
more linear with much less seeking.
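
As a rough illustration of the "combine into larger files" idea, here is a minimal sketch that packs many small local files into one container file and keeps an index of (offset, length) per original file. The function names `pack_files` and `read_record` and the index layout are hypothetical, chosen only for this example; this is not a Hadoop API, and a real job would still need an InputFormat or split that understands the container.

```python
import os


def pack_files(paths, container_path):
    """Concatenate small files into one container file.

    Returns a dict mapping each file's base name to (offset, length)
    inside the container. Assumes base names are unique.
    """
    index = {}
    with open(container_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                data = src.read()
            # Record where this file's bytes start and how long they are.
            index[os.path.basename(path)] = (out.tell(), len(data))
            out.write(data)
    return index


def read_record(container_path, index, name):
    """Read one original file's bytes back out of the container."""
    offset, length = index[name]
    with open(container_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

Reading the container from start to end is one sequential pass, which is where the reduced seeking comes from; the index is only needed when a single original file has to be pulled out.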

On 7/15/07 6:26 PM, "Nguyen Kien Trung" <trung.n.k@gmail.com> wrote:

> Thanks Ted,
> Unfortunately, those files are really tiny files. Would it be good practice
> for HDFS to combine those tiny files into a single block that fits the
> standard size of 64M?
> Ted Dunning wrote:
>> Are these really tiny files, or are you really storing 2M x 100MB = 200TB of
>> data? Or do you have more like 2M x 10KB = 20GB of data?
>> Map-reduce and HDFS will generally work much better if you can arrange to
>> have relatively larger files.
>> On 7/15/07 8:04 AM, "erolagnab" <trung.n.k@gmail.com> wrote:
>>> I have an HDFS cluster with 2 datanodes and 1 namenode on 3 different
>>> machines, 2G RAM each.
>>> Datanode A contains around 700,000 blocks and Datanode B contains 1,200,000+
>>> blocks. The namenode fails to start with an out-of-memory error when trying
>>> to add Datanode B into its rack. I have raised the Java heap to 1600MB,
>>> which is the maximum, but it still runs out of memory.
>>> AFAIK, the namenode loads all block information into memory. If so, is
>>> there any way to estimate how much RAM is needed for an HDFS cluster with a
>>> given number of blocks on each datanode?
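
For the estimation question at the bottom of the thread, a back-of-the-envelope sketch may help. It uses the thread's own figures (2M files of about 10KB, 64M blocks) to compare block counts with and without packing, and then scales by a per-object memory cost. The function names are hypothetical, and `bytes_per_object` is an assumption: the thread gives no figure for it, so it would have to be measured on the Hadoop version in use.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # the 64M block size mentioned in the thread


def ceil_div(a, b):
    """Integer ceiling division without floating point."""
    return -(-a // b)


def blocks_unpacked(num_files, file_size, block_size=BLOCK_SIZE):
    """Blocks used when every file is stored separately.

    Files do not share blocks, so each non-empty file uses at least one.
    """
    return num_files * max(1, ceil_div(file_size, block_size))


def blocks_packed(num_files, file_size, block_size=BLOCK_SIZE):
    """Blocks used if the same bytes were packed into large files."""
    return ceil_div(num_files * file_size, block_size)


def estimate_heap_bytes(num_files, num_blocks, bytes_per_object):
    """Rough namenode heap estimate.

    bytes_per_object is an ASSUMED, measured-by-you cost per file or
    block entry; it is not a number taken from the thread.
    """
    return (num_files + num_blocks) * bytes_per_object
```

With 2,000,000 files of 10,000 bytes each, `blocks_unpacked` gives 2,000,000 blocks while `blocks_packed` gives 299, which is why fewer, larger files relieve the namenode regardless of the exact per-object cost.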
