hadoop-common-user mailing list archives

From Konstantin Shvachko <...@yahoo-inc.com>
Subject Re: Max. Possible No. of Files
Date Fri, 05 Jun 2009 17:42:50 GMT
There are some name-node memory estimates in this JIRA:
http://issues.apache.org/jira/browse/HADOOP-1687

With 16 GB of name-node memory you can normally have 60 million
objects (files + blocks) on the name-node. The number of files
depends on the file-to-block ratio.

--Konstantin
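
As a rough back-of-the-envelope illustration of the estimate above (not
from the original thread): given roughly 60 million objects for a 16 GB
name-node and an assumed file-to-block ratio, the file capacity works
out as below. The heap size and blocks-per-file values are assumptions
for illustration only.

// Back-of-the-envelope name-node capacity estimate using the
// "16 GB ~= 60 million objects (files + blocks)" figure quoted above.
// The blocks-per-file ratio is an illustrative assumption.
public class NamenodeCapacityEstimate {
    public static void main(String[] args) {
        final double heapGb = 16.0;               // assumed name-node heap
        final double objectsPerGb = 60e6 / 16.0;  // derived from the estimate above
        final double blocksPerFile = 1.5;         // assumed file-to-block ratio

        double totalObjects = heapGb * objectsPerGb;          // files + blocks
        double files = totalObjects / (1.0 + blocksPerFile);  // one file object plus its block objects

        System.out.printf("~%.0fM objects -> ~%.0fM files at %.1f blocks/file%n",
                totalObjects / 1e6, files / 1e6, blocksPerFile);
    }
}

With these assumed numbers that comes out to roughly 24 million files;
a higher blocks-per-file ratio lowers the file count for the same heap.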


Brian Bockelman wrote:
> 
> On Jun 5, 2009, at 11:51 AM, Wasim Bari wrote:
> 
>> Hi,
>>     Does anyone have some data regarding the maximum possible number
>> of files on HDFS?
>>
> 
> Hey Wasim,
> 
> I don't think that there is a maximum limit.  Remember:
> 1) Fewer files are better.  HDFS is optimized for big files.
> 2) The amount of memory the HDFS namenode needs is a function of the 
> number of files.  If you have a huge number of files, you get a huge 
> memory requirement.
> 
> 1-2 million files is fairly safe if you have a normal-looking namenode 
> server (8-16 GB RAM).  I know some of our UCSD colleagues just ran a test 
> where they were able to put more than 0.5M files in a single directory 
> and still have a usable file system.
> 
> Brian
> 
>> My second question: I created small files with a small block size, up 
>> to one lakh (100,000), and read them back from HDFS; read performance 
>> remains almost unaffected as the number of files increases.
>>
>> The possible reasons I can think of are:
>>
>> 1.  One lakh (100,000) isn't a big enough number to disturb HDFS 
>> performance (I used 1 namenode and 4 datanodes).
>>
>> 2.  Since reads go directly to the datanodes after an initial 
>> interaction with the namenode, reading from different datanodes 
>> doesn't affect performance.
>>
>>
>> If someone could add to or correct this information, it would be 
>> highly appreciated.
>>
>> Cheers,
>> Wasim
> 
> 
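
Regarding Wasim's second point, here is a minimal read sketch (my own
illustration, with a made-up name-node URI and path): FileSystem.open()
consults the name-node once for block locations, and the bytes are then
streamed directly from the data-nodes, which is consistent with read
performance staying flat as the file count grows.

// Minimal HDFS read sketch. FileSystem.open() does the name-node lookup
// for block locations; the data itself streams from the data-nodes.
// The fs.default.name value and the path are placeholders.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode:9000");   // placeholder name-node URI
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/wasim/smallfiles/part-00001"); // hypothetical path
        FSDataInputStream in = fs.open(file);      // block locations come from the name-node here
        try {
            IOUtils.copyBytes(in, System.out, conf, false);     // bytes stream from the data-nodes
        } finally {
            IOUtils.closeStream(in);
        }
    }
}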
