hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: HDFS - millions of files in one directory?
Date Fri, 23 Jan 2009 23:16:25 GMT
Raghu Angadi wrote:
> If you are adding and deleting files in the directory, you might notice 
> a CPU penalty (for many loads, higher CPU usage on the NN is not an 
> issue). This is mainly because HDFS does a binary search on the files 
> in a directory each time it inserts a new file.

I should add that an equal or even bigger cost is the memmove that 
ArrayList does when you add or delete entries.

An ArrayList, rather than a map, is used mainly to save memory, the most 
precious resource for the NameNode.
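
To illustrate the two costs described above, here is a minimal sketch of a sorted ArrayList with binary-search insert and delete. It is not the actual NameNode code: the class name SortedChildrenDemo, its methods, and the use of plain String names are assumptions made for the example only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a directory's child list; not the real HDFS class.
class SortedChildrenDemo {
    // Children are kept sorted by name so lookups can use binary search.
    private final List<String> children = new ArrayList<>();

    // Inserts a name in sorted position; returns false if it already exists.
    boolean add(String name) {
        // O(log n) comparisons to find the name or its insertion point.
        int pos = Collections.binarySearch(children, name);
        if (pos >= 0) {
            return false; // already present
        }
        int insertAt = -(pos + 1);
        // Shifts every later element one slot to the right: O(n) copy.
        children.add(insertAt, name);
        return true;
    }

    // Removes a name; returns false if it was not present.
    boolean remove(String name) {
        int pos = Collections.binarySearch(children, name);
        if (pos < 0) {
            return false;
        }
        // Shifts every later element one slot to the left: O(n) copy.
        children.remove(pos);
        return true;
    }

    // Read-only view of the sorted names.
    List<String> names() {
        return Collections.unmodifiableList(children);
    }

    public static void main(String[] args) {
        SortedChildrenDemo dir = new SortedChildrenDemo();
        dir.add("part-00002");
        dir.add("part-00000");
        dir.add("part-00001");
        dir.remove("part-00001");
        System.out.println(dir.names()); // prints [part-00000, part-00002]
    }
}
```

The search itself is cheap; with millions of entries the element shifting inside add and remove is what grows with the size of the directory, while the list stores only one reference per entry and no per-entry map nodes.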


> If the directory is relatively idle, then there is no penalty.
> Raghu.
> Mark Kerzner wrote:
>> Hi,
>> there is a performance penalty in Windows (pardon the expression) if
>> you put too many files in the same directory. The OS becomes very
>> slow, stops seeing them, and lies about their status to my Java
>> requests. I do not know if this is also a problem in Linux, but in
>> HDFS - do I need to balance a directory tree if I want to store
>> millions of files, or can I put them all in the same directory?
>> Thank you,
>> Mark
