hadoop-hdfs-user mailing list archives

From stu24m...@yahoo.com
Subject Re: Maximum number of files in directory? (in hdfs)
Date Wed, 18 Aug 2010 02:02:12 GMT
I'll go with keeping my sanity then.

The files will all be >= 64MB

Take care,

-----Original Message-----
From: Allen Wittenauer <awittenauer@linkedin.com>
Date: Wed, 18 Aug 2010 01:00:42 
To: <hdfs-user@hadoop.apache.org>
Reply-To: hdfs-user@hadoop.apache.org
Subject: Re: Maximum number of files in directory? (in hdfs)

On Aug 17, 2010, at 5:44 PM, Stuart Smith wrote:
> I started to break the files into subdirectories out of habit (from working
> on ntfs/etc), but it occurred to me that maybe (from a performance
> perspective), it doesn't really matter on hdfs.
> Does it? Is there some recommended limit on the number of files to store in
> one directory on hdfs? I'm thinking thousands to millions, so we're not
> talking about INT_MAX or anything, but a lot.
> Or is it only limited by my sanity :) ?

We have a directory with several thousand files in it.

It is always a pain when we hit it, because the client heap size needs to be increased to do
anything in it: directory listings, web UIs, distcp, etc.  Doing any sort of manipulation
in that dir is also slower.
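[Editor's note: a hedged configuration sketch of the heap workaround described above. `HADOOP_CLIENT_OPTS` is the environment variable Hadoop's client scripts use to pass extra JVM options; the path and the 2g value here are arbitrary examples, not values from this thread.]

```shell
# Raise the client-side JVM heap before touching a huge directory,
# otherwise a listing can fail with an OutOfMemoryError.
# 2g and the path are illustrative placeholders.
export HADOOP_CLIENT_OPTS="-Xmx2g"
hadoop fs -ls /path/to/huge/dir
```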

My recommendation: don't do it.  Directories, AFAIK, are relatively cheap resource-wise
compared to lots of files in one directory.

[Hopefully these files are large.  Otherwise they should be joined together... if not, you're
going to take a performance hit processing them *and* storing them...]