hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: HDFS - millions of files in one directory?
Date Mon, 26 Jan 2009 02:06:28 GMT
With large numbers of files, you run the risk of the datanodes timing out
while they are performing their block reports and/or du reports.
Basically, if a *find* in the dfs.data.dir takes more than 10 minutes, you
will have catastrophic problems with your HDFS.
At Attributor, with 2 million blocks on a datanode, a find would take
21 minutes under XFS (mounted noatime) on CentOS 5.1 (i686) stock kernels,
on a 6-disk RAID 5 array. The machines were 8-way 2.5 GHz Xeons with 8 GB
of RAM, the RAID controller was a PERC, and the machine basically served
only HDFS.
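As a rough way to check a datanode against that 10-minute rule of thumb, here is a minimal Java sketch that walks a directory tree the way `find` does, counts the regular files, and times the full traversal. It is a hypothetical diagnostic, not part of Hadoop: the class name DataDirScan and passing the dfs.data.dir path as the first argument are assumptions for illustration. Timings will also vary with how much directory metadata the OS already has cached, so a cold run is the more honest measurement.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.util.stream.Stream;

/**
 * Hypothetical diagnostic (not part of Hadoop): walk a datanode data
 * directory like `find` and report how long the traversal takes.
 */
public class DataDirScan {

    // Threshold quoted in the post: slower scans risk datanode timeouts.
    static final Duration DANGER_THRESHOLD = Duration.ofMinutes(10);

    /** Result of one traversal: regular files seen and wall-clock time. */
    static final class ScanResult {
        final long fileCount;
        final Duration elapsed;

        ScanResult(long fileCount, Duration elapsed) {
            this.fileCount = fileCount;
            this.elapsed = elapsed;
        }
    }

    /** Recursively visits every entry under root and counts regular files. */
    static ScanResult scan(Path root) throws IOException {
        long start = System.nanoTime();
        long count;
        try (Stream<Path> entries = Files.walk(root)) {
            count = entries.filter(Files::isRegularFile).count();
        }
        return new ScanResult(count, Duration.ofNanos(System.nanoTime() - start));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: pass the directory configured as dfs.data.dir.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        ScanResult r = scan(root);
        System.out.printf("%d files in %d s%n", r.fileCount, r.elapsed.getSeconds());
        if (r.elapsed.compareTo(DANGER_THRESHOLD) > 0) {
            System.out.println("WARNING: scan exceeded 10 minutes");
        }
    }
}
```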


On Sun, Jan 25, 2009 at 1:49 PM, Mark Kerzner <markkerzner@gmail.com> wrote:

> Yes, flip suggested such a solution, but his files are text, so he could
> combine them all into one large text file, with each line representing an
> original file. My files, however, are binary, so I do not see how I could
> combine them.
>
> However, since my numbers are limited to about 1 billion files total, I
> should be OK putting them all in a few directories with under, say, 10,000
> files each. Maybe a little balanced tree, but 3-4 levels should
> suffice.
>
> Thank you,
> Mark
>
> On Sun, Jan 25, 2009 at 11:43 AM, Carfield Yim <carfield@carfield.com.hk> wrote:
>
> > Is it possible to simply have one large file instead of a lot of
> > small files?
> >
> > On Sat, Jan 24, 2009 at 7:03 AM, Mark Kerzner <markkerzner@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > there is a performance penalty in Windows (pardon the expression) if you
> > > put too many files in the same directory. The OS becomes very slow, stops
> > > seeing them, and lies about their status to my Java requests. I do not
> > > know if this is also a problem in Linux, but in HDFS, do I need to
> > > balance a directory tree if I want to store millions of files, or can I
> > > put them all in the same directory?
> > >
> > > Thank you,
> > > Mark
> >
>
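
Mark's plan of a small balanced tree could be sketched in Java as follows. This is a hypothetical layout, assuming each file can be given a sequential numeric ID below 1 billion: the ID is split into three-digit groups so that every directory holds at most 1,000 entries (1,000^3 = 1,000,000,000), well under the 10,000-per-directory figure Mark mentions. The class BalancedPath, the method pathFor, and the /data base path are illustrative names only.

```java
/**
 * Hypothetical helper: maps a numeric file ID to a path of the form
 * base/AAA/BBB/NNNNNNNNN so that no directory holds more than FANOUT entries.
 */
public final class BalancedPath {

    // 1,000 entries per level; three levels give 1,000^3 = 1,000,000,000 IDs.
    static final long FANOUT = 1_000L;
    static final long CAPACITY = FANOUT * FANOUT * FANOUT;

    private BalancedPath() {}

    static String pathFor(String base, long id) {
        if (id < 0 || id >= CAPACITY) {
            throw new IllegalArgumentException("id out of range: " + id);
        }
        long top = id / (FANOUT * FANOUT);   // first-level directory, 000-999
        long mid = (id / FANOUT) % FANOUT;   // second-level directory, 000-999
        // The file name keeps the full zero-padded ID, so names stay unique
        // and each second-level directory holds at most FANOUT files.
        return String.format("%s/%03d/%03d/%09d", base, top, mid, id);
    }

    public static void main(String[] args) {
        System.out.println(pathFor("/data", 123456789L)); // prints /data/123/456/123456789
        System.out.println(pathFor("/data", 42L));        // prints /data/000/000/000000042
    }
}
```

Note that this only shapes the HDFS namespace: each non-empty file still occupies at least one block on the datanodes, so a layout like this does not by itself address the block-report timing described at the top of this message.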
