httpd-users mailing list archives

From "James Richardson" <james.richard...@db.com>
Subject RE: [users@httpd] serving millions of files
Date Fri, 04 Feb 2005 14:50:51 GMT

In my experience (I have done this before), you really don't want tens of
thousands of files per directory. The reason is that directories are made
up of limited-size blocks, so a huge directory needs many of these blocks
chained together, which makes searching for a file really slow. (By
searching, I mean finding the directory entry in order to actually open
the file.) This is because directory entries are stored unordered within
the directory blocks (even true for reiser4).

There are also some hard limits on the total number of files a filesystem
can hold; I can't remember the exact figures offhand.

So the best thing is to spread the files across a directory tree, with
only a few subdirectories at each level and a small number of files (10 or
so) in each leaf directory at the bottom.

Assuming that each file has a number, you can then use a simple algorithm
to map the file number to a directory path.

e.g. 1 = /a/a/a/a/a/a/a/1
     2 = /a/a/a/a/a/a/b/2
    27 = /a/a/a/a/a/a/a/27

Or something like that. The depth of the tree (and the fan-out at each
level) determines how many files it can hold. With 26 single-letter
subdirectories at each of seven levels and about 10 files per leaf
directory, that's roughly 26^7 * 10, or about 80 billion files. (Whether
the filesystem would actually support that many files, I don't know.)

If you are having trouble with just too many files on a single filesystem
(I think realistically this would be a LOT), then you can simply build a
"linkfarm": make the entries in the top-level directory symlinks to
directories on whatever partition the files actually live on.
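
For illustration only (the paths and class name below are made up, and this
assumes the java.nio.file API is available), building such a linkfarm could
look roughly like this:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class LinkFarm {
        public static void main(String[] args) throws IOException {
            // Assumed locations: the Apache document root and a bigger partition.
            Path docRoot = Paths.get("/var/www/htdocs/files");
            Path bigDisk = Paths.get("/mnt/bigdisk/files");

            // One symlink per top-level directory (A..P, the sixteen digits the
            // path-mapping code below actually uses), each pointing at a real
            // directory on the big partition.
            for (char c = 'A'; c <= 'P'; c++) {
                Path target = bigDisk.resolve(String.valueOf(c));
                Files.createDirectories(target);
                Path link = docRoot.resolve(String.valueOf(c));
                if (!Files.exists(link)) {
                    Files.createSymbolicLink(link, target);
                }
            }
        }
    }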

A (not particularly well written, I'm afraid) bit of code to do the
number-to-path mapping is:

    // Only the first `radix` entries (A..P for radix 16) are actually used;
    // the rest are spare symbols if you ever want a bigger fan-out.
    final static char[] digits = {
        'A' , 'B' , 'C' , 'D' , 'E' , 'F' ,
        'G' , 'H' , 'I' , 'J' , 'K' , 'L' ,
        'M' , 'N' , 'O' , 'P' , 'Q' , 'R' ,
        'S' , 'T' , 'U' , 'V' , 'W' , 'X' ,
        'Y' , 'Z' , 'a' , 'b' , 'c' , 'd' ,
        'e' , 'f' , 'g' };

    // Maps a file number to a directory path such as "I/G/N/D/B":
    // one path component per base-16 digit, least significant digit first.
    public static String makeId(int i) {

        // Offset small ids so that every file gets a path several levels deep.
        i += 65536;

        int radix = 16;

        StringBuffer sb = new StringBuffer();

        // Drop the last digit: up to `radix` file numbers share each leaf directory.
        i = i / radix;

        while (i >= radix) {
            sb.append ( digits[i % radix] );
            sb.append ( '/' );
            i = i / radix;
        }

        sb.append ( digits[i] );

        return sb.toString();
    }
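
For example (the "/files" document root here is just made up), a caller would
build the full path along these lines:

    int fileNumber = 1234567;
    String dir  = makeId(fileNumber);               // "I/G/N/D/B" with the code above
    String path = "/files/" + dir + "/" + fileNumber + ".pdf";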

Hope that helps,

Best Regards,

James





> -----Original Message-----
> From: jeremy-list@adtcs.com [mailto:jeremy-list@adtcs.com]
> Sent: 03 February 2005 19:40
> To: users@httpd.apache.org
> Subject: Re: [users@httpd] serving millions of files
> 
> On 2/3/05 2:32 PM, "John Bohumil" <jbohumil@mn-exch.tcfbank.com> wrote:
> 
> > I'm looking at a possible solution to a requirement for an intranet
> > application that will serve as many as 15 million 3K to 10K PDF files.
> > (Another scenario has these at 300K to 800K.) I want to serve them over
> > Apache running on Linux.
> >
> > These can be grouped and named any way we want, so long as we can
> > generate a unique URL. Approximately 60,000 per day will be added, with
> > a like number aged off (probably monthly aging). I am thinking of a
> > directory structure that looks like this:
> >
> > httpd_docs
> >   yyyy
> >     mm
> >       dd
> >         file00001.pdf
> >         file00002.pdf
> >         ... (about 60,000 of these tops per directory)
> >         file60000.pdf
> >       dd
> >         file00001.pdf
> >         file00002.pdf
> >         ... (about 60,000 of these tops per directory)
> >         file60000.pdf
> >       dd
> >         file00001.pdf
> >         file00002.pdf
> >         ... (about 60,000 of these tops per directory)
> >         file60000.pdf
> >
> > etc...
> >
> > It would also be possible to go so far as to put everything in one
> > directory and use unique file names such as
> >
> > yyyymmdd_file00001.pdf
> >
> > These files will be accessed by a process that will generate the URL
> > directly using a simple algorithm. We want response time to be good.
> >
> > Also, we'll be aging these by month, so we need to do something like an
> > rm -r on directories.
> >
> > I've never seen a file system this large with so many files per
> > directory. Is this scheme feasible or even reasonable? Should I
> > consider one Linux file system over another? Is there good reason to
> > split these up into smaller directories vs. everything in one huge
> > directory?
> >
> > Thanks for any feedback!
> > John
> >
> 
> I can tell you that with 60,000 files in a directory, rm -r will fail. I
> would suggest a more granular directory structure within your day
> directory.
> 
> Jeremy
> 
> 


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

