lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jian chen <>
Subject Re: No.of Files in Directory
Date Thu, 30 Jun 2005 04:38:42 GMT

Depending on the operating system, there might be a hard limit on the
number of files in one directory (windoze versions). Even with
operating systems that don't have a hard limit, it is still better not
to put too many files  in one directory (linux).

Typically, the file system won't be very efficient in terms of file
retrieval if there are nore than couple thousand files in one

There are some ways to tackle this issue.

1) Use a hash function to distribute the files to different sub
directories based on the file name. For example, use the MD5 algorithm
in Java or CRC algorithm in java, hash the file name to a number, use
this number to construct directory. For example, if the number you
hashed is 123456, then, you can make 123 as a sub-dir name, and 456 as
the sub-sub dir name, so forth.

I think the SQUID web proxy server uses this approach to do the file caching.

2) Why not use Lucene's indexing algorithm and store binary files with
lucene index?! I love the indexing algorithm, in that, you don't need
to manage the free space like that in a typical file system. Because
the merge process will take care of reclaiming the free space

Should these two advices be good?


On 6/29/05, bib_lucene bib <> wrote:
> Hi All
> In my webapp i have people uploading their documents. My server is windows/tomcat. I
am thinking there will be a limit on the no of files in a directory. Typically apllication
users will load 3-5 page word docs.
> 1. How does one design the system such that there will not be any problem as the users
keep uploading their files, even if a million files are reached.
> 2. Is there a sample application that does this.
> 3. Should I have lucene update index after each upload or should I do it like once a
> Thanks
> Bib
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message