lucene-java-user mailing list archives

From bib_lucene bib <bib_luc...@yahoo.com>
Subject Re: No.of Files in Directory
Date Thu, 30 Jun 2005 06:29:51 GMT
Thanks Jian
 
I need to retrieve the original document sometimes. I did not quite understand your second
suggestion.
Could you please help me understand it better? A pointer to a web resource would also help.

jian chen <chenjian1227@gmail.com> wrote:
Hi,

Depending on the operating system, there might be a hard limit on the
number of files in one directory (Windows versions). Even with
operating systems that don't have a hard limit, it is still better not
to put too many files in one directory (Linux).

Typically, the file system won't be very efficient in terms of file
retrieval if there are more than a couple thousand files in one
directory.

There are some ways to tackle this issue.

1) Use a hash function to distribute the files to different
sub-directories based on the file name. For example, use the MD5 or CRC
algorithm in Java to hash the file name to a number, then use
this number to construct the directory path. If the number you
hashed is 123456, you can make 123 the sub-dir name, 456
the sub-sub-dir name, and so forth.

I think the SQUID web proxy server uses this approach to do the file caching.

2) Why not use Lucene's indexing algorithm and store the binary files in
the Lucene index? I love the indexing algorithm because you don't need
to manage the free space as you would in a typical file system:
the merge process takes care of reclaiming the free space
automatically.
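
To make suggestion 1) more concrete, here is a minimal sketch, not a
definitive implementation. It assumes the hex form of the MD5 digest of
the file name is used instead of a decimal number, with the first two hex
digits as the sub-dir and the next two as the sub-sub-dir. The class name
HashedPath, the method names, and the "uploads" root are hypothetical,
chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: maps a file name to root/ab/cd/fileName,
// where "abcd" are the first four hex digits of MD5(fileName).
public class HashedPath {

    // Builds the two-level sharded path for a file name (does not touch the disk).
    public static Path shardedPath(Path root, String fileName) {
        String hex = md5Hex(fileName);
        return root.resolve(hex.substring(0, 2))   // sub-dir: 256 possible values
                   .resolve(hex.substring(2, 4))   // sub-sub-dir: 256 possible values
                   .resolve(fileName);
    }

    // Returns the MD5 digest of the UTF-8 bytes of s as 32 lowercase hex digits.
    public static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm name, so this should not happen in practice.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Prints the sharded location for an example upload name.
        System.out.println(shardedPath(Paths.get("uploads"), "report.doc"));
    }
}
```

With two levels of 256 directories each there are 65,536 leaf
directories, so a million files would average roughly 15 per directory.
Before writing a file, the caller would still need to create the parent
directories, for example with Files.createDirectories(path.getParent()).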

Do these two suggestions help?

Jian

On 6/29/05, bib_lucene bib wrote:
> Hi All
> 
> In my webapp I have people uploading their documents. My server is Windows/Tomcat. I
> am thinking there will be a limit on the number of files in a directory. Typically, application
> users will upload 3-5 page Word docs.
> 
> 1. How does one design the system so that there will not be any problem as the users
> keep uploading their files, even if a million files are reached?
> 2. Is there a sample application that does this?
> 3. Should I have Lucene update the index after each upload, or should I do it about once a
> day?
> 
> Thanks
> Bib
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

