lucene-java-user mailing list archives

From Andrzej Bialecki
Subject Re: index: how to store binary data or objects ?
Date Tue, 10 Feb 2004 08:32:41 GMT
Dror Matalon wrote:

> On Tue, Feb 10, 2004 at 03:59:50AM +0100, wrote:
>>Hi Lucene Users!
>>Searching the documentation, API and this mailing list results in:
>>"no way to store objects or binary data in an UnIndexed
>>org.apache.lucene.document.Field to attach it to the index directly"
>>Is there a way to do this? What would you suggest to do?
> 1. Store the binary data in files and store the path in Lucene. There
> are scalability issues here when you handle more than a few hundred
> thousand objects.

Just a comment: on ext2fs and BSD FFS (dunno about NT), the scalability
issues with this approach can be partially addressed by building a tree
of subdirectories instead of using just one. For example, a file named
"myThesis.pdf" would go into /m/y/t/myThesis.pdf. This reduces the time
needed to list the files in any given directory (both Unixes
can already cache the inode numbers for name/inode lookups, so there is
no significant increase in the time needed to look up a longer path).
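
A minimal sketch of that path mapping, in case it helps. The class name, method name, and DEPTH constant are made up for illustration, it assumes a modern JDK with java.nio.file (not the 1.4.2 mentioned below), and replacing non-alphanumeric or missing characters with '_' is my own assumption, not part of the scheme described above:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class ShardedPath {
    /** Number of single-character directory levels; 3 matches the /m/y/t/ example. */
    static final int DEPTH = 3;

    /**
     * Maps a file name to root/c1/c2/c3/fileName, where c1..c3 are the first
     * three characters of the lower-cased name. Characters that are missing
     * (short names) or not letters/digits become '_' (an assumption made here
     * to keep the directory names safe).
     */
    static Path shardedPath(Path root, String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        Path dir = root;
        for (int i = 0; i < DEPTH; i++) {
            char c = i < lower.length() ? lower.charAt(i) : '_';
            if (!Character.isLetterOrDigit(c)) {
                c = '_';
            }
            dir = dir.resolve(String.valueOf(c));
        }
        // The file keeps its original name (and case) inside the leaf directory.
        return dir.resolve(fileName);
    }

    public static void main(String[] args) {
        // "/data" is a placeholder root directory.
        System.out.println(shardedPath(Paths.get("/data"), "myThesis.pdf"));
    }
}
```

The resulting path is what you would store in the Lucene document instead of the binary data itself. If the file names share common prefixes, sharding on a hash of the name rather than its leading characters would likely spread the files more evenly.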

FreeBSD also has a special kind of filesystem, which uses inodes in a
flat space (no directories). It was specifically designed for storing
large numbers of files efficiently. Recent versions of Java on FreeBSD
(1.4.2) seem to be very stable and to perform well, so that could also
be an option.

After all, a filesystem _is_ a kind of very specialized database... ;-)

Best regards,
Andrzej Bialecki

Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
FreeBSD developer (
