lucene-java-user mailing list archives

From Paul Querna
Subject Performance of MultiSearcher?
Date Sun, 27 Mar 2005 00:07:18 GMT

I am working on using Lucene-based indexes for the ASF's mod_mbox. 
Current versions of mod_mbox support MIME, and I am trying to add full 
text searching. (Then we can completely remove Eyebrowse.)

Currently I am hacking around with the C++ implementation (CLucene), 
but I intend to migrate to Lucene4c shortly.

I was structuring this as one Lucene index per mailing list.  To search all 
mailing lists, I was planning on using a MultiSearcher.

Currently, the ASF public mail archives use about 17 Gigs, uncompressed, 
in the raw mbox format.

There are also about 300 mailing lists in the public archives.

Can a MultiSearcher quickly search 300 different indexes?  I am 
thinking that it will not.  300 separate indexes is a lot of files to 
scan, even if Lucene is fast.  Any experience from other users would be 
appreciated.

Would it be better to have a single main index for all of the lists, and 
include the list name as a keyed field?
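
To make the two layouts concrete, here is a minimal sketch in Python using 
a toy in-memory inverted index. It does not use the Lucene API: ToyIndex, 
multi_search, the list labels, and the sample messages are all hypothetical, 
made up for illustration. It only shows the shape of the trade-off: one 
shared index filtered by a list-name field, versus a fan-out over per-list 
indexes followed by a merge.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index (hypothetical, not Lucene): term -> set of doc ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text, list_name):
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        # Store the list name as a single untokenized "keyed" term.
        self.postings["list:" + list_name].add(doc_id)

    def search(self, term, list_name=None):
        hits = set(self.postings.get(term.lower(), ()))
        if list_name is not None:
            # Restricting to one list is an intersection with one more posting set.
            hits &= self.postings.get("list:" + list_name, set())
        return sorted(hits)


def multi_search(indexes, term):
    """Fan out over every per-list index, then merge the hits."""
    hits = []
    for name, index in indexes.items():
        hits.extend((name, doc_id) for doc_id in index.search(term))
    return sorted(hits)


# Made-up sample data: (list label, message id, body text).
MESSAGES = [
    ("dev@httpd", 1, "mod_mbox adds MIME support"),
    ("dev@httpd", 2, "full text search for archives"),
    ("java-user@lucene", 3, "MultiSearcher search performance"),
]

# Layout A: one main index, list name stored as a keyed field.
single = ToyIndex()
# Layout B: one index per mailing list.
per_list = defaultdict(ToyIndex)

for list_name, doc_id, body in MESSAGES:
    single.add(doc_id, body, list_name)
    per_list[list_name].add(doc_id, body, list_name)

print(single.search("search"))               # all lists, one lookup
print(single.search("search", "dev@httpd"))  # restricted to one list
print(multi_search(per_list, "search"))      # fan-out and merge
```

In this toy, an all-lists search against the single index is one lookup, 
while the per-list layout touches every index before merging. Whether the 
same holds for real Lucene indexes at this scale is exactly the open 
question.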

I suspect most searches would be restricted to one or two lists, but I 
would like good performance if I wanted to search all of the ASF lists.

Ideas/Comments?  Anyone willing to help me write some C :) ?


