lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Multiple collections
Date Thu, 23 Dec 2004 19:42:45 GMT
On Dec 23, 2004, at 2:18 PM, Jim Lynch wrote:
> I'm investigating search engines and have started to look at Lucene.  
> I have a couple of questions, however.  The faq seems to indicate we 
> can't do searches and indexing at the same time.

Where in the FAQ does it indicate this?  This is incorrect.  And I 
don't think this has ever been the case for Lucene.  Indexing and 
searching can most definitely occur at the same time.

> We have currently about 4 million documents comprised of  about 16 
> million terms.  This is currently broken up into about 50 different 
> collections which are separate "databases".  Some of these collections 
> are produced by a web crawler, some are produced by indexing a static 
> file tree and some are produced via a feed from another system, which 
> either adds new documents to a collection or replaces a document.  
> There are really 2 questions.  Is this too much data for Lucene?

It is not too much data for Lucene.  Your architecture around Lucene is 
the more important aspect.

>   And is there a way to keep separate collections (probably indexes) 
> and search all (usually just a subset) of them at once?  I see the 
> MultiSearcher object that may be the ticket, but IMHO javadocs leave a 
> lot to be desired in the way of documentation.  They seem to 
> completely leave out the "glue" and examples.

MultiSearcher is pretty trivial to use.  There is an example in Lucene 
in Action's source code ("ant SearchServer") and I'm using a 
MultiSearcher for the upcoming lucenebook.com site like this:

     // Open one IndexSearcher per index.
     Searchable[] searchables = new Searchable[indexes.length];

     for (int i = 0; i < indexes.length; i++) {
       searchables[i] = new IndexSearcher(indexes[i]);
     }

     // Wrap them all so they can be searched as a single index.
     searcher = new MultiSearcher(searchables);

Use MultiSearcher in the same manner as you would IndexSearcher.  You 
can also find out which index a particular hit was from using the 
subSearcher method.
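
To make the "which index did this hit come from" idea concrete, here is a minimal, library-free sketch of the bookkeeping involved. It is a simplified model, not Lucene's own code: it assumes the merged document numbers are assigned by laying the indexes end to end in array order, and the class DocIdMapper and its methods indexOf and localDoc are hypothetical names used only for illustration.

```java
// Simplified model of merged document numbering across several indexes.
// Hypothetical illustration only; this is not Lucene's implementation.
final class DocIdMapper {
    // starts[i] is the first merged doc number that belongs to index i;
    // the last element is the total number of documents.
    private final int[] starts;

    // docCounts[i] is the number of documents in index i (assumed input).
    DocIdMapper(int[] docCounts) {
        starts = new int[docCounts.length + 1];
        for (int i = 0; i < docCounts.length; i++) {
            starts[i + 1] = starts[i] + docCounts[i];
        }
    }

    // Position in the array of the index that holds merged doc number n.
    int indexOf(int n) {
        if (n < 0 || n >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("doc number out of range: " + n);
        }
        // A linear scan is enough for a few dozen indexes.
        for (int i = 0; i < starts.length - 1; i++) {
            if (n < starts[i + 1]) {
                return i;
            }
        }
        throw new IllegalStateException("unreachable");
    }

    // Doc number of n relative to the index that holds it.
    int localDoc(int n) {
        return n - starts[indexOf(n)];
    }

    public static void main(String[] args) {
        // Three indexes holding 100, 50 and 200 documents.
        DocIdMapper mapper = new DocIdMapper(new int[] {100, 50, 200});
        System.out.println("index " + mapper.indexOf(120)
            + ", local doc " + mapper.localDoc(120));
    }
}
```

With the real MultiSearcher you would not write this yourself; you would pass the hit's document number to subSearcher and use the result as a position in the Searchable array you built.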

As for your comment about the javadocs, allow me to refer you to 
Lucene's test suite.  TestMultiSearcher.java in this case.  This is the 
best "documentation" there is!  (besides Lucene in Action, of course :)

	Erik



