lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jim Lynch <...@sgi.com>
Subject Re: Multiple collections
Date Thu, 23 Dec 2004 20:29:21 GMT
Hi, Erik,

I've been perusing the mail list today and see your name often.  As well 
as visiting the web site advertising your book.  If we decide to go this 
way, I'll be sure to pick up a copy.

The FAQ number 41 on page 
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq 
implies a problem with searching and indexing at the same time, unless 
I'm misunderstanding what it says.

So it is kosher to download the source code before buying the book?  I 
tend not to do that for a couple of reasons, it doesn't seem right and 
frequently authors go out of their way to make sure it's not very useful 
without the book.   Not that I consider that unfair, mind you.  It's 
just a common practice from my experience.

Any way thanks for the info. 

So what you are saying if I can read between the lines and extrapolate 
from what I've read, is that I can create an index for each of my 
collections as I see fit, putting them in separate directories and when 
I need to search I can select a subset of the directories with the 
MultiSearcher.  Since the user selects which collections he wants to 
search from via checkboxes, I can build a list of searchables to pass to 
MultiSearcher.  However, looking at the javadocs I see Searchable is an 
interface.  Hm, I'll have to look at some code to see how that works.

Thanks, you've given me something to chew on.

Jim.

At the risk of  being politically incorrect, Merry Christmas to you 
all.  Not that I care a whit about political correctness.  8)

Erik Hatcher wrote:

> On Dec 23, 2004, at 2:18 PM, Jim Lynch wrote:
>
>> I'm investigating search engines and have started to look at Lucene.  
>> I have a couple of questions, however.  The faq seems to indicate we 
>> can't do searches and indexing at the same time.
>
>
> Where in the FAQ does it indicate this?  This is incorrect.  And I 
> don't think this has ever been the case for Lucene.  Indexing and 
> searching can most definitely occur at the same time.
>
>> We have currently about 4 million documents comprised of  about 16 
>> million terms.  This is currently broken up into about 50 different 
>> collections which are separate "databases".  Some of these 
>> collections are producted by a web crawler, some are produced by 
>> indexing a static file tree and some are produced via a feed from 
>> another system, which either adds new documents to a collection or 
>> replaces a document.  There are really 2 questions.  Is this too much 
>> data for Lucene?
>
>
> It is not too much data for Lucene.  Your architecture around Lucene 
> is the more important aspect.
>
>>   And is there a way to keep separate collections (probably indexes) 
>> and search all (usually just a subset) of them at once?  I see the 
>> MultiSearcher object that may be the ticket, but IMHO javadocs leave 
>> a lot to be desired in the way of documentation.  They seem to 
>> completely leave out the "glue" and examples.
>
>
> MultiSearcher is pretty trivial to use.  There is an example in Lucene 
> in Action's source code ("ant SearchServer") and I'm using a 
> MultiSearcher for the upcoming lucenebook.com site like this:
>
>     Searchable[] searchables = new Searchable[indexes.length];
>
>     for (int i = 0; i < indexes.length; i++) {
>       searchables[i] = new IndexSearcher(indexes[i]);
>     }
>
>     searcher = new MultiSearcher(searchables);
>
> Use MultiSearcher in the same manner as you would IndexSearcher.  You 
> can also find out which index a particular hit was from using the 
> subSearcher method.
>
> As for your comment about the javadocs, allow me to refer you to 
> Lucene's test suite.  TestMultiSearcher.java in this case.  This is 
> the best "documentation" there is!  (besides Lucene in Action, of 
> course :)
>
>     Erik
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message