lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <>
Subject Re: Multiple collections
Date Thu, 23 Dec 2004 21:18:35 GMT

On Dec 23, 2004, at 3:29 PM, Jim Lynch wrote:
> I've been perusing the mail list today and see your name often.

I really should get out more often.

> The FAQ number 41 on page  
> implies a problem with searching and  
> indexing at the same time, unless I'm misunderstanding what it says.

Ah... the issue is that an IndexReader/IndexSearcher that was  
constructed *before* documents were added will not see the new  
documents.  Searching still works successfully.  After you add  
documents, to find them you must use a new instance of  
IndexSearcher/IndexReader.  That FAQ is somewhat misleading I suppose.   
This FAQ will be deprecated in favor of the wiki, I hope:

> So it is kosher to download the source code before buying the book?

No objections from me.  It's made freely available to help sell the  
book of course, but here's one well known secret about writing books,  
authors don't make (much) money on them.  Basically Otis and I are  
completely insane.

>   I tend not to do that for a couple of reasons, it doesn't seem right  
> and frequently authors go out of their way to make sure it's not very  
> useful without the book.   Not that I consider that unfair, mind you.   
> It's just a common practice from my experience.

I've not heard of anyone making the code less useful on purpose.  After  
spending 14 months writing a book, though, it is hard to muster up the  
desire to polish off some source code.  You've hacked examples  
throughout, and there is nothing coherent to give away other than  
5-line snippets.  The most difficult thing to do when writing a book is  
try to come up with some theme or application for all the code.  For  
the Ant book it worked out nicely (a Lucene-based document/image search  
engine).  For Lucene in Action, there are so many variations that need  
to be shown that a single application to cover all the cases would be  
far too contrived.  I have a personal distaste for most of the code I  
see in books myself, and Otis and I worked hard to keep the examples  
relevant and useful.  The examples are mostly JUnit test cases.  When  
we broke the code we knew pretty quickly.  I'm proud of the code, and  
also quite proud of the way I packaged it.  I got to show off my Ant  
skillz to launch it all.  Fire it up and enjoy (or at least report back  
any suggestions or problems you have).

How useful the examples are without the book itself, though, is a tough  
one.  It is hard to package up 421 pages (I have physical copies of LIA  
in my hands as we speak!) of meaningful discourse into some example  
code.  It certainly isn't intentional to make the code less than  
useful, but there are certainly lots of explanations to go along with  
that code.

If anyone finds the example code difficult to understand without the  
book, though, by all means let me know.  I'd be happy to explain it  
here or post it to the blog I'll have live at soon.

> So what you are saying if I can read between the lines and extrapolate  
> from what I've read, is that I can create an index for each of my  
> collections as I see fit, putting them in separate directories and  
> when I need to search I can select a subset of the directories with  
> the MultiSearcher.  Since the user selects which collections he wants  
> to search from via checkboxes, I can build a list of searchables to  
> pass to MultiSearcher.  However, looking at the javadocs I see  
> Searchable is an interface.  Hm, I'll have to look at some code to see  
> how that works.

No need to extrapolate too much.  IndexSearcher is an instanceof  
Searchable.  Here's the code :)

>>     Searchable[] searchables = new Searchable[indexes.length];
>>     for (int i = 0; i < indexes.length; i++) {
>>       searchables[i] = new IndexSearcher(indexes[i]);
>>     }
>>     searcher = new MultiSearcher(searchables);


p.s. There is an issue with MultiSearcher and how it scores documents  
across multiple indexes that has recently been discussed.  It is in  
Bugzilla issue tracking system and e-mail archives.  I don't find it  
much of a problem (yet) as I've only just begun to use MultiSearcher in  
a production environment.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message