Return-Path: Delivered-To: apmail-jakarta-lucene-user-archive@www.apache.org Received: (qmail 21848 invoked from network); 23 Dec 2004 21:18:57 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur-2.apache.org with SMTP; 23 Dec 2004 21:18:57 -0000 Received: (qmail 54127 invoked by uid 500); 23 Dec 2004 21:18:51 -0000 Delivered-To: apmail-jakarta-lucene-user-archive@jakarta.apache.org Received: (qmail 54098 invoked by uid 500); 23 Dec 2004 21:18:50 -0000 Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm Precedence: bulk List-Unsubscribe: List-Subscribe: List-Help: List-Post: List-Id: "Lucene Users List" Reply-To: "Lucene Users List" Delivered-To: mailing list lucene-user@jakarta.apache.org Received: (qmail 54085 invoked by uid 99); 23 Dec 2004 21:18:50 -0000 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests= X-Spam-Check-By: apache.org Received-SPF: pass (hermes.apache.org: local policy) Received: from Unknown (HELO ehatchersolutions.com) (69.55.225.129) by apache.org (qpsmtpd/0.28) with ESMTP; Thu, 23 Dec 2004 13:18:47 -0800 Received: by ehatchersolutions.com (Postfix, from userid 504) id 34CB613E200B; Thu, 23 Dec 2004 16:18:44 -0500 (EST) Received: from [192.168.1.100] (va-chrvlle-cad1-bdgrp1-4b-b-169.chvlva.adelphia.net [68.169.41.169]) by ehatchersolutions.com (Postfix) with ESMTP id CD5C413E200A for ; Thu, 23 Dec 2004 16:18:38 -0500 (EST) Mime-Version: 1.0 (Apple Message framework v619) In-Reply-To: <41CB2AA1.3050001@sgi.com> References: <41CB1A17.7040302@sgi.com> <41CB2AA1.3050001@sgi.com> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <34F0C1BA-5528-11D9-A991-000393A564E6@ehatchersolutions.com> Content-Transfer-Encoding: 7bit From: Erik Hatcher Subject: Re: Multiple collections Date: Thu, 23 Dec 2004 16:18:35 -0500 To: "Lucene Users List" X-Mailer: Apple Mail (2.619) X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on javelina X-Spam-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00, RCVD_IN_NJABL_DUL,RCVD_IN_SORBS_DUL autolearn=no version=3.0.1 X-Spam-Level: X-Virus-Checked: Checked X-Spam-Rating: minotaur-2.apache.org 1.6.2 0/1000/N On Dec 23, 2004, at 3:29 PM, Jim Lynch wrote: > I've been perusing the mail list today and see your name often. I really should get out more often. > The FAQ number 41 on page > http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi? > file=chapter.search&toc=faq implies a problem with searching and > indexing at the same time, unless I'm misunderstanding what it says. Ah... the issue is that an IndexReader/IndexSearcher that was constructed *before* documents were added will not see the new documents. Searching still works successfully. After you add documents, to find them you must use a new instance of IndexSearcher/IndexReader. That FAQ is somewhat misleading I suppose. This FAQ will be deprecated in favor of the wiki, I hope: http://wiki.apache.org/jakarta-lucene/LuceneFAQ > So it is kosher to download the source code before buying the book? No objections from me. It's made freely available to help sell the book of course, but here's one well known secret about writing books, authors don't make (much) money on them. Basically Otis and I are completely insane. http://www.blogscene.org/erik/Writing/demons.txt > I tend not to do that for a couple of reasons, it doesn't seem right > and frequently authors go out of their way to make sure it's not very > useful without the book. Not that I consider that unfair, mind you. > It's just a common practice from my experience. I've not heard of anyone making the code less useful on purpose. After spending 14 months writing a book, though, it is hard to muster up the desire to polish off some source code. You've hacked examples throughout, and there is nothing coherent to give away other than 5-line snippets. The most difficult thing to do when writing a book is try to come up with some theme or application for all the code. For the Ant book it worked out nicely (a Lucene-based document/image search engine). For Lucene in Action, there are so many variations that need to be shown that a single application to cover all the cases would be far too contrived. I have a personal distaste for most of the code I see in books myself, and Otis and I worked hard to keep the examples relevant and useful. The examples are mostly JUnit test cases. When we broke the code we knew pretty quickly. I'm proud of the code, and also quite proud of the way I packaged it. I got to show off my Ant skillz to launch it all. Fire it up and enjoy (or at least report back any suggestions or problems you have). How useful the examples are without the book itself, though, is a tough one. It is hard to package up 421 pages (I have physical copies of LIA in my hands as we speak!) of meaningful discourse into some example code. It certainly isn't intentional to make the code less than useful, but there are certainly lots of explanations to go along with that code. If anyone finds the example code difficult to understand without the book, though, by all means let me know. I'd be happy to explain it here or post it to the blog I'll have live at lucenebook.com soon. > So what you are saying if I can read between the lines and extrapolate > from what I've read, is that I can create an index for each of my > collections as I see fit, putting them in separate directories and > when I need to search I can select a subset of the directories with > the MultiSearcher. Since the user selects which collections he wants > to search from via checkboxes, I can build a list of searchables to > pass to MultiSearcher. However, looking at the javadocs I see > Searchable is an interface. Hm, I'll have to look at some code to see > how that works. No need to extrapolate too much. IndexSearcher is an instanceof Searchable. Here's the code :) > >> Searchable[] searchables = new Searchable[indexes.length]; >> >> for (int i = 0; i < indexes.length; i++) { >> searchables[i] = new IndexSearcher(indexes[i]); >> } >> >> searcher = new MultiSearcher(searchables); Erik p.s. There is an issue with MultiSearcher and how it scores documents across multiple indexes that has recently been discussed. It is in Bugzilla issue tracking system and e-mail archives. I don't find it much of a problem (yet) as I've only just begun to use MultiSearcher in a production environment. --------------------------------------------------------------------- To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org For additional commands, e-mail: lucene-user-help@jakarta.apache.org