From: Doug Cutting
To: lucene-dev@jakarta.apache.org
Subject: RE: multithreading in SegmentsReader
Date: Thu, 11 Oct 2001 11:11:18 -0700

> From: Dmitry Serebrennikov [mailto:dmitrys@earthlink.net]
>
> But I was looking again at the MultiSearcher after reading
> through the SegmentsReader (and friends) and I was
> thinking if it wouldn't be better to write MultiSearcher
> not in terms of searching over multiple Searchers, but as
> an IndexReader that merges segments from more than one
> directory. A lot of the issues that MultiSearcher has to
> solve are also solved in the SegmentsReader, but slightly
> differently. Also, MultiSearcher has to re-implement the
> methods of Searcher (like the low level search API that
> was added recently).

Yes, there is some duplication between MultiSearcher and SegmentsReader. The reason for keeping them separate was to support distributed searching, so the Searcher API is designed to let only small bits of data pass through it. I never actually implemented distributed searching, so this design is somewhat half-baked.

The general idea is that the query terms must first be passed to the searchers so that the query can be weighted. Once the query is weighted, it can be sent to a set of searchers in parallel. To implement this, we would need to do something like the following:

1. Move the abstract Searcher methods to an interface:

    public interface Searchable {
      int docFreq(Term term) throws IOException;
      int maxDoc() throws IOException;
      TopDocs search(Query query, Filter filter, int n) throws IOException;
      Document doc(int i) throws IOException;
    }

2. Implement a RemoteSearcher using RMI.

3. Change MultiSearcher.search() to search each sub-index in a separate thread.

The low-level search API doesn't really fit in here very well.

Note that, except for the search() method, the Searchable interface is a subset of IndexReader, so it might still make sense to combine the notions of Searcher and IndexReader somehow. But we should keep distributed searching in mind when this is done.

If you are interested in drafting such a redesign, I'd love to see it.

Doug
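To make the two-phase idea concrete (weight the query first, then search the sub-indexes in parallel), here is a minimal, self-contained Java sketch. It does not use Lucene. The names Searchable, Hit, MemorySearchable and ParallelMultiSearch are hypothetical, terms are plain strings, documents are toy in-memory term sets, and a thread pool stands in for "a separate thread per sub-index". RMI (step 2) is left out. Phase 1 asks each sub-index only for docFreq and maxDoc, which are small values, and turns the totals into term weights using a Lucene-style idf formula that is assumed here for illustration. Phase 2 sends those weights to every sub-index concurrently and merges the per-index top hits by offsetting each index's doc numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMultiSearch {
    /** A scored hit; doc is local to one sub-index until the results are merged. */
    static final class Hit {
        final int doc;
        final double score;
        Hit(int doc, double score) { this.doc = doc; this.score = score; }
    }

    // Highest score first; ties broken by the lower doc number.
    static final Comparator<Hit> BY_SCORE =
        Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparingInt(h -> h.doc);

    /** Hypothetical, simplified stand-in for the proposed Searchable interface. */
    interface Searchable {
        int docFreq(String term);
        int maxDoc();
        /** Scores local docs with weights computed elsewhere and returns the top n. */
        List<Hit> search(Map<String, Double> weights, int n);
    }

    /** Toy in-memory sub-index: each document is a set of space-separated terms. */
    static final class MemorySearchable implements Searchable {
        private final List<Set<String>> docs = new ArrayList<>();
        MemorySearchable(String... texts) {
            for (String t : texts) docs.add(new HashSet<>(Arrays.asList(t.split(" "))));
        }
        public int docFreq(String term) {
            int df = 0;
            for (Set<String> d : docs) if (d.contains(term)) df++;
            return df;
        }
        public int maxDoc() { return docs.size(); }
        public List<Hit> search(Map<String, Double> weights, int n) {
            List<Hit> hits = new ArrayList<>();
            for (int i = 0; i < docs.size(); i++) {
                double score = 0;
                for (Map.Entry<String, Double> w : weights.entrySet())
                    if (docs.get(i).contains(w.getKey())) score += w.getValue();
                if (score > 0) hits.add(new Hit(i, score));
            }
            hits.sort(BY_SCORE);
            return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
        }
    }

    static List<Hit> search(List<Searchable> subs, List<String> terms, int n) throws Exception {
        // Phase 1: weight the query using collection-wide statistics (only small values travel).
        int totalDocs = 0;
        for (Searchable s : subs) totalDocs += s.maxDoc();
        Map<String, Double> weights = new HashMap<>();
        for (String term : terms) {
            int df = 0;
            for (Searchable s : subs) df += s.docFreq(term);
            // Lucene-style idf, assumed here for illustration.
            weights.put(term, Math.log((double) totalDocs / (df + 1)) + 1.0);
        }
        // Phase 2: send the weighted query to every sub-index in parallel.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subs.size()));
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (Searchable s : subs) futures.add(pool.submit(() -> s.search(weights, n)));
            // Merge: shift local doc numbers by each sub-index's starting offset.
            List<Hit> merged = new ArrayList<>();
            int base = 0;
            for (int i = 0; i < subs.size(); i++) {
                for (Hit h : futures.get(i).get()) merged.add(new Hit(base + h.doc, h.score));
                base += subs.get(i).maxDoc();
            }
            merged.sort(BY_SCORE);
            return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Searchable> subs = List.of(
            new MemorySearchable("apache lucene", "apache jakarta"),
            new MemorySearchable("lucene search", "distributed search", "lucene"));
        for (Hit h : search(subs, List.of("lucene", "search"), 2))
            System.out.printf("doc=%d score=%.3f%n", h.doc, h.score);
    }
}
```

With the two toy sub-indexes in main, the sketch should rank global doc 2 ("lucene search", the first document of the second sub-index) first and global doc 3 second. Because the weights are computed from totals over all sub-indexes before any sub-search starts, every sub-index scores with the same idf values, so their hits can be merged by score directly. Each sub-index only needs to return its local top n, since the global top n is always contained in the union of those lists.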