Mailing-List: contact lucene-dev-help@jakarta.apache.org; run by ezmlm
Message-ID: <4BC270C6AB8AD411AD0B00B0D0493DF0EE7C68@mail.grandcentral.com>
From: Doug Cutting <DCutting@grandcentral.com>
To: "'lucene-dev@jakarta.apache.org'" <lucene-dev@jakarta.apache.org>
Subject: RE: multithreading in SegmentsReader
Date: Thu, 11 Oct 2001 11:11:18 -0700

> From: Dmitry Serebrennikov [mailto:dmitrys@earthlink.net]
> 
> But I was looking again at the MultiSearcher after reading
> through the SegmentsReader (and friends) and I was
> thinking if it wouldn't be better to write MultiSearcher
> not in terms of searching over multiple Searchers, but as
> an IndexReader that merges segments from more than one
> directory. A lot of the issues that MultiSearcher has to
> solve are also solved in the SegmentsReader, but slightly
> differently. Also, MultiSearcher has to re-implement the
> methods of Searcher (like the low level search API that
> was added recently).

Yes, there is some duplication between MultiSearcher and SegmentsReader.
The reason for keeping these separate was to support distributed searching.
Thus the Searcher API is designed to have only small bits of data pass
through it.  I never actually implemented distributed searching, so this
design is somewhat half-baked.

The general idea is that the query terms must first be passed to the searcher
to weight the query.  Once the query is weighted, it can be sent to a set of
searchers in parallel.
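
A minimal sketch of that weighting step, under stated assumptions: term
statistics are summed over all sub-indexes before any searching is done, so
every sub-index would score against the same global numbers.  The names
GlobalWeightSketch, StatsSource, and fixed() are hypothetical, terms are plain
strings, and the idf-style formula is only an illustrative choice, not
something specified in this message.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch; these are stand-ins, not Lucene's actual classes.
class GlobalWeightSketch {

    /** The statistics half of a searcher; terms are plain strings here. */
    interface StatsSource {
        int docFreq(String term) throws IOException;
        int maxDoc() throws IOException;
    }

    /** Sums the document frequency of a term over all sub-indexes. */
    static int globalDocFreq(List<? extends StatsSource> sources, String term)
            throws IOException {
        int total = 0;
        for (StatsSource s : sources) {
            total += s.docFreq(term);
        }
        return total;
    }

    /** Sums maxDoc over all sub-indexes. */
    static int globalMaxDoc(List<? extends StatsSource> sources) throws IOException {
        int total = 0;
        for (StatsSource s : sources) {
            total += s.maxDoc();
        }
        return total;
    }

    /** Assumed idf-style weight computed from the global statistics. */
    static double weight(List<? extends StatsSource> sources, String term)
            throws IOException {
        int df = globalDocFreq(sources, term);
        int n = globalMaxDoc(sources);
        return Math.log((double) n / (df + 1)) + 1.0;
    }

    /** In-memory source that reports the same docFreq for every term (demo only). */
    static StatsSource fixed(final int docFreq, final int maxDoc) {
        return new StatsSource() {
            public int docFreq(String term) { return docFreq; }
            public int maxDoc() { return maxDoc; }
        };
    }
}
```

Only two integers per term cross the interface here, which is in line with
keeping the data passed through the Searcher API small.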

To implement this, we would need to do something like:

1. Move the abstract Searcher methods to an interface:
  public interface Searchable {
    int docFreq(Term term) throws IOException;
    int maxDoc() throws IOException;
    TopDocs search(Query query, Filter filter, int n) throws IOException;
    Document doc(int i) throws IOException;
  }

2. Implement a RemoteSearcher using RMI.

3. Change MultiSearcher.search() to search each sub-index in a separate
thread.
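
Step 3 could be sketched roughly as follows.  This is an illustration under
assumptions, not an implementation of MultiSearcher: ParallelSearchSketch,
SubSearcher, Hit, and fixed() are hypothetical stand-ins, the query is a plain
string, filters are omitted, and document numbers are assumed to be made
global by adding the maxDoc() of the preceding sub-indexes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; simplified stand-ins, not Lucene's actual classes.
class ParallelSearchSketch {

    /** One scored hit; doc is local to the sub-index that produced it. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Stand-in for the search-related part of the proposed interface. */
    interface SubSearcher {
        int maxDoc() throws IOException;
        List<Hit> search(String query, int n) throws IOException;
    }

    /** Searches every sub-index in its own thread and merges the best n hits. */
    static List<Hit> search(List<? extends SubSearcher> subs, final String query,
                            final int n) throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subs.size()));
        try {
            // Submit one task per sub-index so the searches run in parallel.
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (final SubSearcher sub : subs) {
                Callable<List<Hit>> task = () -> sub.search(query, n);
                pending.add(pool.submit(task));
            }
            List<Hit> merged = new ArrayList<>();
            int base = 0; // offset mapping local doc numbers to global ones
            for (int i = 0; i < subs.size(); i++) {
                List<Hit> hits;
                try {
                    hits = pending.get(i).get(); // wait for this sub-index
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    if (cause instanceof IOException) throw (IOException) cause;
                    throw new IOException(cause);
                }
                for (Hit h : hits) {
                    merged.add(new Hit(base + h.doc, h.score));
                }
                base += subs.get(i).maxDoc();
            }
            // Highest score first, then keep only the top n.
            merged.sort((a, b) -> Float.compare(b.score, a.score));
            return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
        } finally {
            pool.shutdown();
        }
    }

    /** In-memory sub-index that returns canned hits for any query (demo only). */
    static SubSearcher fixed(final int maxDoc, final Hit... hits) {
        return new SubSearcher() {
            public int maxDoc() { return maxDoc; }
            public List<Hit> search(String query, int n) {
                List<Hit> all = Arrays.asList(hits);
                return all.subList(0, Math.min(n, all.size()));
            }
        };
    }
}
```

Each sub-index returns at most n hits, so the amount of data merged stays
small whether a sub-index is local or, as in step 2, behind a remote call.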

The low-level search API doesn't fit in here very well.

Note that, except for the search() method, the Searchable interface is a
subset of IndexReader, so it still might make sense to somehow combine the
notions of Searcher and IndexReader.  But we should keep distributed
searching in mind when this is done.  If you are interested in drafting such
a re-design, I'd love to see it.

Doug