lucene-dev mailing list archives

From Scott Ganyo <scott.ga...@eTapestry.com>
Subject RE: Notes on distributed searching with Lucene
Date Mon, 25 Mar 2002 22:12:57 GMT
But this:

Document[] getDocs(int[] i) throws IOException;

still retrieves full documents from the remote index.  One thing that I had
started to look at with remote indexes is an interface that looked like
this:

public IndexHitCollector search(Query query, Filter filter,
IndexHitCollector collector) throws IOException;

where IndexHitCollector looks like this:

public abstract class IndexHitCollector
    extends HitCollector
    implements Serializable
{
    protected transient Searcher m_searcher;
    
    public void setSearcher(Searcher searcher)
    {
        m_searcher = searcher;
    }
    
    abstract public void collect(int doc, float score);
}

This allows one to return less than full Documents (i.e., just selected
fields) from a remote query, and perhaps to do other ranking, filtering, and
gathering within the collector.  This interface was for a single remote
index, of course.  The basic idea is that in a remote scenario it is best to
move control as close to the data as possible, to avoid as many remote calls
and as much transmission of excess data as possible.  Don't you agree?
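
To make the idea concrete, here is a minimal, self-contained sketch in plain
Java (no Lucene dependency).  FieldSource, TitleCollector and CollectorSketch
are hypothetical names made up for this illustration; FieldSource merely
stands in for whatever the server-side Searcher would offer.  The collector
filters by score and keeps a single field per hit, so only those values would
need to travel back to the client.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for server-side field access; not a Lucene type.
interface FieldSource {
    String field(int doc, String name);
}

// Serializable collector that would be shipped to the remote index.
// The source is transient (like m_searcher) and is attached on the server.
class TitleCollector implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient FieldSource source;
    private final float minScore;
    private final List<String> titles = new ArrayList<>();

    TitleCollector(float minScore) {
        this.minScore = minScore;
    }

    void setSource(FieldSource source) {
        this.source = source;
    }

    // Called once per hit, next to the data: drop low scores, keep one field.
    void collect(int doc, float score) {
        if (score >= minScore) {
            titles.add(source.field(doc, "title"));
        }
    }

    List<String> titles() {
        return titles;
    }
}

class CollectorSketch {
    // Simulates the server side with three in-memory documents.
    static List<String> run() {
        final String[][] docs = {
            {"Intro", "a long body ..."},
            {"Setup", "another long body ..."},
            {"Merging", "yet another long body ..."},
        };
        FieldSource source = (doc, name) ->
            name.equals("title") ? docs[doc][0] : docs[doc][1];

        TitleCollector collector = new TitleCollector(0.5f);
        collector.setSource(source);
        collector.collect(0, 0.9f);
        collector.collect(1, 0.2f);
        collector.collect(2, 0.7f);
        return collector.titles(); // only the kept titles go back to the caller
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In a real remote setup the collector would be serialized to the server,
given its source there, run over the hits, and serialized back with only the
collected values inside.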

Scott

P.S. Good job, Mark!

> -----Original Message-----
> From: Doug Cutting [mailto:DCutting@grandcentral.com]
> Sent: Monday, March 25, 2002 4:25 PM
> To: 'Lucene Developers List'
> Subject: RE: Notes on distributed searching with Lucene
> 
> 
> > From: Mark Harwood [mailto:markharwood@totalise.co.uk]
> > 
> > I have written up some of my experiences with creating a distributed
> > system with Lucene here:
> > 
> > http://home.clara.net/markharwood/lucene/
> > 
> > It includes some UML interaction diagrams that I found useful in
> > understanding the Lucene codebase.
> 
> Mark,
> 
> It's great to see someone experimenting with this.  I originally had
> distributed searching in mind when I wrote Lucene, but never quite got to
> adding it.  A message that mentions some of these intentions is at:
>
> http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00252.html

A less "chatty" interface than the one mentioned there might be:

  public interface Searchable {
    public class TermStatistics implements Serializable {
      public int[] docFreqs;
      public int maxDoc;
    }
    TermStatistics getTermStatistics(Term[] terms) throws IOException;
    TopDocs search(Query query, Filter filter, int n) throws IOException;
    Document[] getDocs(int[] i) throws IOException;
  }

With these three phases (collect term statistics, get doc id scores, get
docs) the results should be identical to searching the indexes locally with
MultiSearcher.  It sounded like your experiments skipped the first phase.
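
A rough sketch of why the first phase matters, in plain Java with no Lucene
dependency: per-index term statistics are summed so that every index can
score with the same global numbers.  ShardStats mirrors the TermStatistics
shape above, while StatsMergeSketch and the idf formula
log(maxDoc / (docFreq + 1)) + 1 are assumptions chosen for illustration, not
a statement of what Lucene computes.

```java
import java.io.Serializable;
import java.util.List;

// Mirrors the TermStatistics shape: one docFreq per query term, plus maxDoc.
class ShardStats implements Serializable {
    private static final long serialVersionUID = 1L;

    final int[] docFreqs;
    final int maxDoc;

    ShardStats(int[] docFreqs, int maxDoc) {
        this.docFreqs = docFreqs;
        this.maxDoc = maxDoc;
    }
}

class StatsMergeSketch {
    // Phase one: sum document frequencies and document counts over all indexes.
    static ShardStats merge(List<ShardStats> perShard) {
        if (perShard.isEmpty()) {
            return new ShardStats(new int[0], 0);
        }
        int[] total = new int[perShard.get(0).docFreqs.length];
        int maxDoc = 0;
        for (ShardStats s : perShard) {
            for (int t = 0; t < total.length; t++) {
                total[t] += s.docFreqs[t];
            }
            maxDoc += s.maxDoc;
        }
        return new ShardStats(total, maxDoc);
    }

    // Assumed idf form, for illustration only: rarer terms weigh more.
    static double idf(int docFreq, int maxDoc) {
        return Math.log((double) maxDoc / (docFreq + 1)) + 1.0;
    }
}
```

If each index scored with only its local docFreqs and maxDoc, the same term
could receive a different weight on each index, and the merged ranking would
no longer match a local MultiSearcher.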

Probably it would be worth writing a MultiThreadSearcher that spawns a
thread for each sub-search, then waits for all to finish before merging the
results.
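
One possible shape for such a searcher, sketched in plain Java with
java.util.concurrent rather than hand-rolled threads.  ScoredHit, SubSearch
and ParallelSearchSketch are hypothetical names for this illustration; a
SubSearch stands in for one remote index, and the merge simply keeps the n
best scores overall.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical hit record: which index, which doc, what score.
class ScoredHit {
    final int shard;
    final int doc;
    final float score;

    ScoredHit(int shard, int doc, float score) {
        this.shard = shard;
        this.doc = doc;
        this.score = score;
    }
}

// Hypothetical per-index search; stands in for one remote index.
interface SubSearch {
    List<ScoredHit> search(int n) throws Exception;
}

class ParallelSearchSketch {
    // Runs every sub-search on its own thread, waits for all, merges top n.
    static List<ScoredHit> searchAll(List<SubSearch> shards, int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Callable<List<ScoredHit>>> tasks = new ArrayList<>();
            for (SubSearch s : shards) {
                tasks.add(() -> s.search(n));
            }
            List<ScoredHit> merged = new ArrayList<>();
            // invokeAll blocks until every sub-search has completed.
            for (Future<List<ScoredHit>> f : pool.invokeAll(tasks)) {
                merged.addAll(f.get());
            }
            merged.sort((a, b) -> Float.compare(b.score, a.score)); // best first
            return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
        } finally {
            pool.shutdown();
        }
    }

    // Small in-memory demo with two fake indexes.
    static List<ScoredHit> demo() {
        SubSearch first = n -> Arrays.asList(
            new ScoredHit(0, 1, 0.9f), new ScoredHit(0, 2, 0.3f));
        SubSearch second = n -> Arrays.asList(
            new ScoredHit(1, 7, 0.8f), new ScoredHit(1, 4, 0.1f));
        try {
            return searchAll(Arrays.asList(first, second), 3);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

A failure in any sub-search surfaces from f.get() as an ExecutionException;
whether to fail the whole query or return partial results is a design
decision this sketch leaves open.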

So, if you are able to work on this more, it would be great to figure out
what it would take to make Query serializable, to convert the Searcher
implementations to use the above interface in place of the existing similar
abstract methods, and finally to implement an RMI-based RemoteSearcher.

Doug

