lucene-dev mailing list archives

From Dmitry Serebrennikov <dmit...@earthlink.net>
Subject Re: Notes on distributed searching with Lucene
Date Mon, 25 Mar 2002 22:32:45 GMT
Doug Cutting wrote:

>>From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
>>
>>But this:
>>
>>Document[] getDocs(int[] i) throws IOException;
>>
>>still retrieves full documents from the remote index.
>>
>
>In my thinking, this would only be called for documents that are explicitly
>requested with Hits.doc().  I was not thinking that distributed search would
>support the "low-level" interface, but just the Hits interface.  For each
>search, two calls would be made per remote index, one to get query term
>statistics, and one to get the top-scoring document numbers and scores.
>These can be merged, and then only the globally top-scoring document objects
>need be retrieved, as they are displayed.
>
I think Scott's point was that retrieving whole documents is still too
much work when only a few fields are needed. For example, a search
results page with titles and summaries needs just those two fields,
whereas the documents might also store the full text or other fields
intended for other kinds of processing.
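
As a minimal sketch of that idea, assuming a hypothetical remote call
that takes the wanted field names alongside the document numbers: the
names RemoteFieldFetcher, getFields, and InMemoryFieldFetcher are
invented for illustration and are not part of Lucene, and the
in-memory class only stands in for a remote index.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical remote interface (an assumption, not a Lucene API).
interface RemoteFieldFetcher {
    // For each requested document number, return only the named stored fields.
    List<Map<String, String>> getFields(int[] docs, String[] fieldNames)
            throws IOException;
}

// In-memory stand-in for the remote index, so the sketch can run locally.
final class InMemoryFieldFetcher implements RemoteFieldFetcher {
    private final List<Map<String, String>> storedDocs;

    InMemoryFieldFetcher(List<Map<String, String>> storedDocs) {
        this.storedDocs = storedDocs;
    }

    @Override
    public List<Map<String, String>> getFields(int[] docs, String[] fieldNames) {
        List<Map<String, String>> result = new ArrayList<>();
        for (int doc : docs) {
            Map<String, String> full = storedDocs.get(doc);
            // Copy only the requested fields; everything else stays on the index host.
            Map<String, String> projected = new LinkedHashMap<>();
            for (String name : fieldNames) {
                String value = full.get(name);
                if (value != null) {
                    projected.put(name, value);
                }
            }
            result.add(projected);
        }
        return result;
    }
}
```

A results page would then ask for {"title", "summary"} for the
displayed hits only, and the full text would never cross the wire.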

Another point is that some hit collectors retrieve documents during
scoring, however expensive that may be, in order to do custom scoring
or sorting. In this case it would also help if such collectors could
be "shipped" over to where the index resides and do their job there,
so that at least the documents don't have to move across the wire.
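
One way this could look, as a rough sketch only: the collector is made
Serializable, sent to the index host, run there against the local
stored fields, and only its compact result comes back. All names here
(ShippableCollector, LanguageFilterCollector, IndexHost) are
hypothetical and not Lucene classes, and the byte-array round trip
merely simulates the network transfer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical collector that can be serialized and run next to the index.
interface ShippableCollector extends Serializable {
    // Called on the index host for each hit, with local access to stored fields.
    void collect(int doc, float score, Map<String, String> storedFields);

    // Compact result returned to the caller instead of whole documents.
    List<Integer> result();
}

// Example collector: keeps only hits whose "lang" field matches.
final class LanguageFilterCollector implements ShippableCollector {
    private static final long serialVersionUID = 1L;
    private final String lang;
    private final ArrayList<Integer> kept = new ArrayList<>();

    LanguageFilterCollector(String lang) {
        this.lang = lang;
    }

    @Override
    public void collect(int doc, float score, Map<String, String> storedFields) {
        if (lang.equals(storedFields.get("lang"))) {
            kept.add(doc);
        }
    }

    @Override
    public List<Integer> result() {
        return kept;
    }
}

// Stand-in for the process that holds the index.
final class IndexHost {
    private final List<Map<String, String>> storedDocs;

    IndexHost(List<Map<String, String>> storedDocs) {
        this.storedDocs = storedDocs;
    }

    // Simulates the wire: serialize the collector and deserialize a copy.
    static ShippableCollector ship(ShippableCollector collector) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(collector);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ShippableCollector) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not ship collector", e);
        }
    }

    // Runs a shipped copy of the collector over the hits and returns its result.
    List<Integer> run(ShippableCollector collector, int[] docs, float[] scores) {
        ShippableCollector local = ship(collector);
        for (int i = 0; i < docs.length; i++) {
            local.collect(docs[i], scores[i], storedDocs.get(docs[i]));
        }
        return local.result();
    }
}
```

Only the list of surviving document numbers is returned here; a real
design would also have to decide how to load and trust collector
classes on the remote side, which this sketch leaves out.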

Dmitry




