Message-ID: <3C9FA58D.90201@earthlink.net>
Date: Mon, 25 Mar 2002 15:32:45 -0700
From: Dmitry Serebrennikov
To: Lucene Developers List
Subject: Re: Notes on distributed searching with Lucene
References: <94F890AC98E9AF478F08FEFAC7467C7C01101C@riker01>

Doug Cutting wrote:

>>From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
>>
>>But this:
>>
>>Document[] getDocs(int[] i) throws IOException;
>>
>>still retrieves full documents from the remote index.
>>
>
>In my thinking, this would only be called for documents that are explicitly
>requested with Hits.doc(). I was not thinking that distributed search would
>support the "low-level" interface, but just the Hits interface. For each
>search, two calls would be made per remote index, one to get query term
>statistics, and one to get the top-scoring document numbers and scores.
>These can be merged, and then only the globally top-scoring document objects
>need be retrieved, as they are displayed.
>

I think Scott's point was that retrieving whole documents is still too
much work, and that perhaps only a few fields need be retrieved. For
example, to present a search results page with titles and summaries,
those two fields are all one would need, whereas the documents might
also store the full text or other fields intended for other kinds of
processing.

Another point is that some hit collectors choose to retrieve documents
during scoring, however expensive that may be, in order to do custom
scoring or sorting. In this case, it would also help if such collectors
could be "shipped" over to where the index resides and do their job
there, so that at least they don't have to move the documents across
the wire.

Dmitry
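
To make the field-subset idea concrete, here is a rough Java sketch of
the flow discussed above: each remote index returns only document
numbers and scores, the lists are merged, and only the named stored
fields of the globally top-scoring documents are fetched. Everything in
it is an assumption made for illustration: RemoteIndex, topDocs,
getFields, ScoredDoc and InMemoryIndex are hypothetical names and not
part of Lucene. The separate call for query term statistics is left
out, and shipping hit collectors to the remote side is not covered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DistributedSearchSketch {

    /** One merged hit: which index it came from, its local doc number, its score. */
    static final class ScoredDoc {
        final int index;
        final int doc;
        final float score;
        ScoredDoc(int index, int doc, float score) {
            this.index = index;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Hypothetical remote interface (an assumption, not a Lucene API). */
    interface RemoteIndex {
        /** Top n hits as doc number -> score; no document contents cross the wire. */
        Map<Integer, Float> topDocs(String query, int n);
        /** Only the named stored fields of one document. */
        Map<String, String> getFields(int doc, List<String> fieldNames);
    }

    /** Merges per-index hits and keeps the n globally best by score. */
    static List<ScoredDoc> mergeTop(List<Map<Integer, Float>> perIndex, int n) {
        List<ScoredDoc> all = new ArrayList<>();
        for (int i = 0; i < perIndex.size(); i++) {
            for (Map.Entry<Integer, Float> e : perIndex.get(i).entrySet()) {
                all.add(new ScoredDoc(i, e.getKey(), e.getValue()));
            }
        }
        all.sort(Comparator.comparingDouble((ScoredDoc d) -> d.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    /** Searches every index, then fetches just the requested fields of the global top n. */
    static List<Map<String, String>> search(List<RemoteIndex> indexes, String query,
                                            int n, List<String> fieldNames) {
        List<Map<Integer, Float>> perIndex = new ArrayList<>();
        for (RemoteIndex ix : indexes) {
            perIndex.add(ix.topDocs(query, n));
        }
        List<Map<String, String>> results = new ArrayList<>();
        for (ScoredDoc d : mergeTop(perIndex, n)) {
            results.add(indexes.get(d.index).getFields(d.doc, fieldNames));
        }
        return results;
    }

    /** In-memory stand-in for a remote index, used only to exercise the sketch. */
    static final class InMemoryIndex implements RemoteIndex {
        private final Map<Integer, Float> scores = new LinkedHashMap<>();
        private final Map<Integer, Map<String, String>> stored = new LinkedHashMap<>();

        InMemoryIndex add(int doc, float score, String title, String body) {
            scores.put(doc, score);
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("title", title);
            fields.put("body", body);
            stored.put(doc, fields);
            return this;
        }

        public Map<Integer, Float> topDocs(String query, int n) {
            // A real index would score the query; the stand-in uses fixed scores.
            List<Map.Entry<Integer, Float>> entries = new ArrayList<>(scores.entrySet());
            entries.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
            Map<Integer, Float> top = new LinkedHashMap<>();
            for (Map.Entry<Integer, Float> e : entries.subList(0, Math.min(n, entries.size()))) {
                top.put(e.getKey(), e.getValue());
            }
            return top;
        }

        public Map<String, String> getFields(int doc, List<String> fieldNames) {
            Map<String, String> out = new LinkedHashMap<>();
            for (String name : fieldNames) {
                String value = stored.get(doc).get(name);
                if (value != null) {
                    out.put(name, value);
                }
            }
            return out;
        }
    }
}
```

With a results page in mind, a caller would ask for something like
search(indexes, query, 10, Arrays.asList("title", "summary")), so the
full text never leaves the machine holding the index.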