Message-ID: <456AFF09.7060200@gmail.com>
Date: Mon, 27 Nov 2006 10:06:49 -0500
From: Mark Miller
To: java-dev@lucene.apache.org
Subject: Re: Controlling Hits
In-Reply-To: <20061126234739.15090.qmail@web50304.mail.yahoo.com>

The only thing Hits gives that I would like to have without the expense (Hits is expensive to use this way) is the ability to run a search and get all of the results back with sorting. Sorting appears to be built into TopDocs, so you don't get it with a HitCollector. If you try to use TopDocs instead of Hits, you need to know how many docs will match, and you do not have that information before doing the search. TopDocs requires it, though (for both sorted and unsorted searches), to initialize its priority queues to the correct size. Hits is also nice in that it normalizes scores for you.

- Mark

Otis Gospodnetic wrote:
> Heh, brave! :) I haven't used TopDocs enough to feel this strongly against Hits yet.
> But while we are at it, what I think I really want is what Marvin does in KinoSearch:
>
>     my $hits = $searcher->search( query => $query );
>     $hits->seek( $offset, $num_wanted );
>
> This suits my experience and typical use of Lucene. I always know which "page" of results I want, and how many matches I want to show per page, so I always know the offset and always know how many matches after that offset I need. If I show 10 results per page, and want to get a third page of results, ideally I'd do as little work as possible for the first 20 matches, and just get the slice I need.
> Of course, I'll still need to go through the first 20 and score them, but in the end I'll just throw them out.
>
> Otis
>
> ----- Original Message ----
> From: Nadav Har'El
> To: java-dev@lucene.apache.org
> Sent: Sunday, November 26, 2006 3:07:26 AM
> Subject: Re: Controlling Hits
>
> On Fri, Nov 24, 2006, Otis Gospodnetic wrote about "Controlling Hits":
>
>> Hi,
>>
>> Could we make Hits non-final, or at least expose something in Hits to control the number of Documents it reads from disk?
>> ...
>> Or maybe the answer is: Use the search method that returns TopDocs if you want more control...?
>>
>
> In an application I was writing, I was facing similar issues: "Hits" was fine
> for a short demo in Lucene, but when it came to a real application, it didn't
> give me enough control: it reran the search too many times when you wanted
> to see, e.g., the 20th result page, and wouldn't allow me to add a HitCollector,
> which I needed. I started by modifying Hits (which wasn't just final - much
> of its functionality was private), but then realized: there's simply no
> reason to use Hits! IndexSearcher.search() which returns TopDocs already
> gives you full control, and frankly isn't that much harder to use.
>
> In fact, I fail to see a situation where "Hits"'s concept of "random access"
> to the results (you can ask for result #30 and then #70) even makes sense.
> In all search applications I'm familiar with, at the time you call search(),
> you already know how many results you want to display - and you don't need
> someone to guess for you that you need 50 results, and if that's not enough
> then you need 100 results, and then 200, and so on.
> And since this concept of "random access" is what differentiates Hits from
> TopDocs, perhaps we don't need Hits at all?
>
> So, how about deprecating Hits altogether, and recommending the TopDocs
> alternatives instead?
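
A minimal sketch of the paging idea discussed in this thread, written in plain Java and not taken from Lucene or KinoSearch. It assumes the caller already knows the offset and the number of matches wanted, so a bounded priority queue of size offset + num_wanted can be allocated before the search runs, and the first "offset" entries are scored but then thrown away. The names PageOfHits, ScoredDoc and page are hypothetical, and the scores array merely stands in for whatever a real searcher would produce.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration only; none of these names are Lucene classes. */
public class PageOfHits {

    /** A (document id, score) pair standing in for one scored hit. */
    public static final class ScoredDoc {
        public final int doc;
        public final float score;
        public ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Lowest score first; on equal scores the larger doc id counts as worse.
    private static final Comparator<ScoredDoc> WORST_FIRST = (a, b) -> {
        int c = Float.compare(a.score, b.score);
        return c != 0 ? c : Integer.compare(b.doc, a.doc);
    };

    /**
     * Returns hits [offset, offset + numWanted) in descending score order.
     * scores[i] is the score of document i; a score <= 0 means "no match".
     */
    public static List<ScoredDoc> page(float[] scores, int offset, int numWanted) {
        if (offset < 0 || numWanted < 0) {
            throw new IllegalArgumentException("offset and numWanted must be >= 0");
        }
        List<ScoredDoc> result = new ArrayList<>();
        int capacity = offset + numWanted;   // known before the search starts
        if (capacity == 0) return result;

        // Bounded min-heap: the head is always the worst hit kept so far.
        PriorityQueue<ScoredDoc> queue = new PriorityQueue<>(capacity, WORST_FIRST);
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] <= 0f) continue;              // not a match
            ScoredDoc candidate = new ScoredDoc(doc, scores[doc]);
            if (queue.size() < capacity) {
                queue.add(candidate);
            } else if (WORST_FIRST.compare(candidate, queue.peek()) > 0) {
                queue.poll();                             // evict the current worst
                queue.add(candidate);
            }
        }

        List<ScoredDoc> best = new ArrayList<>(queue);
        best.sort(WORST_FIRST.reversed());                // best hit first
        // Skip the first 'offset' hits: they were scored but are not returned.
        for (int i = offset; i < best.size(); i++) result.add(best.get(i));
        return result;
    }
}
```

Calling PageOfHits.page(scores, 20, 10) would correspond to asking for the third page at 10 results per page. The queue never holds more than 30 entries no matter how many documents match, which is why the total number of matches does not need to be known up front in this scheme.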