From: Renaud Delbru
Date: Mon, 13 Oct 2008 16:52:24 +0100
To: java-user@lucene.apache.org
Subject: Re: Sorting posting lists before intersection

Hi,

Paul Elschot wrote:
> This could be done, but since not all scorers will be TermScorers it
> will be necessary to add a method to Scorer (or perhaps even to its
> DocIdSetIterator superclass):
>
> public abstract int estimatedDocFreq();
>
> and implement this for all existing instances. TermScorer could
> implement it without estimating.
> For AND/OR/NOT such an estimation is straightforward but for
> proximity queries it would be more of a guess.
>

I agree. It is indeed trickier for proximity queries. Taking the frequency of the rarest term in a PhraseQuery / SpanQuery could be a reasonable predictor in general.

Regards.
-- 
Renaud Delbru
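
To make the estimation idea in this thread concrete, here is a rough standalone sketch. It is not Lucene code: the `Estimating` interface, the `Term`, `And`, `Or` and `Phrase` classes, the `maxDoc` parameter and `sortForIntersection` are all hypothetical names made up for illustration. The AND and OR estimates are simple upper bounds, and the phrase estimate follows the rarest-term guess suggested above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical types for illustration only; none of these are Lucene classes.
final class DocFreqSketch {

    /** Stand-in for a scorer that can report how many documents it may match. */
    interface Estimating {
        int estimatedDocFreq();
    }

    /** Single term: the document frequency is known, so nothing is estimated. */
    static final class Term implements Estimating {
        private final int docFreq;
        Term(int docFreq) { this.docFreq = docFreq; }
        public int estimatedDocFreq() { return docFreq; }
    }

    /** AND: cannot match more documents than its rarest clause. */
    static final class And implements Estimating {
        private final List<Estimating> clauses;
        And(List<Estimating> clauses) { this.clauses = clauses; }
        public int estimatedDocFreq() {
            if (clauses.isEmpty()) return 0;
            int min = Integer.MAX_VALUE;
            for (Estimating c : clauses) min = Math.min(min, c.estimatedDocFreq());
            return min;
        }
    }

    /** OR: at most the sum of its clauses, capped by the (assumed) index size. */
    static final class Or implements Estimating {
        private final List<Estimating> clauses;
        private final int maxDoc;
        Or(List<Estimating> clauses, int maxDoc) {
            this.clauses = clauses;
            this.maxDoc = maxDoc;
        }
        public int estimatedDocFreq() {
            long sum = 0; // long avoids overflow when adding many large clauses
            for (Estimating c : clauses) sum += c.estimatedDocFreq();
            return (int) Math.min(sum, maxDoc);
        }
    }

    /** Phrase or span: a guess based on the frequency of the rarest term. */
    static final class Phrase implements Estimating {
        private final int[] termDocFreqs;
        Phrase(int[] termDocFreqs) { this.termDocFreqs = termDocFreqs; }
        public int estimatedDocFreq() {
            if (termDocFreqs.length == 0) return 0;
            int min = Integer.MAX_VALUE;
            for (int df : termDocFreqs) min = Math.min(min, df);
            return min;
        }
    }

    /** Returns a copy of the clauses ordered from rarest to most frequent. */
    static List<Estimating> sortForIntersection(List<Estimating> clauses) {
        List<Estimating> sorted = new ArrayList<>(clauses);
        sorted.sort(Comparator.comparingInt(Estimating::estimatedDocFreq));
        return sorted;
    }
}
```

With the clauses sorted this way, an intersection could be driven by the clause expected to match the fewest documents, so the more frequent clauses are only advanced to candidates that the rare one has already produced.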