From: Shai Erera
To: java-user@lucene.apache.org
Date: Wed, 30 Sep 2009 17:12:38 +0200
Subject: Re: TSDC, TopFieldCollector & co

I agree. If you need sort-by-score, it's better to use the "fast" search
methods. IndexSearcher will create the appropriate TopScoreDocCollector
(TSDC) instance for you, based on the Query that was passed.

If you need to create multiple Collectors and pass a kind of Multi-Collector
to IndexSearcher, then you should create the TSDC according to Mark's example
quoted below.

Shai

On Wed, Sep 30, 2009 at 4:57 PM, Mark Miller wrote:

> If you want relevance sorting (Sort.Score not Sort.Relevance right?),
> I'd think you want to use TopScoreDocCollector, not TopFieldCollector.
> The only reason to use relevance with TopFieldCollector is if you
> are doing an nth sort with a field sort as well.
>
> You don't really need to worry about things like turning off the max
> score tracking here - it's just going to be the first doc on the queue.
>
> You also do want to specify whether or not to collect docs in order if
> you care about performance:
>
> public static TopScoreDocCollector create(int numHits, boolean docsScoredInOrder)
>
> ie:
>
> TopScoreDocCollector.create(nDocs, !weight.scoresDocsOutOfOrder());
>
> Which means you just want option 1.
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
> eks dev wrote:
> > Hi All,
> >
> > What is the best way to achieve the following and what are the
> > differences, if I say "I do not normalize scores, so I do not need max
> > score tracking, I do not care if hits are returned in doc id order, or
> > any other order. I need only to get maxDocs *best scoring* documents":
> >
> > OPTION 1:
> > TopDocs top = ixSearcher.search(q, filter, maxDocs);
> >
> > OPTION 2:
> > final TopScoreDocCollector tfc = TopScoreDocCollector.create(maxDocs, false);
> > ixSearcher.search(q, filter, tfc);
> > TopDocs top = tfc.topDocs();
> >
> > OPTION 3:
> > final TopFieldCollector tfc = TopFieldCollector.create(Sort.RELEVANCE, maxDocs,
> >     false /* fillFields */,
> >     true /* trackDocScores */,
> >     false /* trackMaxScore */,
> >     false /* docsInOrder */);
> >
> > ixSearcher.search(q.weight(ixSearcher), filter, tfc);
> > TopDocs top = tfc.topDocs();
> >
> > What are the pros and cons?
> > If I read the javadoc correctly,
> > - OPTION 1 tracks max score and delivers doc Ids in order (suboptimal
> >   performance for my case)
> > - OPTION 2 I do not know about max score tracking, but doc Ids are not
> >   required to be in order
> > - OPTION 3 looks like exactly what I want, but one performance comment in
> >   the javadoc about Sort.RELEVANCE made me wonder whether that is the
> >   fastest way
> >
> > What would be recommended here, any other options to achieve the fastest
> > search with the conditions defined above (no max score tracking and doc id
> > order irrelevant)? OPTION 2 looks nice, but as said, I am not sure about
> > max score tracking.
> >
> > Thanks,
> > eks
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
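
As a rough illustration of two points from Mark's reply (the max score needs
no separate tracking because it is simply the first doc on the queue, and the
in-order flag changes how much work each rejected hit costs), here is a
minimal, self-contained Java sketch. It is a toy model, not Lucene's
implementation: TopNSketch, Hit, collect, topDocs and maxScore are
hypothetical names, and the tie-break rule (the lower doc ID ranks better on
equal scores) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: a toy top-N collector, not Lucene's TopScoreDocCollector.
final class TopNSketch {

    // Hypothetical (doc ID, score) pair.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Weakest hit first: lowest score, and on equal scores the higher doc ID.
    // (Assumed tie-break for this sketch: the lower doc ID ranks better.)
    private static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };

    private final int numHits;
    private final boolean docsInOrder;
    private final PriorityQueue<Hit> queue;

    TopNSketch(int numHits, boolean docsInOrder) {
        this.numHits = numHits;
        this.docsInOrder = docsInOrder;
        this.queue = new PriorityQueue<>(Math.max(1, numHits), WEAKEST_FIRST);
    }

    void collect(int doc, float score) {
        if (numHits <= 0) return;
        if (queue.size() < numHits) {
            queue.add(new Hit(doc, score));
            return;
        }
        Hit weakest = queue.peek();
        if (docsInOrder) {
            // Doc IDs only increase, so a tie on score always loses: one comparison.
            if (score <= weakest.score) return;
        } else {
            // Doc IDs may arrive in any order, so ties need a doc ID comparison too.
            if (score < weakest.score
                    || (score == weakest.score && doc > weakest.doc)) return;
        }
        queue.poll();
        queue.add(new Hit(doc, score));
    }

    // Kept hits, best first.
    List<Hit> topDocs() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort(WEAKEST_FIRST.reversed());
        return hits;
    }

    // No separate tracking: the max score is the score of the first result.
    float maxScore() {
        List<Hit> hits = topDocs();
        return hits.isEmpty() ? Float.NaN : hits.get(0).score;
    }

    public static void main(String[] args) {
        float[] scores = {1.0f, 3.0f, 2.0f, 3.0f, 0.5f, 2.0f};
        TopNSketch collector = new TopNSketch(3, true);
        for (int doc = 0; doc < scores.length; doc++) {
            collector.collect(doc, scores[doc]);
        }
        for (Hit hit : collector.topDocs()) {
            System.out.println("doc=" + hit.doc + " score=" + hit.score);
        }
        System.out.println("maxScore=" + collector.maxScore());
    }
}
```

In this sketch, feeding hits in increasing doc ID order with docsInOrder=true
and feeding the same hits in any order with docsInOrder=false should select
the same top hits; the in-order path only saves the extra doc ID comparison
on ties.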