Message-ID: <472300E0.3070203@getopt.org>
Date: Sat, 27 Oct 2007 11:12:00 +0200
From: Andrzej Bialecki
To: java-user@lucene.apache.org
Subject: Re: Sorted Index

John Patterson wrote:
>
> Yonik Seeley wrote:
>> On 10/26/07, John Patterson wrote:
>> Most things in an inverted index
>> are sorted (terms, matching document
>> ids, term positions within a field, etc). Can you be more specific
>> about what you are trying to accomplish?
>>
>
> Sorry, I mean sorting the documents in an order other than the order they
> are added. Then my search could just return docs in index order. For the
> most common sorting I could collect only the first x docs and then
> short-circuit the search like we previously discussed.

These questions already have an answer in Nutch (see
org.apache.nutch.indexer.IndexSorter and
org.apache.nutch.searcher.LuceneQueryOptimizer$LimitedCollector).

> I was wondering if it is possible to apply a sort at merge time?

One method that I'm familiar with is the following: you can split the result
set into several large-ish bins, and apply arbitrary sorting methods within
each bin. Studies show that if you pick the right bin size, users will rarely
look into the second and following bins, so the task is reduced to sorting
the first bin, e.g. the 100 top-scoring docs.

--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
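
The two ideas discussed in this message can be sketched in plain Java: stopping after the first x docs when the index is already stored in the desired order, and re-sorting only the first bin of top-scoring docs. This is a minimal illustration under stated assumptions, not Nutch or Lucene code. The class FirstBinSketch, the Hit type with its fields, and both method names are hypothetical and exist only for this example; the real implementations are the Nutch classes named above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these names come from Lucene or Nutch.
final class FirstBinSketch {

    /** Hypothetical stand-in for a search hit. */
    static final class Hit {
        final int docId;
        final float score;
        final long timestamp; // example of an arbitrary secondary sort key

        Hit(int docId, float score, long timestamp) {
            this.docId = docId;
            this.score = score;
            this.timestamp = timestamp;
        }
    }

    /**
     * Early termination: assumes matching doc ids arrive in index order and
     * that index order already is the desired sort order. Collects at most
     * 'limit' ids and then stops iterating.
     */
    static List<Integer> firstInIndexOrder(Iterable<Integer> matchingDocIds, int limit) {
        List<Integer> collected = new ArrayList<>();
        if (limit <= 0) {
            return collected;
        }
        for (Integer docId : matchingDocIds) {
            collected.add(docId);
            if (collected.size() >= limit) {
                break; // short-circuit: the remaining matches are never visited
            }
        }
        return collected;
    }

    /**
     * Binning: takes the 'binSize' top-scoring hits as the first bin and
     * applies an arbitrary comparator within that bin only.
     */
    static List<Hit> firstBin(List<Hit> hits, int binSize, Comparator<Hit> withinBin) {
        List<Hit> byScore = new ArrayList<>(hits);
        byScore.sort((a, b) -> Float.compare(b.score, a.score)); // descending score
        int end = Math.max(0, Math.min(binSize, byScore.size()));
        List<Hit> bin = new ArrayList<>(byScore.subList(0, end));
        bin.sort(withinBin);
        return bin;
    }
}
```

For example, firstBin(hits, 100, (a, b) -> Long.compare(a.timestamp, b.timestamp)) would return the 100 top-scoring hits ordered by the hypothetical timestamp field. This sketch sorts the full hit list by score for simplicity; a bounded priority queue would avoid that cost on large result sets.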