Subject: Re: SortingMergePolicy for already sorted segments
From: Ravikumar Govindarajan
To: "java-user@lucene.apache.org"
Date: Tue, 17 Jun 2014 17:33:30 +0530

I am afraid the DocMap still maintains doc-id mappings till the merge, and
that is what I am trying to avoid...

I think Lucene itself has a MergeIterator in the o.a.l.util package. A
MergePolicy can wrap a simple MergeIterator for iterating docs across
different AtomicReaders in the correct sort order for a given field/term.

That should be fine, right?

--
Ravi

On Tue, Jun 17, 2014 at 1:24 PM, Shai Erera wrote:

> loadSortTerm is your method, right? In the current Sorter.sort
> implementation, I see this code:
>
>   boolean sorted = true;
>   for (int i = 1; i < maxDoc; ++i) {
>     if (comparator.compare(i-1, i) > 0) {
>       sorted = false;
>       break;
>     }
>   }
>   if (sorted) {
>     return null;
>   }
>
> Perhaps you can write similar code?
>
> Also note that the sorting interface has changed, I think in 4.8, and now
> you don't really need to implement a Sorter, but rather pass a SortField,
> if that works for you.
>
> Shai
>
>
> On Tue, Jun 17, 2014 at 9:41 AM, Ravikumar Govindarajan <
> ravikumar.govindarajan@gmail.com> wrote:
>
> > Shai,
> >
> > This is the code snippet I use inside my class...
> >
> > public class MySorter extends Sorter {
> >
> >   @Override
> >   public DocMap sort(AtomicReader reader) throws IOException {
> >     // Eagerly loads the sort term for every doc in the reader.
> >     final Map<Integer, BytesRef> docVsId = loadSortTerm(reader);
> >
> >     final Sorter.DocComparator comparator = new Sorter.DocComparator() {
> >       @Override
> >       public int compare(int docID1, int docID2) {
> >         BytesRef v1 = docVsId.get(docID1);
> >         BytesRef v2 = docVsId.get(docID2);
> >         return v1.compareTo(v2);
> >       }
> >     };
> >
> >     return sort(reader.maxDoc(), comparator);
> >   }
> > }
> >
> > My problem is that the "AtomicReader" passed to the Sorter.sort method is
> > actually a SlowCompositeReader, composed of a list of AtomicReaders, each
> > of which is already sorted.
> >
> > I find this "loadSortTerm(compositeReader)" to be a bit heavy, since it
> > tries to load all the doc-to-term mappings eagerly...
> >
> > Are there some alternatives for this?
> >
> > --
> > Ravi
> >
> >
> > On Tue, Jun 17, 2014 at 10:58 AM, Shai Erera wrote:
> >
> > > I'm not sure that I follow ... where do you see DocMap being loaded up
> > > front? Specifically, Sorter.sort may return null if the readers are
> > > already sorted ... I think we already optimized for the case where the
> > > readers are sorted.
> > >
> > > Shai
> > >
> > >
> > > On Tue, Jun 17, 2014 at 4:04 AM, Ravikumar Govindarajan <
> > > ravikumar.govindarajan@gmail.com> wrote:
> > >
> > > > I am planning to use SortingMergePolicy where all the
> > > > merge-participating segments are already sorted... I understand that
> > > > I need to define a DocMap with old-new doc-id mappings.
> > > >
> > > > Is it possible to optimize the eager loading of DocMap and make it
> > > > load lazily, on demand?
> > > >
> > > > Ex: Pass List to the caller and ask for the next new-old doc
> > > > mapping..
> > > >
> > > > Since my segments are already sorted, I could save a little memory
> > > > this way, instead of loading the full DocMap up front.
> > > >
> > > > --
> > > > Ravi
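
The lazy, on-demand ordering discussed in this thread can be sketched as a
k-way merge over segments that are each already sorted. What follows is a
minimal, hypothetical sketch in plain Java, not Lucene's API: the names
LazySortedMerge, Entry, and merge are illustrative, and each segment is
modeled as a List<String> of sort keys indexed by local doc ID, standing in
for the BytesRef sort terms. It assumes only that every segment is
internally sorted. The heap holds at most one pending entry per segment, so
no full old-to-new doc-id map is built up front. Whether SortingMergePolicy
can consume such an iterator is not established in the thread.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical sketch: lazily merges segments whose sort keys are already in order. */
final class LazySortedMerge {

    /** One merged position: the source segment and the doc ID local to that segment. */
    static final class Entry {
        final int segment;
        final int localDoc;
        final String key;

        Entry(int segment, int localDoc, String key) {
            this.segment = segment;
            this.localDoc = localDoc;
            this.key = key;
        }
    }

    /**
     * Returns an iterator over all docs in global sort order.
     * Assumption: segments.get(s).get(d) is the sort key of local doc d in segment s,
     * and each inner list is already sorted.
     */
    static Iterator<Entry> merge(final List<List<String>> segments) {
        final PriorityQueue<Entry> heap = new PriorityQueue<>((a, b) -> {
            int c = a.key.compareTo(b.key);
            // Equal keys: the lower segment index wins, which keeps the order deterministic.
            return c != 0 ? c : Integer.compare(a.segment, b.segment);
        });
        // Seed the heap with the first doc of every non-empty segment.
        for (int s = 0; s < segments.size(); s++) {
            if (!segments.get(s).isEmpty()) {
                heap.add(new Entry(s, 0, segments.get(s).get(0)));
            }
        }
        return new Iterator<Entry>() {
            @Override
            public boolean hasNext() {
                return !heap.isEmpty();
            }

            @Override
            public Entry next() {
                Entry top = heap.poll();
                if (top == null) {
                    throw new NoSuchElementException();
                }
                // Advance only the segment that was just consumed.
                List<String> seg = segments.get(top.segment);
                int nextDoc = top.localDoc + 1;
                if (nextDoc < seg.size()) {
                    heap.add(new Entry(top.segment, nextDoc, seg.get(nextDoc)));
                }
                return top;
            }
        };
    }
}
```

Each call to next() yields the next (segment, localDoc) pair in merged
order, so a caller could derive the new doc ID simply by counting the
entries consumed so far.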