Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (athena.apache.org: domain of
 ravikumar.govindarajan@gmail.com designates 74.125.82.174 as permitted
 sender)
MIME-Version: 1.0
In-Reply-To: 
 <CALfq-2TiP2X+QofoBpAv+x=xxb09MDW+zx5kz6_=sOQ2d_-YEQ@mail.gmail.com>
References: 
 <CAGW2whQC5OvMHyCq8sPSRf144XQATJYLsHvzQY4w056fJXZt1w@mail.gmail.com>
	<CALfq-2T0QkZKx+U2pcQcgmXActfwzqVZaWgAVkBz-yKzM_8bkQ@mail.gmail.com>
	<CAGW2whQLf_2PuGu502DfLKapxN7GbQ=gm5YSV-pkDYrL7QiyUg@mail.gmail.com>
	<CALfq-2TiP2X+QofoBpAv+x=xxb09MDW+zx5kz6_=sOQ2d_-YEQ@mail.gmail.com>
Date: Tue, 17 Jun 2014 17:33:30 +0530
Message-ID: 
 <CAGW2whRpvO=Lt2P4VDd4A6UoOX_EQH1bs51Zge11Tugmb9ELjw@mail.gmail.com>
Subject: Re: SortingMergePolicy for already sorted segments
From: Ravikumar Govindarajan <ravikumar.govindarajan@gmail.com>
To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
Content-Type: multipart/alternative; boundary=f46d041826229fbabe04fc06eedd

--f46d041826229fbabe04fc06eedd
Content-Type: text/plain; charset=UTF-8

I am afraid the DocMap still holds the doc-id mappings in memory until the
merge, and that is what I am trying to avoid...

I think Lucene itself has a MergedIterator in the o.a.l.util package.

A MergePolicy could wrap a simple MergedIterator to iterate over docs across
the different AtomicReaders in the correct sort order for a given field/term.

That should be fine, right?
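
A rough sketch of such a lazy merge across already-sorted segments, in plain
Java rather than against the Lucene API. The names SortedSegmentMerge, Entry
and Cursor are made up for illustration, String keys stand in for BytesRef,
and each inner list is assumed to hold one segment's sort keys indexed by
doc ID, already in ascending order:

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical sketch: lazy k-way merge over segments whose keys are already sorted. */
public class SortedSegmentMerge {

  /** One merged result: the source segment, the doc ID inside it, and its sort key. */
  public static final class Entry {
    public final int segment;
    public final int docId;
    public final String key;

    Entry(int segment, int docId, String key) {
      this.segment = segment;
      this.docId = docId;
      this.key = key;
    }
  }

  /** Position inside one segment; keys.get(docId) is that doc's sort key. */
  private static final class Cursor {
    final int segment;
    final List<String> keys;
    int docId = 0;

    Cursor(int segment, List<String> keys) {
      this.segment = segment;
      this.keys = keys;
    }

    String current() {
      return keys.get(docId);
    }
  }

  /** Returns docs from all segments in global key order, one at a time. */
  public static Iterator<Entry> merge(List<List<String>> segments) {
    // Smallest current key first; ties are broken by segment order.
    final PriorityQueue<Cursor> heap = new PriorityQueue<>((a, b) -> {
      int c = a.current().compareTo(b.current());
      return c != 0 ? c : Integer.compare(a.segment, b.segment);
    });
    for (int i = 0; i < segments.size(); i++) {
      if (!segments.get(i).isEmpty()) {
        heap.add(new Cursor(i, segments.get(i)));
      }
    }
    return new Iterator<Entry>() {
      @Override
      public boolean hasNext() {
        return !heap.isEmpty();
      }

      @Override
      public Entry next() {
        Cursor top = heap.poll();
        if (top == null) {
          throw new NoSuchElementException();
        }
        Entry e = new Entry(top.segment, top.docId, top.current());
        top.docId++;
        // Re-queue the cursor only if its segment still has docs left.
        if (top.docId < top.keys.size()) {
          heap.add(top);
        }
        return e;
      }
    };
  }
}
```

Each call to next() yields the next (segment, docId) pair in global sort
order, so the merge state is one cursor per segment rather than a full
old-to-new mapping built up front.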

--
Ravi



On Tue, Jun 17, 2014 at 1:24 PM, Shai Erera <serera@gmail.com> wrote:

> loadSortTerm is your method, right? In the current Sorter.sort
> implementation, I see this code:
>
>     boolean sorted = true;
>     for (int i = 1; i < maxDoc; ++i) {
>       if (comparator.compare(i-1, i) > 0) {
>         sorted = false;
>         break;
>       }
>     }
>     if (sorted) {
>       return null;
>     }
>
> Perhaps you can write similar code?
>
> Also note that the sorting interface has changed, I think in 4.8, and now
> you don't really need to implement a Sorter, but rather pass a SortField,
> if that works for you.
>
> Shai
>
>
> On Tue, Jun 17, 2014 at 9:41 AM, Ravikumar Govindarajan <
> ravikumar.govindarajan@gmail.com> wrote:
>
> > Shai,
> >
> > This is the code snippet I use inside my class...
> >
> > public class MySorter extends Sorter {
> >
> >   @Override
> >   public DocMap sort(AtomicReader reader) throws IOException {
> >     final Map<Integer, BytesRef> docVsId = loadSortTerm(reader);
> >
> >     final Sorter.DocComparator comparator = new Sorter.DocComparator() {
> >       @Override
> >       public int compare(int docID1, int docID2) {
> >         BytesRef v1 = docVsId.get(docID1);
> >         BytesRef v2 = docVsId.get(docID2);
> >         return v1.compareTo(v2);
> >       }
> >     };
> >
> >     return sort(reader.maxDoc(), comparator);
> >   }
> > }
> >
> > My problem is that the "AtomicReader" passed to the Sorter.sort method is
> > actually a SlowCompositeReader, composed of a list of AtomicReaders, each
> > of which is already sorted.
> >
> > I find this "loadSortTerm(compositeReader)" to be a bit heavy, as it
> > tries to load all the doc-to-term mappings eagerly...
> >
> > Are there some alternatives for this?
> >
> > --
> > Ravi
> >
> >
> > On Tue, Jun 17, 2014 at 10:58 AM, Shai Erera <serera@gmail.com> wrote:
> >
> > > I'm not sure that I follow ... where do you see DocMap being loaded up
> > > front? Specifically, Sorter.sort may return null if the readers are
> > > already sorted ... I think we already optimized for the case where the
> > > readers are sorted.
> > >
> > > Shai
> > >
> > >
> > > On Tue, Jun 17, 2014 at 4:04 AM, Ravikumar Govindarajan <
> > > ravikumar.govindarajan@gmail.com> wrote:
> > >
> > > > I am planning to use SortingMergePolicy where all the
> > > > merge-participating segments are already sorted... I understand that I
> > > > need to define a DocMap with old-new doc-id mappings.
> > > >
> > > > Is it possible to optimize the eager loading of DocMap and make it
> > > > kind of lazy, loading on demand?
> > > >
> > > > Ex: Pass List<AtomicReader> to the caller and ask for the next new-old
> > > > doc mapping..
> > > >
> > > > Since my segments are already sorted, I could save a little bit of
> > > > memory this way, instead of loading the full DocMap upfront
> > > >
> > > > --
> > > > Ravi
> > > >
> > >
> >
>

--f46d041826229fbabe04fc06eedd--