lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: SortingMergePolicy for already sorted segments
Date Tue, 17 Jun 2014 12:03:30 GMT
I am afraid the DocMap still maintains doc-id mappings until the merge, and
that is what I am trying to avoid...

I think Lucene itself has a MergedIterator in the o.a.l.util package.

A MergePolicy could wrap a simple MergedIterator to iterate over docs across
the different AtomicReaders in the correct sort order for a given field/term.

That should be fine, right?
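
To make the idea concrete, here is a minimal sketch using only java.util, not Lucene's own iterator or reader classes. The names SortedSegmentMerge and Entry, and the use of plain lists of String keys in place of per-segment readers, are assumptions for illustration; in a real merge the keys would come from each reader's terms or doc values. It keeps one heap entry per segment, so the working memory is proportional to the number of segments rather than to maxDoc:

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical sketch: lazily yields {segment, localDoc} pairs in global sort order. */
final class SortedSegmentMerge implements Iterator<int[]> {

  /** The current head of one segment: segment index, local doc id, and sort key. */
  private static final class Entry implements Comparable<Entry> {
    final int segment;
    final int doc;
    final String key;

    Entry(int segment, int doc, String key) {
      this.segment = segment;
      this.doc = doc;
      this.key = key;
    }

    @Override
    public int compareTo(Entry other) {
      int c = key.compareTo(other.key);
      // Ties go to the lower segment index so the merged order is stable.
      return c != 0 ? c : Integer.compare(segment, other.segment);
    }
  }

  private final List<List<String>> segments;
  private final PriorityQueue<Entry> heap = new PriorityQueue<>();

  /**
   * segments.get(s).get(d) is the sort key of local doc d in segment s.
   * Each inner list is assumed to be in ascending order already.
   */
  SortedSegmentMerge(List<List<String>> segments) {
    this.segments = segments;
    for (int s = 0; s < segments.size(); s++) {
      if (!segments.get(s).isEmpty()) {
        heap.add(new Entry(s, 0, segments.get(s).get(0)));
      }
    }
  }

  @Override
  public boolean hasNext() {
    return !heap.isEmpty();
  }

  @Override
  public int[] next() {
    Entry head = heap.poll();
    if (head == null) {
      throw new NoSuchElementException();
    }
    // Advance only the segment we just consumed from.
    int nextDoc = head.doc + 1;
    List<String> segment = segments.get(head.segment);
    if (nextDoc < segment.size()) {
      heap.add(new Entry(head.segment, nextDoc, segment.get(nextDoc)));
    }
    return new int[] {head.segment, head.doc};
  }
}
```

Each call to next() gives the next old position (segment, local doc); the new doc id is simply the count of calls made so far, so no full old-to-new table has to be built up front.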

--
Ravi


On Tue, Jun 17, 2014 at 1:24 PM, Shai Erera <serera@gmail.com> wrote:

> loadSortTerm is your method, right? In the current Sorter.sort
> implementation, I see this code:
>
>     boolean sorted = true;
>     for (int i = 1; i < maxDoc; ++i) {
>       if (comparator.compare(i-1, i) > 0) {
>         sorted = false;
>         break;
>       }
>     }
>     if (sorted) {
>       return null;
>     }
>
> Perhaps you can write similar code?
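
Along the same lines, when every segment is already known to be sorted, the full pass over maxDoc could in principle shrink to one comparison per segment boundary: the concatenation is sorted exactly when the last key of each segment does not exceed the first key of the next non-empty segment. A sketch under that assumption, with hypothetical names (BoundaryCheck, concatenationIsSorted) and plain lists of String keys standing in for the readers:

```java
import java.util.List;

/** Hypothetical sketch: checks sortedness of concatenated, individually sorted segments. */
final class BoundaryCheck {

  /**
   * Assumes every inner list is already in ascending order, so only the
   * boundaries between consecutive non-empty segments need comparing.
   */
  static boolean concatenationIsSorted(List<List<String>> segments) {
    String previousLast = null;
    for (List<String> segment : segments) {
      if (segment.isEmpty()) {
        continue; // an empty segment contributes no boundary
      }
      if (previousLast != null && previousLast.compareTo(segment.get(0)) > 0) {
        return false; // the previous segment ends after this one starts
      }
      previousLast = segment.get(segment.size() - 1);
    }
    return true;
  }
}
```

If this returns true, the existing doc order can be kept as it is, which corresponds to the case where Sorter.sort returns null.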
>
> Also note that the sorting interface has changed, I think in 4.8, and you
> no longer really need to implement a Sorter; you can pass a SortField
> instead, if that works for you.
>
> Shai
>
>
> On Tue, Jun 17, 2014 at 9:41 AM, Ravikumar Govindarajan <
> ravikumar.govindarajan@gmail.com> wrote:
>
> > Shai,
> >
> > This is the code snippet I use inside my class...
> >
> > public class MySorter extends Sorter {
> >
> >   @Override
> >   public DocMap sort(AtomicReader reader) throws IOException {
> >     // Eagerly loads the sort term for every doc in the reader.
> >     final Map<Integer, BytesRef> docVsId = loadSortTerm(reader);
> >
> >     // Orders two docs by comparing their sort terms.
> >     final Sorter.DocComparator comparator = new Sorter.DocComparator() {
> >       @Override
> >       public int compare(int docID1, int docID2) {
> >         BytesRef v1 = docVsId.get(docID1);
> >         BytesRef v2 = docVsId.get(docID2);
> >         return v1.compareTo(v2);
> >       }
> >     };
> >
> >     return sort(reader.maxDoc(), comparator);
> >   }
> > }
> >
> > My problem is that the "AtomicReader" passed to the Sorter.sort method is
> > actually a SlowCompositeReader, composed of a list of AtomicReaders, each
> > of which is already sorted.
> >
> > I find this "loadSortTerm(compositeReader)" to be a bit heavy, since it
> > tries to load all the doc-to-term mappings eagerly...
> >
> > Are there some alternatives for this?
> >
> > --
> > Ravi
> >
> >
> > On Tue, Jun 17, 2014 at 10:58 AM, Shai Erera <serera@gmail.com> wrote:
> >
> > > I'm not sure that I follow ... where do you see DocMap being loaded up
> > > front? Specifically, Sorter.sort may return null if the readers are
> > > already sorted ... I think we already optimized for the case where the
> > > readers are sorted.
> > >
> > > Shai
> > >
> > >
> > > On Tue, Jun 17, 2014 at 4:04 AM, Ravikumar Govindarajan <
> > > ravikumar.govindarajan@gmail.com> wrote:
> > >
> > > > I am planning to use SortingMergePolicy where all the
> > > > merge-participating segments are already sorted... I understand that
> > > > I need to define a DocMap with old-new doc-id mappings.
> > > >
> > > > Is it possible to optimize the eager loading of DocMap and make it
> > > > load lazily, on demand?
> > > >
> > > > Ex: Pass List<AtomicReader> to the caller and ask for the next new-old
> > > > doc mapping..
> > > >
> > > > Since my segments are already sorted, I could save a little memory
> > > > this way, instead of loading the full DocMap upfront.
> > > >
> > > > --
> > > > Ravi
> > > >
> > >
> >
>
