lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: SortingMergePolicy for already sorted segments
Date Tue, 17 Jun 2014 07:54:01 GMT
loadSortTerm is your method, right? In the current Sorter.sort
implementation, I see this code:

    boolean sorted = true;
    for (int i = 1; i < maxDoc; ++i) {
      if (comparator.compare(i-1, i) > 0) {
        sorted = false;
        break;
      }
    }
    if (sorted) {
      return null;
    }

Perhaps you can write similar code?
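
As a rough, standalone sketch of that idea (the DocComparator interface below is only a stand-in that mirrors the shape of Sorter.DocComparator, not the Lucene class itself, and the string IDs are made-up sample data), the check could be pulled into a helper and run before building any doc map:

```java
import java.util.Arrays;
import java.util.List;

public class SortedCheck {

    /** Hypothetical stand-in for Sorter.DocComparator: orders two doc IDs. */
    public interface DocComparator {
        int compare(int docID1, int docID2);
    }

    /**
     * Returns true if docs 0..maxDoc-1 are already in non-decreasing order.
     * Stops at the first out-of-order pair, so an unsorted input is rejected early.
     */
    public static boolean isSorted(int maxDoc, DocComparator comparator) {
        for (int i = 1; i < maxDoc; ++i) {
            if (comparator.compare(i - 1, i) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Sample sort keys, indexed by doc ID (illustrative data only).
        final List<String> ids = Arrays.asList("a", "b", "b", "d");
        DocComparator byId = (d1, d2) -> ids.get(d1).compareTo(ids.get(d2));

        if (isSorted(ids.size(), byId)) {
            System.out.println("already sorted, no doc map needed");
        } else {
            System.out.println("needs sorting");
        }
    }
}
```

In a real Sorter you would return null from sort() when the check passes, the same way the snippet above does.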

Also note that the sorting interface has changed (in 4.8, I think), so you
no longer need to implement a Sorter; you can pass a SortField instead,
if that works for you.

Shai


On Tue, Jun 17, 2014 at 9:41 AM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> Shai,
>
> This is the code snippet I use inside my class...
>
> public class MySorter extends Sorter {
>
>   @Override
>   public DocMap sort(AtomicReader reader) throws IOException {
>     final Map<Integer, BytesRef> docVsId = loadSortTerm(reader);
>
>     final Sorter.DocComparator comparator = new Sorter.DocComparator() {
>       @Override
>       public int compare(int docID1, int docID2) {
>         BytesRef v1 = docVsId.get(docID1);
>         BytesRef v2 = docVsId.get(docID2);
>         return v1.compareTo(v2);
>       }
>     };
>
>     return sort(reader.maxDoc(), comparator);
>   }
> }
>
> My problem is that the "AtomicReader" passed to the Sorter.sort method is
> actually a SlowCompositeReaderWrapper, composed of a list of AtomicReaders,
> each of which is already sorted.
>
> I find this "loadSortTerm(compositeReader)" to be a bit heavy, since it
> tries to load all the doc-to-term mappings eagerly...
>
> Are there some alternatives for this?
>
> --
> Ravi
>
>
> On Tue, Jun 17, 2014 at 10:58 AM, Shai Erera <serera@gmail.com> wrote:
>
> > I'm not sure that I follow ... where do you see DocMap being loaded up
> > front? Specifically, Sorter.sort may return null if the readers are already
> > sorted ... I think we already optimized for the case where the readers are
> > sorted.
> >
> > Shai
> >
> >
> > On Tue, Jun 17, 2014 at 4:04 AM, Ravikumar Govindarajan <
> > ravikumar.govindarajan@gmail.com> wrote:
> >
> > > I am planning to use SortingMergePolicy where all the merge-participating
> > > segments are already sorted... I understand that I need to define a DocMap
> > > with old-new doc-id mappings.
> > >
> > > Is it possible to avoid the eager loading of DocMap and make it
> > > lazy-load on demand instead?
> > >
> > > Ex: Pass List<AtomicReader> to the caller and ask for the next new-old doc
> > > mapping..
> > >
> > > Since my segments are already sorted, I could save a little bit of memory
> > > this way, instead of loading the full DocMap upfront
> > >
> > > --
> > > Ravi
> > >
> >
>
