lucene-java-user mailing list archives

From Arvind Kalyan <bas...@gmail.com>
Subject Re: Merging ordered segments without re-sorting.
Date Wed, 23 Oct 2013 19:46:13 GMT
Thanks, my understanding is that SortingMergePolicy performs sorting after
wrapping the 2 segments, correct?

As I mentioned in my original email, I would like to avoid the re-sorting
and exploit the fact that the input segments are already sorted.
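
To make the idea concrete, here is a minimal sketch of the linear merge I have in mind, using the (key, value) example from my original email. It is plain Java and does not touch any Lucene API: the class SortedMergeSketch, the Doc type, and the merge() method are hypothetical names used only for illustration, and a real implementation would have to map this onto Lucene's segment readers and merge machinery.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch, not a Lucene API: linear merge of two already-sorted runs. */
final class SortedMergeSketch {

    /** A (key, value) pair standing in for a document, e.g. (a,1). */
    static final class Doc {
        final String key;
        final int value;

        Doc(String key, int value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + key + "," + value + ")";
        }
    }

    /**
     * Merges two lists that are each already ordered by cmp.
     * Runs in O(x.size() + y.size()) comparisons; no re-sorting is done.
     */
    static <T> List<T> merge(List<T> x, List<T> y, Comparator<? super T> cmp) {
        List<T> out = new ArrayList<>(x.size() + y.size());
        int i = 0;
        int j = 0;
        while (i < x.size() && j < y.size()) {
            // "<=" keeps the merge stable: on ties, the element from x comes first.
            if (cmp.compare(x.get(i), y.get(j)) <= 0) {
                out.add(x.get(i++));
            } else {
                out.add(y.get(j++));
            }
        }
        // At most one of the inputs still has elements; append the remainder.
        while (i < x.size()) {
            out.add(x.get(i++));
        }
        while (j < y.size()) {
            out.add(y.get(j++));
        }
        return out;
    }
}
```

Called with segment X = (a,1), (b,2), (e,10), segment Y = (c,4), (d,6), and a comparator on the numeric value, merge() produces the ordering shown for segment Z in my original email, and each input document is visited exactly once.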



On Wed, Oct 23, 2013 at 11:02 AM, Shai Erera <serera@gmail.com> wrote:

> Hi
>
> You can use SortingMergePolicy and SortingAtomicReader to achieve that. You
> can read more about index sorting here:
> http://shaierera.blogspot.com/2013/04/index-sorting-with-lucene.html
>
> Shai
>
>
> On Wed, Oct 23, 2013 at 8:13 PM, Arvind Kalyan <base16@gmail.com> wrote:
>
> > Hi there, I'm looking for pointers, suggestions on how to approach this
> > in Lucene 4.5.
> >
> > Say I am creating an index using a sequence of addDocument() calls and
> > end up with segments that each contain documents in a specified ordering.
> > It is guaranteed that there won't be updates/deletes/reads etc happening
> > on the index -- this is an offline index building task for a read-only
> > index.
> >
> > I create the index in the above mentioned fashion using
> > LogByteSizeMergePolicy and finally do a forceMerge(1) to get a single
> > segment in the ordering I want.
> >
> > Now my requirement is that I need to be able to merge this single segment
> > with another such segment (say from yesterday's index) and guarantee some
> > ordering -- say I have a comparator which looks at some field values in
> > the 2 given docs and defines the ordering.
> >
> > Index 1 with segment X:
> > (a,1)
> > (b,2)
> > (e,10)
> >
> > Index 2 (say from yesterday) with some segment Y:
> > (c,4)
> > (d,6)
> >
> > Essentially we have 2 ordered segments, and I'm looking to 'merge' them
> > (literally) using the value of some field, without having to re-sort them
> > which would be too time & resource consuming.
> >
> > Output Index, with some segment Z:
> > (a,1)
> > (b,2)
> > (c,4)
> > (d,6)
> > (e,10)
> >
> > Is this already possible? If not, any tips on how I can approach
> > implementing this requirement?
> >
> > Thanks,
> >
> > --
> > Arvind Kalyan
> >
>



-- 
Arvind Kalyan
http://www.linkedin.com/in/base16
cell: (408) 761-2030
