lucene-java-user mailing list archives

From Sriram Sankar <san...@gmail.com>
Subject Re: segments and sorting
Date Tue, 18 Jun 2013 22:16:23 GMT
> You can sort each segment independently or have a single segment, both
> options are available. To have a single segment, you just need to wrap
> your top-level index reader with SlowCompositeReaderWrapper before
> wrapping it again in a SortingAtomicReader and calling
> IndexWriter.addIndexes.

Is it possible to do this more efficiently using a merge sort?  Assuming
the individual segments are already sorted, is there a wrapper that I can
use where I can pass the same sorting function?  I'm guessing the
SlowCompositeReaderWrapper does not assume that the individual segments are
already sorted and therefore would repeat the work?
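
The merge-sort idea in the question can be sketched in plain Java as a k-way merge over per-segment rank arrays. This is only an illustration under stated assumptions: it does not use any Lucene API, the class `SortedSegmentMerge` and its `merge` method are hypothetical names, and each segment is assumed to be already sorted by descending static rank. Given k segments holding n docs in total, the merge does O(n log k) comparisons rather than a full O(n log n) re-sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration; not part of Lucene.
class SortedSegmentMerge {

    /** Cursor pointing at the next unread doc of one already-sorted segment. */
    private static final class Cursor {
        final int segment;
        int pos;
        Cursor(int segment) { this.segment = segment; }
    }

    /**
     * Merges per-segment rank arrays, each assumed sorted in descending order.
     * Returns {segment, localDocId} pairs in global descending rank order.
     * Ties are broken by segment number so the result is deterministic.
     */
    static List<int[]> merge(long[][] ranks) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Math.max(1, ranks.length),
            (a, b) -> {
                int c = Long.compare(ranks[b.segment][b.pos], ranks[a.segment][a.pos]);
                return c != 0 ? c : Integer.compare(a.segment, b.segment);
            });

        // Seed the heap with the first doc of every non-empty segment.
        for (int s = 0; s < ranks.length; s++) {
            if (ranks[s].length > 0) {
                heap.add(new Cursor(s));
            }
        }

        List<int[]> order = new ArrayList<>();
        while (!heap.isEmpty()) {
            // The head is the highest-ranked doc among all segment heads.
            Cursor c = heap.poll();
            order.add(new int[] {c.segment, c.pos});
            c.pos++;
            if (c.pos < ranks[c.segment].length) {
                heap.add(c); // re-insert with the segment's next doc
            }
        }
        return order;
    }

    public static void main(String[] args) {
        long[][] ranks = { {90, 40, 10}, {70, 60}, {80, 5} };
        for (int[] e : merge(ranks)) {
            System.out.println("segment=" + e[0] + " doc=" + e[1]
                + " rank=" + ranks[e[0]][e[1]]);
        }
    }
}
```

The resulting list is effectively the old-to-new doc mapping that a merge-aware sorting wrapper would need. Whether Lucene exposes a wrapper that takes advantage of pre-sorted segments is exactly the open question here.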

Thanks,

Sriram.



On Sat, Jun 15, 2013 at 1:52 AM, Adrien Grand <jpountz@gmail.com> wrote:

> Hi,
>
> On Fri, Jun 14, 2013 at 11:24 PM, Sriram Sankar <sankar@gmail.com> wrote:
> > For my use case of having all docs sorted by a static rank and being able
> > to cut off retrieval after a certain number of docs, I have to sort all my
> > docs using the static rank (and Lucene 4 has a way to do this).
> >
> > When an index has multiple segments, how does this sorting work?  Is each
> > segment sorted independently?  Or is it possible for me to control this -
> > and have a single segment?
>
> You can sort each segment independently or have a single segment, both
> options are available. To have a single segment, you just need to wrap
> your top-level index reader with SlowCompositeReaderWrapper before
> wrapping it again in a SortingAtomicReader and calling
> IndexWriter.addIndexes.
>
> > Assuming I have a single segment, are there any other constraints?  I read
> > somewhere that FieldValue's have a limit of 2Gb per segment - is this true?
>
> What do you mean by "FieldValue"? If you are referring to stored
> fields, a single field value cannot be larger than 2GB because the API
> uses ints. But some codecs enforce lower limits, for example the
> current default stored fields format enforces that the sum of the
> sizes of all fields of a _single_ document is less than 2GB (which is
> already much more than what typical users need). I think the major
> limitation is that a single Lucene index cannot have more than 2
> billion documents, but you can store your data into several physical
> shards to work around this limitation and merge results at searching
> time.
>
> --
> Adrien
