lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4752) Merge segments to sort them
Date Mon, 04 Mar 2013 17:07:13 GMT


Adrien Grand commented on LUCENE-4752:

bq. How can you early terminate a query for a single segment? [...] To early terminate efficiently,
you must have the segments in a consistent order, e.g. S1 > S2 > S3.

Isn't this just an API limitation? Since segments are processed independently, we should
be able to terminate collection on a per-segment basis.
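To illustrate the per-segment idea: if every segment is already sorted by the query's sort order, a top-k search can stop collecting a segment after its first k hits and then merge the per-segment candidates, so no ordering between segments (S1 > S2 > S3) is required. This is a minimal plain-Java sketch under that assumption; segments are modeled as sorted `int[]` arrays of sort keys and `topK` is a hypothetical name, not Lucene's `Collector` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PerSegmentEarlyTermination {
  // Hypothetical model: each segment is an array of sort keys, already sorted ascending
  // (the segment was written in sort order). Segments are NOT ordered relative to each other.
  static int[] topK(List<int[]> segments, int k) {
    List<Integer> candidates = new ArrayList<>();
    for (int[] segment : segments) {
      // In a sorted segment the first k docs are that segment's best k,
      // so collection for this segment terminates after k hits.
      int limit = Math.min(k, segment.length);
      for (int i = 0; i < limit; i++) {
        candidates.add(segment[i]);
      }
    }
    // Merge the per-segment candidates (at most k per segment) and keep the global top k.
    return candidates.stream().sorted().limit(k).mapToInt(Integer::intValue).toArray();
  }

  public static void main(String[] args) {
    List<int[]> segments = List.of(new int[] {5, 9, 12}, new int[] {1, 2, 30}, new int[] {3, 4});
    System.out.println(Arrays.toString(topK(segments, 3))); // [1, 2, 3]
  }
}
```

Each segment is visited for at most k documents regardless of how the segments compare to each other; the price is a final merge over at most k × (number of segments) candidates.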

bq. instead of stuffing into IWC what seems like a random setting (pick-segments-for-sorting),
we should have something more generic, like AtomicReaderFactory

I didn't mean this should be a boolean. Of course it should be something more flexible/configurable!
I'm very bad at picking names, but following your naming suggestion, we could have something like:
abstract class AtomicReaderFactory {
  // Maps the segments of a reader to the atomic readers that will actually be searched.
  abstract List<AtomicReader> reorder(List<SegmentReader> segmentReaders);
}

The default implementation would be the identity, whereas the sorting implementation would return
a singleton list containing a sorted view over all the segment readers.
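A plain-Java sketch of the two behaviors described above, assuming a segment can be reduced to the sort keys of its documents. `Segment`, `ReaderFactory`, `IdentityReaderFactory` and `SortingReaderFactory` are hypothetical stand-ins, not Lucene classes; the "sorted view" here simply concatenates all segments and sorts the result, which is only a model of what a sorted reader would expose.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReaderFactoryDemo {
  public static void main(String[] args) {
    List<Segment> segments = List.of(new Segment(List.of(7, 3)), new Segment(List.of(5, 1)));
    System.out.println(new IdentityReaderFactory().reorder(segments).size()); // 2
    List<Segment> sorted = new SortingReaderFactory().reorder(segments);
    System.out.println(sorted.size() + " " + sorted.get(0).sortKeys); // 1 [1, 3, 5, 7]
  }
}

// Hypothetical stand-in for a segment reader: the sort keys of its docs, in docID order.
final class Segment {
  final List<Integer> sortKeys;

  Segment(List<Integer> sortKeys) {
    this.sortKeys = sortKeys;
  }
}

// Plain-Java analogue of the proposed AtomicReaderFactory (not a Lucene API).
interface ReaderFactory {
  List<Segment> reorder(List<Segment> segments);
}

// Default behavior: identity, each segment is searched as-is.
final class IdentityReaderFactory implements ReaderFactory {
  @Override
  public List<Segment> reorder(List<Segment> segments) {
    return segments;
  }
}

// Sorting behavior: a single view over all segments, with docs in sort-key order.
final class SortingReaderFactory implements ReaderFactory {
  @Override
  public List<Segment> reorder(List<Segment> segments) {
    List<Integer> all = new ArrayList<>();
    for (Segment s : segments) {
      all.addAll(s.sortKeys);
    }
    Collections.sort(all);
    return Collections.singletonList(new Segment(all));
  }
}
```

Keeping the choice behind one small interface means the merge code does not need to know whether it receives the original segments or a single sorted view.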

bq. Also, a custom SegmentMerger to implement the zig-zag merge would help too.

This is another option. I actually started exploring it when David opened this issue,
but it can get complicated, especially when merging postings lists, whereas reusing the
sorted view from LUCENE-3918 would make merging trivial.
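One possible reading of "zig-zag merge", assumed here: when every input segment is already sorted, the merger only has to interleave them, as in the merge step of merge sort, and record where each old (segment, docID) pair lands. This sketch computes that old-to-new doc map with a priority queue. `ZigZagMerge` and `mergeDocMap` are hypothetical names, not part of Lucene's `SegmentMerger`. Every per-document structure, postings lists included, would then have to be rewritten through this map, which is where a custom merger gets complicated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class ZigZagMerge {
  // segments.get(s)[d] is the sort key of doc d in segment s (ascending within each segment).
  // Returns docMap where docMap[s][d] is that doc's docID in the merged segment.
  static int[][] mergeDocMap(List<int[]> segments) {
    int[][] docMap = new int[segments.size()][];
    // Queue entries are {segment, doc}, ordered by the doc's key; ties broken by segment.
    PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> {
      int cmp = Integer.compare(segments.get(a[0])[a[1]], segments.get(b[0])[b[1]]);
      return cmp != 0 ? cmp : Integer.compare(a[0], b[0]);
    });
    for (int s = 0; s < segments.size(); s++) {
      docMap[s] = new int[segments.get(s).length];
      if (segments.get(s).length > 0) {
        queue.add(new int[] {s, 0});
      }
    }
    int newDoc = 0;
    while (!queue.isEmpty()) {
      int[] top = queue.poll();
      int s = top[0], d = top[1];
      docMap[s][d] = newDoc++;
      // Advance within the same segment; the next smallest key may come from another one.
      if (d + 1 < segments.get(s).length) {
        queue.add(new int[] {s, d + 1});
      }
    }
    return docMap;
  }

  public static void main(String[] args) {
    List<int[]> segments = List.of(new int[] {1, 4, 6}, new int[] {2, 3, 7});
    int[][] docMap = mergeDocMap(segments);
    System.out.println(Arrays.toString(docMap[0])); // [0, 3, 4]
    System.out.println(Arrays.toString(docMap[1])); // [1, 2, 5]
  }
}
```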
> Merge segments to sort them
> ---------------------------
>                 Key: LUCENE-4752
>                 URL:
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/index
>            Reporter: David Smiley
>            Assignee: Adrien Grand
> It would be awesome if Lucene could write the documents out in a segment based on a configurable
order. This of course applies to merging segments too. The benefit is increased locality on
disk of documents that are likely to be accessed together. This often applies to documents
near each other in time, but also spatially.
