lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4752) Merge segments to sort them
Date Mon, 04 Mar 2013 15:33:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592289#comment-13592289 ]

Shai Erera commented on LUCENE-4752:
------------------------------------

How can you early terminate a query for a single segment? Say that you have 3 segments, each
sorted individually, and your query asks for the top 10 by some criterion. The top 10 may come
from the 3 segments as follows: seg1=4, seg2=4, seg3=2. But you don't know that until you have
processed all 3 segments, right? While you could decide on a per-segment basis to
'terminate', there's no mechanism today to tell IndexSearcher "I'm done w/ that segment, move
on". Today, if you want to early terminate, you need to throw an exception from the Collector
and catch it outside, in your application code.
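
To make the exception-based workaround concrete, here is a minimal sketch in plain Java. It does
not use the Lucene API: DocCollector, searchAll and EarlyTerminationException are hypothetical
stand-ins for the Collector, the IndexSearcher loop and whatever exception your application
would define.

```java
import java.util.ArrayList;
import java.util.List;

public class ExceptionTermination {
    /** Hypothetical stand-in for a collector callback; not the Lucene Collector API. */
    interface DocCollector {
        void collect(int doc);
    }

    /** Unchecked exception used purely as a control-flow signal. */
    static class EarlyTerminationException extends RuntimeException {
        EarlyTerminationException() {
            super(null, null, false, false); // no message, no stack trace
        }
    }

    /** Stand-in for the search loop: visits every doc of every segment in order. */
    static void searchAll(int[][] segments, DocCollector collector) {
        for (int[] segment : segments) {
            for (int doc : segment) {
                collector.collect(doc);
            }
        }
    }

    /** Collects at most `limit` docs, then aborts the whole search via the exception. */
    static List<Integer> firstN(int[][] segments, int limit) {
        List<Integer> hits = new ArrayList<>();
        try {
            searchAll(segments, doc -> {
                if (hits.size() >= limit) {
                    throw new EarlyTerminationException();
                }
                hits.add(doc);
            });
        } catch (EarlyTerminationException e) {
            // Expected: the collector asked to stop; caught in application code.
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] segments = { {1, 2, 3}, {4, 5}, {6} };
        System.out.println(firstN(segments, 4)); // [1, 2, 3, 4]
    }
}
```

Note that the exception aborts the entire search, not just the current segment, which is why it
does not help with per-segment termination.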

To early terminate efficiently, you must have the segments in a consistent order, e.g.
S1 > S2 > S3. Then, after you've processed enough elements from S1, you can early terminate
the entire query because you're guaranteed that successive documents will be "smaller".
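
A small sketch of the difference, again in plain Java rather than Lucene code. Segments are
modeled as int arrays sorted in descending order; both method names are made up for
illustration. With a consistent order across segments the scan can stop after k values; with
segments that are only sorted individually, every segment must contribute candidates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OrderedSegments {
    /**
     * Assumes each segment is sorted descending AND every value in segment i
     * is >= every value in segment i+1 (S1 > S2 > S3). Stops after k values.
     */
    static List<Integer> topKGloballyOrdered(int[][] segments, int k) {
        List<Integer> top = new ArrayList<>();
        for (int[] segment : segments) {
            for (int value : segment) {
                if (top.size() == k) {
                    return top; // terminate the entire query
                }
                top.add(value);
            }
        }
        return top;
    }

    /**
     * Assumes only that each segment is sorted descending on its own.
     * Every segment must be visited: take up to k from each, then merge.
     */
    static List<Integer> topKPerSegmentSorted(int[][] segments, int k) {
        List<Integer> candidates = new ArrayList<>();
        for (int[] segment : segments) {
            int n = Math.min(k, segment.length);
            for (int i = 0; i < n; i++) {
                candidates.add(segment[i]);
            }
        }
        candidates.sort(Collections.reverseOrder());
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        int[][] ordered = { {9, 8, 7}, {6, 5}, {4} };
        int[][] individuallySorted = { {9, 4, 1}, {8, 7, 2}, {6, 5, 3} };
        System.out.println(topKGloballyOrdered(ordered, 4));             // [9, 8, 7, 6]
        System.out.println(topKPerSegmentSorted(individuallySorted, 4)); // [9, 8, 7, 6]
    }
}
```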

Unless you add to your Collector.collect() an "if (done) return" and consider the remaining
calls no-ops, or make your own IndexSearcher logic ... then per-segment early termination is doable.
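
The "if (done) return" idea could look roughly like this. The class below is a hypothetical
stand-in, not Lucene's Collector; startSegment() plays the role of the per-segment callback and
the counters exist only to show what happens.

```java
public class DoneFlagCollector {
    /** Hypothetical collector that keeps at most `perSegmentLimit` docs per segment. */
    static class TopPerSegment {
        final int perSegmentLimit;
        int collectedInSegment;
        boolean done;
        int collected; // docs actually accepted, across all segments
        int skipped;   // collect() calls that were treated as no-ops

        TopPerSegment(int perSegmentLimit) {
            this.perSegmentLimit = perSegmentLimit;
        }

        /** Called when the search moves on to a new segment; resets the flag. */
        void startSegment() {
            collectedInSegment = 0;
            done = false;
        }

        void collect(int doc) {
            if (done) {
                skipped++; // the searcher still calls us, we just do nothing
                return;
            }
            collected++;
            if (++collectedInSegment >= perSegmentLimit) {
                done = true;
            }
        }
    }

    /** Drives the collector over all segments, the way a search loop would. */
    static TopPerSegment run(int[][] segments, int perSegmentLimit) {
        TopPerSegment collector = new TopPerSegment(perSegmentLimit);
        for (int[] segment : segments) {
            collector.startSegment();
            for (int doc : segment) {
                collector.collect(doc);
            }
        }
        return collector;
    }

    public static void main(String[] args) {
        TopPerSegment c = run(new int[][] { {1, 2, 3, 4}, {5, 6}, {7, 8, 9} }, 2);
        System.out.println(c.collected + " collected, " + c.skipped + " skipped"); // 6 collected, 3 skipped
    }
}
```

The search loop still visits every remaining document in the segment; only the work inside
collect() is avoided, so this is cheaper than full collection but not a true skip.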

As for the approach you describe, I think that instead of stuffing into IWC what seems like
a random setting (pick-segments-for-sorting), we should have something more generic, like
AtomicReaderFactory, which IW will use instead of always loading SegmentReader. That will
let you load your custom AtomicReader? Or, perhaps this can be a SortingCodec? Also, a custom
SegmentMerger to implement the zig-zag merge would help too.
                
> Merge segments to sort them
> ---------------------------
>
>                 Key: LUCENE-4752
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4752
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/index
>            Reporter: David Smiley
>            Assignee: Adrien Grand
>
> It would be awesome if Lucene could write the documents out in a segment based on a configurable
> order.  This of course applies to merging segments too. The benefit is increased locality on
> disk of documents that are likely to be accessed together.  This often applies to documents
> near each other in time, but also spatially.


