lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
Date Tue, 16 Dec 2008 01:19:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656832#action_12656832 ]

Michael McCandless commented on LUCENE-1483:
--------------------------------------------


We should definitely try fallback compare-by-value.

But sorting by title (presumably a unique key) is actually a worst case
for us, because none of the values in the queue will exist in the next
segment.  So it's a good test ;) We should also test sorting by an
enum field ("country", "state").

Thinking more about how to compute subords... I think we could store
ord & subord each as int, and then efficiently translate them to the
next segment with a single pass through the queue, in sort key order.
This would ensure we hit all the dups (different Strings that map to
the same ord in the next segment, but different subords) in one
cluster.  And, the subord could be easily computed by simply
incrementing (starting with 1) in key sort order, until the cluster is
done.
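
A rough sketch of that single pass, to make the idea concrete.  Entry,
remap and nextSegmentTerms are hypothetical names, not Lucene APIs.  The
sketch assumes the next segment's terms are available as a sorted array
of unique Strings, that the entries are handed over already in key sort
order, and that a key missing from the next segment takes the ord of the
largest term below it (-1 if there is none).  Letting equal Strings
share one subord is also an assumption.

```java
import java.util.Arrays;
import java.util.List;

class SubordRemap {
    /** Hypothetical queue slot: the sort value plus its position relative to one segment. */
    static final class Entry {
        final String value;
        int ord;     // index into the segment's sorted terms (-1 if below all of them)
        int subord;  // 0 if value is a term of the segment, else its rank within the cluster
        Entry(String value) { this.value = value; }
    }

    /** Re-bases each entry onto nextSegmentTerms; entries must already be in key sort order. */
    static void remap(List<Entry> entriesInKeyOrder, String[] nextSegmentTerms) {
        int prevOrd = Integer.MIN_VALUE;  // sentinel: no real ord is ever this small
        int prevSubord = 0;
        String prevValue = null;
        for (Entry e : entriesInKeyOrder) {
            int pos = Arrays.binarySearch(nextSegmentTerms, e.value);
            if (pos >= 0) {
                // Exact match: the value is a real term in the next segment.
                e.ord = pos;
                e.subord = 0;
            } else {
                // Missing: binarySearch returns -(insertionPoint) - 1, so the
                // largest term below the value sits at insertionPoint - 1.
                e.ord = -pos - 2;
                if (e.ord != prevOrd) {
                    e.subord = 1;               // first missing value of a new cluster
                } else if (e.value.equals(prevValue)) {
                    e.subord = prevSubord;      // same String, same position
                } else {
                    e.subord = prevSubord + 1;  // next distinct String in this cluster
                }
            }
            prevOrd = e.ord;
            prevSubord = e.subord;
            prevValue = e.value;
        }
    }

    public static void main(String[] args) {
        List<Entry> queue = Arrays.asList(
            new Entry("a"), new Entry("b"), new Entry("c"),
            new Entry("cc"), new Entry("d"), new Entry("e"));
        remap(queue, new String[] {"b", "d", "f"});
        for (Entry e : queue) {
            System.out.println(e.value + " -> ord=" + e.ord + ", subord=" + e.subord);
        }
    }
}
```

Comparing (ord, subord) pairs then gives the same order as comparing the
Strings, both among queue entries and against docs in the new segment,
which would carry subord 0 under this scheme.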

It should be simple to step through the pqueue's heap in sort order
min->max without removing the entries (removal being the "normal"
heapsort way to sort the elements); you'd need to maintain some sort of
queue to keep track of the "frontier" as you walk down the heap.  But I
haven't found a cookbook example yet...  It should be fast since we can
use the ord/subords in the queue for all within-queue comparisons.
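
One possible shape for such a frontier walk, as a sketch only.  HeapWalk
and sortedView are hypothetical names; the heap here is a plain 0-based
int[] min-heap standing in for the queue's entries (Lucene's own
PriorityQueue lays out its array differently), and the frontier is a
java.util.PriorityQueue of heap indices.  The walk relies on the heap
property: once a node has been emitted, the next smallest element is
either already on the frontier or is one of that node's children.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class HeapWalk {
    /** Returns the first `size` elements of a 0-based min-heap in ascending order, leaving the heap untouched. */
    static List<Integer> sortedView(int[] heap, int size) {
        List<Integer> out = new ArrayList<>(size);
        if (size == 0) {
            return out;
        }
        // The frontier holds heap indices, ordered by the value stored at each index.
        PriorityQueue<Integer> frontier =
            new PriorityQueue<>(Comparator.comparingInt((Integer i) -> heap[i]));
        frontier.add(0);  // the root is the global minimum
        while (!frontier.isEmpty()) {
            int i = frontier.poll();
            out.add(heap[i]);
            // Only the children of an emitted node can become the next minimum.
            int left = 2 * i + 1;
            int right = left + 1;
            if (left < size) {
                frontier.add(left);
            }
            if (right < size) {
                frontier.add(right);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] heap = {1, 3, 2, 7, 4, 5, 6};  // a valid min-heap
        System.out.println(sortedView(heap, heap.length));
    }
}
```

Each element enters and leaves the frontier once, so the walk costs
O(n log n) comparisons in the worst case, and in the real queue those
comparisons could use the stored ord/subord pairs rather than Strings.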

We could also save time on the binary search by bounding the search by
where we just found the last key.  It may be worth tracking the max
value in the queue, to bound the other end of the search.  For a big
search the queue should have a fairly tight bound.
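
A small sketch of the bounded search, under the same assumptions as
before: the next segment's terms are a sorted array of unique Strings
and the queue's keys are visited in sort order.  BoundedLookup and
lookupAll are hypothetical names.  The upper bound comes from locating
the max key once; the lower bound moves up to wherever the previous key
landed.  It uses the ranged overload Arrays.binarySearch(a, fromIndex,
toIndex, key) from the JDK.

```java
import java.util.Arrays;

class BoundedLookup {
    /**
     * Looks up each key (keys must be sorted) in the sorted, unique terms.
     * Results follow the Arrays.binarySearch convention: the index if found,
     * otherwise -(insertionPoint) - 1.
     */
    static int[] lookupAll(String[] sortedKeys, String[] terms) {
        int[] result = new int[sortedKeys.length];
        if (sortedKeys.length == 0) {
            return result;
        }
        // Upper bound: no key can land past the position of the max key.
        int maxPos = Arrays.binarySearch(terms, sortedKeys[sortedKeys.length - 1]);
        int hi = maxPos >= 0 ? maxPos + 1 : -maxPos - 1;
        int lo = 0;
        for (int k = 0; k < sortedKeys.length; k++) {
            int pos = Arrays.binarySearch(terms, lo, hi, sortedKeys[k]);
            result[k] = pos;
            // Later keys are >= this one, so they cannot land before it.
            lo = pos >= 0 ? pos : -pos - 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] terms = {"b", "d", "f", "h"};
        String[] keys = {"a", "d", "e", "f"};
        System.out.println(Arrays.toString(lookupAll(keys, terms)));
    }
}
```

The results match an unbounded search over the whole array; only the
range each search has to cover shrinks.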


> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
>                      LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
>                      LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing for individual
> segment reloading on reopen.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

