lucene-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
Date Wed, 17 Dec 2008 18:07:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657466#action_12657466 ]

Doug Cutting commented on LUCENE-1483:
--------------------------------------

bq. I would actually be fine with keeping HitCollector, adding a default "setNextReader" method,
that either throws UOE or (if we are strongly against exceptions) returns "false" indicating
it cannot handle sequential readers.

Could we instead add a new HitCollector subclass that adds the setNextReader method, then use
'instanceof' to decide whether to wrap or not?
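
To make the suggestion concrete, here is a minimal sketch of the subclass-plus-instanceof idea. Everything in it is a stand-in: HitCollector is a simplified local class rather than Lucene's, and ReaderAwareCollector, RebasingCollector, wrapIfNeeded and the setNextReader(int docBase) signature are hypothetical names chosen for illustration, not anything from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectorSketch {
    /** Simplified stand-in for Lucene's HitCollector (not the real class). */
    public abstract static class HitCollector {
        public abstract void collect(int doc, float score);
    }

    /** Hypothetical subclass that adds the per-reader callback. */
    public abstract static class ReaderAwareCollector extends HitCollector {
        /** docBase is the offset of the next segment's doc ids (assumed signature). */
        public abstract void setNextReader(int docBase);
    }

    /** Wraps a legacy collector and rebases per-segment doc ids to top-level ids. */
    public static final class RebasingCollector extends ReaderAwareCollector {
        private final HitCollector delegate;
        private int docBase;

        public RebasingCollector(HitCollector delegate) {
            this.delegate = delegate;
        }

        @Override
        public void setNextReader(int docBase) {
            this.docBase = docBase;
        }

        @Override
        public void collect(int doc, float score) {
            delegate.collect(docBase + doc, score);
        }
    }

    /** Uses instanceof to decide whether the collector needs wrapping. */
    public static ReaderAwareCollector wrapIfNeeded(HitCollector c) {
        return (c instanceof ReaderAwareCollector)
                ? (ReaderAwareCollector) c
                : new RebasingCollector(c);
    }
}
```

With this shape, existing HitCollector implementations keep seeing top-level doc ids through the wrapper, while new collectors opt in to per-segment ids by extending the subclass.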

bq. I really don't fully understand BooleanScorer!

The original version of BooleanScorer uses a ~16k array to score windows of docs.  So it scores
docs 0-16k first, then docs 16k-32k, etc.  For each window it iterates through all query terms
and accumulates a score in table[doc%16k].  It also stores in the table a bitmask representing
which terms contributed to the score.  Non-zero scores are chained in a linked list.  At the
end of scoring each window it then iterates through the linked list and, if the bitmask matches
the boolean constraints, collects a hit.  For boolean queries with lots of frequent terms
this can be much faster, since it does not need to update a priority queue for each posting,
instead performing constant-time operations per posting.  The only downside is that it results
in hits being delivered out-of-order within the window, which means it cannot be nested within
other scorers.  But it works well as a top-level scorer.

The new BooleanScorer2 implementation instead works by merging priority queues of postings,
albeit with some clever tricks.  For example, a pure conjunction (all terms required) does not
require a priority queue.  Instead it sorts the posting streams at the start, then repeatedly
skips the first to the last.  If the first ever equals the last, then there's a hit.  When some
terms are required and some terms are optional, the conjunction can be evaluated first, then
the optional terms can all skip to the match and be added to the score.  Thus the conjunction
can reduce the number of priority queue updates for the optional terms.  Does that help any?
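
The two strategies described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Lucene's actual code: postings are plain sorted int arrays, each posting contributes a fixed score of 1, the window size is a parameter rather than ~16k, and the names BooleanScoringSketch, scoreWindows and conjunction are made up for the example. The conjunction method is a round-robin variant of "skip the first to the last" rather than a literal port.

```java
import java.util.ArrayList;
import java.util.List;

public class BooleanScoringSketch {
    /**
     * Window-at-a-time scoring. Returns matching doc ids in the order they
     * would be collected, which is not sorted within a window.
     * Term t owns bit (1 << t), so this sketch handles at most 32 terms.
     */
    public static List<Integer> scoreWindows(int[][] postings, int requiredMask,
                                             int prohibitedMask, int windowSize, int maxDoc) {
        List<Integer> hits = new ArrayList<>();
        int[] pos = new int[postings.length];  // cursor into each term's postings
        float[] score = new float[windowSize]; // accumulated score per slot
        int[] bits = new int[windowSize];      // which terms touched the slot
        int[] next = new int[windowSize];      // linked list of touched slots
        for (int base = 0; base < maxDoc; base += windowSize) {
            int head = -1;
            int end = base + windowSize;
            for (int t = 0; t < postings.length; t++) {
                while (pos[t] < postings[t].length && postings[t][pos[t]] < end) {
                    int slot = postings[t][pos[t]++] % windowSize;
                    if (bits[slot] == 0) { // first touch: chain into the list
                        next[slot] = head;
                        head = slot;
                    }
                    bits[slot] |= 1 << t;
                    score[slot] += 1.0f;   // a real scorer would add the term's score
                }
            }
            for (int slot = head; slot != -1; slot = next[slot]) {
                boolean matches = (bits[slot] & requiredMask) == requiredMask
                        && (bits[slot] & prohibitedMask) == 0;
                if (matches) {
                    hits.add(base + slot); // score[slot] would go to the collector here
                }
                bits[slot] = 0;            // reset the slot for the next window
                score[slot] = 0f;
            }
        }
        return hits;
    }

    /** Pure conjunction without a priority queue: streams leapfrog to a shared target. */
    public static List<Integer> conjunction(int[][] postings) {
        List<Integer> hits = new ArrayList<>();
        int n = postings.length;
        if (n == 0) {
            return hits;
        }
        int[] pos = new int[n];
        int target = 0; // smallest doc id that could still match
        int agreed = 0; // consecutive streams currently positioned on target
        int t = 0;
        while (true) {
            int[] p = postings[t];
            while (pos[t] < p.length && p[pos[t]] < target) {
                pos[t]++; // linear stand-in for skipTo(target)
            }
            if (pos[t] == p.length) {
                return hits; // one stream is exhausted, no further matches
            }
            if (p[pos[t]] == target) {
                agreed++;
            } else {
                target = p[pos[t]]; // this stream is ahead: it sets the new target
                agreed = 1;
            }
            if (agreed == n) {
                hits.add(target);
                target++;
                agreed = 0;
            }
            t = (t + 1) % n;
        }
    }
}
```

For postings {1, 5, 20} and {3, 5, 20, 21} with a window of 16 and no required or prohibited terms, scoreWindows collects 3, 5, 1, 21, 20: each window's hits come out in linked-list order, which is why this style of scorer cannot feed another scorer that expects increasing doc ids.  conjunction on the same postings yields 5 and 20 in order.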


> Change IndexSearcher to use MultiSearcher semantics for multiple subreaders
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-1483
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1483
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.9
>            Reporter: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch,
> LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch, LUCENE-1483.patch
>
>
> FieldCache and Filters are forced down to a single segment reader, allowing for individual
> segment reloading on reopen.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

