lucene-dev mailing list archives

From "paul.elschot (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-443) ConjunctionScorer tune-up
Date Tue, 11 Oct 2005 07:45:06 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-443?page=comments#action_12331778 ] 

paul.elschot commented on LUCENE-443:
-------------------------------------

Once the ConjunctionScorer matches, the order of the subqueries/subscorers does not matter,
because they all match and a sum score needs to be formed.

Scorer.next() can tolerate out-of-order documents only at the top level, where hit collection
is done.
At lower levels, in nested scorers, Lucene works a document at a time, and there Scorer.skipTo(docNr)
requires that all document numbers are in order. Such skipping is needed for conjunctions.
Since the score value of a document for a query depends on the score values of the subqueries,
at some point the subscorers must be associated on a single document.
For conjunctions, skipTo() is used, but for disjunctions this association is done by
a priority queue in the trunk, and by a distribution-like method in 1.4.3. This distribution
method works somewhat loosely on document order, and is therefore incompatible with skipping.
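
To illustrate why skipping needs ordered document numbers, here is a minimal sketch of a
conjunction built on a skipTo()-style call. It is not Lucene's ConjunctionScorer: the classes
SortedDocs and ConjunctionSketch, the NO_MORE_DOCS sentinel, and the int return values are
all assumptions made for this example, and scoring is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a subscorer: walks a strictly increasing doc number array. */
class SortedDocs {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // assumed sentinel, not a Lucene 1.x constant
    private final int[] docs;
    private int pos = -1;

    SortedDocs(int... docs) { this.docs = docs; }

    private int current() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }

    /** Move to the next document and return it. */
    int next() { pos++; return current(); }

    /** Move to the first document >= target. Only correct because docs is in order. */
    int skipTo(int target) {
        if (pos < 0) pos = 0;
        while (pos < docs.length && docs[pos] < target) pos++;
        return current();
    }
}

class ConjunctionSketch {
    /** Returns the documents on which every subscorer matches. */
    static List<Integer> intersect(SortedDocs... subs) {
        List<Integer> hits = new ArrayList<>();
        if (subs.length == 0) return hits;
        int candidate = subs[0].next();
        while (candidate != SortedDocs.NO_MORE_DOCS) {
            boolean allMatch = true;
            for (SortedDocs s : subs) {
                int d = s.skipTo(candidate);
                if (d != candidate) {   // s has no such doc: d becomes the new candidate
                    candidate = d;
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {             // all subscorers sit on the same document
                hits.add(candidate);
                candidate = subs[0].next();
            }
        }
        return hits;
    }
}
```

If any subscorer delivered its documents out of order, skipTo() could step past a document
that another subscorer still had to offer, and the match would be lost.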

Regards,
Paul Elschot


> ConjunctionScorer tune-up
> -------------------------
>
>          Key: LUCENE-443
>          URL: http://issues.apache.org/jira/browse/LUCENE-443
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Versions: 1.9
>  Environment: Linux, Java 1.5, Large Index with 4 million items and some heavily nested
>               boolean queries
>     Reporter: Abdul Chaudhry
>  Attachments: ConjunctionScorer.java, ConjunctionScorer.java
>
> I recently ran a load test on the latest code from Lucene, which uses a new
> BooleanScore, and noticed that the ConjunctionScorer was crunching through objects, especially
> while sorting as part of the skipTo call. It turns a linked list into an array, sorts the
> array, then converts the array back to a linked list for further processing by the scoring
> engines below.
> I'm not sure if anyone else is experiencing this, as I have a very large index (> 4
> million items) and I am issuing some heavily nested queries.
> Anyway, I decided to change the linked list into an array and use a first and last marker
> to "simulate" a linked list.
> This scaled much better during my load test, as the Java garbage collector was less -
> umm - virulent.
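
The array-with-markers idea from the description can be sketched roughly as follows. This is
not the attached ConjunctionScorer.java: the class ArrayRing and its methods are hypothetical
names chosen for this example, and it only shows how moving a first and last marker over a
fixed array can replace list.addLast(list.removeFirst()) without allocating list nodes.

```java
/** Hypothetical fixed-size ring over an array, tracked by a first and a last marker. */
class ArrayRing<T> {
    private final T[] items;
    private int first;  // index of the logical first element
    private int last;   // index of the logical last element

    ArrayRing(T[] items) {
        if (items.length == 0) throw new IllegalArgumentException("ring must not be empty");
        this.items = items.clone();
        this.first = 0;
        this.last = items.length - 1;
    }

    T first() { return items[first]; }

    T last() { return items[last]; }

    /** Logical addLast(removeFirst()): only the markers move, nothing is allocated. */
    void rotate() {
        last = first;
        first = (first + 1) % items.length;
    }
}
```

Because the array is always full, rotating moves both markers one slot forward and the
old first element becomes the new last element in place.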

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

