lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ryan Ernst (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-6585) Make ConjunctionDISI flatten sub ConjunctionDISI instances
Date Thu, 25 Jun 2015 03:15:05 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ryan Ernst updated LUCENE-6585:
-------------------------------
    Fix Version/s: Trunk
                   5.3

> Make ConjunctionDISI flatten sub ConjunctionDISI instances
> ----------------------------------------------------------
>
>                 Key: LUCENE-6585
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6585
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: 5.3, Trunk
>
>         Attachments: LUCENE-6585.patch, LUCENE-6585.patch
>
>
> Today ConjunctionDISI wraps some sub (two-phase) iterators. I would like to improve it
by flattening sub iterators when they implement ConjunctionDISI. In practice, this would make
"+A +(+B +C)" be executed more like "+A +B +C" (only in terms of matching, scoring would not
change).
> My motivation for this is that if we don't flatten and are unlucky, we can sometimes
hit some worst cases. For instance consider the 3 following postings lists (sorted by increasing
cost):
> A: 1, 1001, 2001, 3001, ...
> C: 0, 2, 4, 6, 8, 10, 12, 14, ...
> B: 1, 3, 5, 7, 9, 11, 13, 15, ...
> If we run "+A +B +C", then everything works fine, we use A as a lead, and advance B 1000
by 1000 to find the next match (if any).
> However if we run "+A +(+B +C)", then we would iterate B and C 2 by 2 over the entire
doc ID space when trying to find the first match which occurs on or after A:1.
> This is an extreme example which is unlikely to happen in practice, but flattening would
also help a bit on some more common cases. For instance imagine that A, B and C have respective
costs of 100, 10 and 1000. If you search for "+A +(+B +C)", then we will use the most costly
iterator (C) to confirm matches of B (the least costly iterator, used as a lead) while it
would have been more efficient to confirm matches of B with A first, since A is less costly
than C.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message