lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7055) Better execution path for costly queries
Date Sat, 24 Dec 2016 13:04:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15774874#comment-15774874 ]

Paul Elschot commented on LUCENE-7055:
--------------------------------------

bq.  Intersecting such queries with a selective query is very inefficient since these queries
build a doc id set of matching documents for the entire index.

Just thinking out loud: how about also using a lazy doc id set builder that works on the go?
This would use one extra bit per document to indicate whether the document is already evaluated.
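
To make the idea concrete, here is a minimal sketch of such a lazy set. It is an illustration only, not code from the attached patch and not an existing Lucene class: LazyDocIdSet, costlyMatcher and evaluations() are hypothetical names, and java.util.BitSet stands in for whatever bit set implementation would really be used. One bit set records which documents have already been evaluated (the extra bit per document), a second one records which of those matched.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical sketch of a doc id set that is filled in on demand. */
final class LazyDocIdSet {
    // The "one extra bit per document": set once a doc has been evaluated.
    private final BitSet evaluated;
    // Set only for evaluated docs that matched the costly query.
    private final BitSet matching;
    // Assumed per-document check, standing in for the expensive query.
    private final IntPredicate costlyMatcher;
    // Counts how often the costly check actually ran (for illustration).
    private int evaluations;

    LazyDocIdSet(int maxDoc, IntPredicate costlyMatcher) {
        this.evaluated = new BitSet(maxDoc);
        this.matching = new BitSet(maxDoc);
        this.costlyMatcher = costlyMatcher;
    }

    /** Evaluates the doc on first request only, then answers from the bits. */
    boolean matches(int doc) {
        if (!evaluated.get(doc)) {
            evaluated.set(doc);
            evaluations++;
            if (costlyMatcher.test(doc)) {
                matching.set(doc);
            }
        }
        return matching.get(doc);
    }

    int evaluations() {
        return evaluations;
    }
}
```

A selective query would drive the iteration and call matches(doc) only for its own candidates, so the costly query is evaluated for those documents only, and at most once each, instead of for the entire index. Whether a per-document check is available cheaply enough for TermsQuery, multi-term queries or point queries is an open question in this sketch.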

> Better execution path for costly queries
> ----------------------------------------
>
>                 Key: LUCENE-7055
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7055
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-7055.patch
>
>
> In Lucene 5.0, we improved the execution path for queries that run costly operations
> on a per-document basis, like phrase queries or doc values queries. But we have another class
> of costly queries, that return fine iterators, but these iterators are very expensive to build.
> This is typically the case for queries that leverage DocIdSetBuilder, like TermsQuery, multi-term
> queries or the new point queries. Intersecting such queries with a selective query is very
> inefficient since these queries build a doc id set of matching documents for the entire index.
> Is there something we could do to improve the execution path for these queries?
> One idea that comes to mind is that most of these queries could also run on doc values,
> so maybe we could come up with something that would help decide how to run a query based on
> other parts of the query? (Just thinking out loud, other ideas are very welcome)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

