lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-8058) Never cache large TermInSetQuery instances
Date Tue, 28 Nov 2017 15:43:00 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-8058:
---------------------------------
    Attachment: LUCENE-8058.patch

[~jim.ferenczi] I slightly changed the approach:
 - I increased the memory usage that we assume for each cached query to 1024 bytes. I think
this makes sense since the previous value was computed as the memory usage of a term query,
but we do not cache term queries anymore, so cached queries are more likely to be boolean
queries with a couple of clauses.
 - I disabled caching on dismax and boolean queries that have more than 16 clauses, so that
users are not encouraged to switch to those queries to work around the fact that we no
longer cache large term-in-set queries.
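
For illustration only, the second point could look roughly like the sketch below. This is
not the attached patch: CachingPolicySketch and its nested types are hypothetical stand-ins
rather than Lucene classes, and MAX_TERMS is an assumed placeholder because this message
does not say what counts as a "large" term-in-set query. Only the limit of 16 clauses is
taken from the text above.

```java
import java.util.List;

// Hypothetical model of the caching decision; none of these types are Lucene classes.
final class CachingPolicySketch {
    interface Query {}

    static final class TermInSet implements Query {
        final int termCount;
        TermInSet(int termCount) { this.termCount = termCount; }
    }

    static final class Bool implements Query {
        final List<Query> clauses;
        Bool(List<Query> clauses) { this.clauses = clauses; }
    }

    static final class DisMax implements Query {
        final List<Query> disjuncts;
        DisMax(List<Query> disjuncts) { this.disjuncts = disjuncts; }
    }

    // From the message: boolean and dismax queries with more than 16 clauses are not cached.
    static final int MAX_CLAUSES = 16;
    // Assumption: placeholder threshold, the message does not give the real one.
    static final int MAX_TERMS = 16;

    // Returns false for queries that this sketch considers too large to cache.
    static boolean shouldCache(Query q) {
        if (q instanceof TermInSet) {
            return ((TermInSet) q).termCount <= MAX_TERMS;
        }
        if (q instanceof Bool) {
            return ((Bool) q).clauses.size() <= MAX_CLAUSES;
        }
        if (q instanceof DisMax) {
            return ((DisMax) q).disjuncts.size() <= MAX_CLAUSES;
        }
        return true; // every other query type is left to the normal caching policy
    }
}
```

Checking the clause count on the wrapper as well as on the term-in-set query is what keeps
a large set of terms from becoming cacheable again once it is rewritten as many clauses.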

What do you think?

> Never cache large TermInSetQuery instances
> ------------------------------------------
>
>                 Key: LUCENE-8058
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8058
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>             Fix For: master (8.0), 7.2
>
>         Attachments: LUCENE-8058.patch, LUCENE-8058.patch
>
>
> I have seen several cases in which the query cache greatly underestimated its memory
> usage because it held references to large queries that ended up using more memory than
> the associated doc id sets.
> We had a workaround for term-in-set queries by making TermInSetQuery implement Accountable,
> but this information is lost when it is wrapped in another query such as a BooleanQuery.
> So I would like to apply a safer fix that just disables caching on large TermInSetQuery
> instances.
> I know it's a pity given that large queries are probably more expensive and thus more
> cache-worthy, but I see such large queries as the result of a bad design or a workaround
> for the fact that Lucene is not the right tool for the job. So I think that disabling
> caching on large term-in-set queries is the right trade-off, since it makes the query
> cache safer for the majority of our users.
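
The accounting gap described in the quoted issue can be illustrated with a small sketch.
Everything in it is hypothetical: the nested Accountable interface is a local stand-in for
the idea of a query that reports its own size, Wrapper plays the role of a query such as a
BooleanQuery that wraps another one, and the cost of 16 bytes per term is an arbitrary
figure chosen for the example. The default of 1024 is the per-query figure mentioned in the
comment above.

```java
// Hypothetical model of how a size estimate is lost when a query is wrapped.
final class RamEstimateSketch {
    interface Query {}

    // Local stand-in for a query that can report its own memory usage.
    interface Accountable {
        long ramBytesUsed();
    }

    // Per-query default assumed when a query does not report its own size.
    static final long DEFAULT_QUERY_BYTES = 1024;

    static final class TermInSet implements Query, Accountable {
        final int termCount;
        TermInSet(int termCount) { this.termCount = termCount; }

        // Illustrative cost model only: 16 bytes per term.
        @Override
        public long ramBytesUsed() { return 16L * termCount; }
    }

    // A wrapper that does not implement Accountable, so the inner size is invisible.
    static final class Wrapper implements Query {
        final Query inner;
        Wrapper(Query inner) { this.inner = inner; }
    }

    // The cache asks the query for its size and falls back to the default otherwise.
    static long estimate(Query q) {
        return (q instanceof Accountable)
                ? ((Accountable) q).ramBytesUsed()
                : DEFAULT_QUERY_BYTES;
    }
}
```

In this model a term-in-set query with 100,000 terms is estimated at 1,600,000 bytes on its
own but at only 1024 bytes once wrapped, which is the kind of underestimation the issue
describes.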



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org