lucene-issues mailing list archives

From "Ben Manes (Jira)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-9038) Evaluate Caffeine for LruQueryCache
Date Fri, 08 Nov 2019 06:10:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969869#comment-16969869 ]

Ben Manes commented on LUCENE-9038:
-----------------------------------

Attached a very rough sketch of what this could look like. A cache hit would be lock-free
and a miss would be performed under a per-segment {{computeIfAbsent}}. A cheap computation that
writes the entry back would then cause the segment to be re-weighed, perhaps triggering an eviction.
A lot of {{LRUQueryCache}} needs to be ported over, but I think that is straightforward. It may look
a lot like the current cache in the end, but it would benefit from having concurrent data structures
to work off of.
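A minimal sketch of that layout, assuming the cache is keyed first by index segment and then by query. To keep it dependency-free, {{ConcurrentHashMap}} stands in for the weighted Caffeine cache at the outer level and {{java.util.BitSet}} stands in for a cached {{DocIdSet}}. The class, method, and field names are hypothetical, and eviction itself is omitted.

```java
import java.util.BitSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical sketch: ConcurrentHashMap stands in for a weighted Caffeine cache
// (outer level) and BitSet stands in for Lucene's DocIdSet. Eviction is omitted.
final class SegmentQueryCacheSketch<S, Q> {

  /** The cached results for one segment, plus that segment's last computed weight. */
  static final class SegmentEntry<K> {
    final ConcurrentHashMap<K, BitSet> results = new ConcurrentHashMap<>();
    volatile long weightBytes;
  }

  private final ConcurrentHashMap<S, SegmentEntry<Q>> segments = new ConcurrentHashMap<>();
  final AtomicLong loads = new AtomicLong(); // how many times a loader actually ran

  BitSet getOrCompute(S segment, Q query, Function<Q, BitSet> loader) {
    SegmentEntry<Q> entry = segments.computeIfAbsent(segment, s -> new SegmentEntry<>());

    // Hit path: ConcurrentHashMap.get does not block.
    BitSet hit = entry.results.get(query);
    if (hit != null) {
      return hit;
    }

    // Miss path: computeIfAbsent applies the loader at most once per key, so
    // concurrent misses on the same query wait for a single computation.
    BitSet value = entry.results.computeIfAbsent(query, q -> {
      loads.incrementAndGet();
      return loader.apply(q);
    });

    // Cheap write-back of the segment entry so that its weight is recomputed.
    // In the proposal, this is where Caffeine would re-weigh and possibly evict.
    segments.computeIfPresent(segment, (s, e) -> {
      e.weightBytes = weigh(e);
      return e;
    });
    return value;
  }

  long weightOf(S segment) {
    SegmentEntry<Q> e = segments.get(segment);
    return e == null ? 0 : e.weightBytes;
  }

  /** Approximate bytes held by a segment's results (BitSet.size() is in bits). */
  private static long weigh(SegmentEntry<?> e) {
    long bytes = 0;
    for (BitSet bits : e.results.values()) {
      bytes += bits.size() / 8;
    }
    return bytes;
  }
}
```

Keeping the weight on the segment entry rather than per query mirrors the idea that eviction decisions would be made per segment, with the inner map only ever touched through non-blocking reads or a per-key {{computeIfAbsent}}.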

Let me know if you think this is the right direction.

> Evaluate Caffeine for LruQueryCache
> -----------------------------------
>
>                 Key: LUCENE-9038
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9038
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ben Manes
>            Priority: Major
>         Attachments: CaffeineQueryCache.java
>
>
> [LRUQueryCache|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java]
appears to play a central role in Lucene's performance. Many issues discuss its performance,
such as LUCENE-7235, LUCENE-7237, LUCENE-8027, LUCENE-8213, and LUCENE-9002. It appears that
the cache's overhead can make it as much a liability as a benefit, leading to various workarounds
and added complexity.
> When reviewing the discussions and code, the following issues are concerning:
> # The cache is guarded by a single lock for all reads and writes.
> # All computations are performed outside of any locking to avoid penalizing other
callers. This does not prevent cache stampedes, meaning that multiple threads may miss on
the same key, compute the value, and try to store it. That redundant work becomes expensive
under load and can be mitigated with (approximately) per-key locks.
> # The cache queries the entry to see if it is even worth caching. At first glance, one
assumes this is so that inexpensive entries don't bang on the lock or thrash the LRU. However,
this is also used to indicate data dependencies for uncacheable items (per JIRA), which perhaps
shouldn't be invoking the cache at all.
> # If the global lock is already held, the cache lookup is skipped and the value is computed
but not stored. This means a busy lock reduces performance across all usages and degrades the
cache's effectiveness. These skipped lookups are not counted in the miss rate, giving a false impression.
> # An attempt was made to perform computations asynchronously, due to their heavy impact
on tail latencies. That work was reverted due to test failures and is still being worked on.
> # An [in-progress change|https://github.com/apache/lucene-solr/pull/940] tries to avoid
LRU thrashing due to large, infrequently used items being cached.
> # The cache is tightly intertwined with business logic, making it hard to tease apart
core algorithms and data structures from the usage scenarios.
> It seems that more and more items skip being cached because of concurrency and hit-rate
concerns, leading to special-case fixes based on knowledge of the external code flows. Since
the developers are experts in search, not caching, it seems justified to evaluate whether an
off-the-shelf library would be more helpful in terms of developer time, code complexity, and performance.
Solr has already introduced [Caffeine|https://github.com/ben-manes/caffeine] in SOLR-8241
and SOLR-13817.
> The proposal is to replace the internals of {{LRUQueryCache}} so that external usages are
not affected in terms of the API. However, as with {{SolrCache}}, one difference is that Caffeine
bounds only by either the number of entries or an accumulated size (e.g. bytes), not by both
constraints at once. This is likely an acceptable divergence in how the configuration is honored.
> cc [~ab], [~dsmiley]
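The cache stampede from item 2 of the description can be reproduced with the JDK alone. The hypothetical {{StampedeDemo}} below releases several threads at once on the same missing key. With a check-then-act lookup, each thread that misses typically computes the value itself; with {{ConcurrentHashMap.computeIfAbsent}}, the loader is applied at most once per key. That map locks a hash bin rather than exactly one key, which is one way to get the roughly per-key locking mentioned above. The 50 ms sleep is an arbitrary stand-in for an expensive query.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: counts how many times an "expensive" value is computed
// when several threads miss on the same key at the same moment.
final class StampedeDemo {

  /** Simulated expensive computation (e.g. building a query's doc id set). */
  private static String expensive(String key, AtomicInteger counter) {
    counter.incrementAndGet();
    try {
      Thread.sleep(50);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return key.toUpperCase();
  }

  /** Check-then-act: every thread that misses computes, then all race to store. */
  static int racyLoads(int threads) {
    ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    AtomicInteger loads = new AtomicInteger();
    runConcurrently(threads, () -> {
      if (cache.get("q") == null) {
        cache.putIfAbsent("q", expensive("q", loads));
      }
    });
    return loads.get();
  }

  /** computeIfAbsent: concurrent misses on the same key wait for one load. */
  static int atomicLoads(int threads) {
    ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    AtomicInteger loads = new AtomicInteger();
    runConcurrently(threads, () -> cache.computeIfAbsent("q", k -> expensive(k, loads)));
    return loads.get();
  }

  /** Starts all tasks behind a latch so they hit the cache together. */
  private static void runConcurrently(int threads, Runnable task) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    for (int i = 0; i < threads; i++) {
      pool.execute(() -> {
        try {
          start.await();
          task.run();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

The racy count depends on scheduling, so only the {{computeIfAbsent}} result is deterministic. The trade-off is that a long computation holds that bin's lock, which is why a per-key approach is usually paired with keeping the computation itself reasonably bounded.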
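Item 4 of the description can be modeled as a cache behind one global lock that is bypassed when {{tryLock}} fails. This is a simplified, hypothetical model, not {{LRUQueryCache}}'s actual code. Its point is the accounting: counting bypasses separately keeps them visible instead of leaving them out of the miss rate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical, simplified model of a cache behind one global lock that is
// skipped when the lock is busy. Not LRUQueryCache's actual implementation.
final class TryLockCacheSketch<K, V> {
  final ReentrantLock lock = new ReentrantLock();
  private final Map<K, V> map = new HashMap<>(); // guarded by lock
  final AtomicLong hits = new AtomicLong();
  final AtomicLong misses = new AtomicLong();
  final AtomicLong bypasses = new AtomicLong();

  V get(K key, Function<K, V> loader) {
    if (!lock.tryLock()) {
      // Lock busy: compute without reading or populating the cache,
      // but record it so the lookup does not vanish from the statistics.
      bypasses.incrementAndGet();
      return loader.apply(key);
    }
    try {
      V cached = map.get(key);
      if (cached != null) {
        hits.incrementAndGet();
        return cached;
      }
      misses.incrementAndGet();
    } finally {
      lock.unlock();
    }
    // Compute outside the lock so other callers are not blocked meanwhile.
    V value = loader.apply(key);
    if (lock.tryLock()) { // the store is skipped too if the lock is busy
      try {
        map.putIfAbsent(key, value);
      } finally {
        lock.unlock();
      }
    }
    return value;
  }

  /** Share of lookups not served from the cache, counting bypasses as misses. */
  double effectiveMissRate() {
    long h = hits.get(), m = misses.get(), b = bypasses.get();
    long total = h + m + b;
    return total == 0 ? 0.0 : (double) (m + b) / total;
  }
}
```

With one hit, one miss, and one bypass, a miss rate that ignores bypasses reports 1/2, while {{effectiveMissRate()}} reports 2/3, which is closer to what callers actually experienced.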
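The LRU thrashing that item 6 of the description refers to follows from ordering purely by recency under a byte budget. The hypothetical {{WeightedLru}} below (single-threaded, built on an access-ordered {{LinkedHashMap}}) shows one large, rarely used entry pushing out several small entries that were all just accessed. The weights are arbitrary illustrative numbers.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, minimal weight-bounded LRU (not thread-safe). Values are the
// entries' weights; eviction walks from the least-recently-used end.
final class WeightedLru<K> {
  private final long maxWeight;
  private long weight;
  // accessOrder = true: get() and put() move an entry to the most-recent end.
  private final LinkedHashMap<K, Long> entries = new LinkedHashMap<>(16, 0.75f, true);

  WeightedLru(long maxWeight) {
    this.maxWeight = maxWeight;
  }

  void put(K key, long entryWeight) {
    Long old = entries.put(key, entryWeight);
    if (old != null) {
      weight -= old;
    }
    weight += entryWeight;
    // Evict least-recently-used entries until the total fits the budget,
    // regardless of how often those entries were used.
    Iterator<Map.Entry<K, Long>> it = entries.entrySet().iterator();
    while (weight > maxWeight && it.hasNext()) {
      weight -= it.next().getValue();
      it.remove();
    }
  }

  /** Records an access; returns whether the key was cached. */
  boolean get(K key) {
    return entries.get(key) != null;
  }

  boolean contains(K key) {
    return entries.containsKey(key); // does not change the access order
  }

  int size() {
    return entries.size();
  }

  long weight() {
    return weight;
  }
}
```

Filling a budget of 100 with ten hot entries of weight 10 and then inserting one entry of weight 60 evicts six of the hot entries. Frequency-aware admission is one way to avoid this: roughly speaking, Caffeine's W-TinyLFU policy admits a new entry only when its estimated access frequency beats that of the entry it would evict, so a one-off large entry tends to be rejected rather than flushing the hot set.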



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org

