accumulo-notifications mailing list archives

From "Adam Fuchs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4626) improve cache hit rate via weak reference map
Date Wed, 19 Apr 2017 21:03:41 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975517#comment-15975517
] 

Adam Fuchs commented on ACCUMULO-4626:
--------------------------------------

Basically, the eviction thread is separate, and the work that it has to do to evict a set
of blocks relative to the work done in the iterators is small. It is technically a race condition
(at least from a performance perspective), and the cache eviction thread wins the race. I
believe the core condition needed to trigger this is that the sum of sizes of the referenced
blocks across all of the concurrently running queries exceeds the 25% or so of the total cache
that is reserved for single-use blocks. We were able to work around it in this case by increasing
the total block cache size, but that's not necessarily always a viable solution.

> improve cache hit rate via weak reference map
> ---------------------------------------------
>
>                 Key: ACCUMULO-4626
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4626
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: tserver
>            Reporter: Adam Fuchs
>              Labels: performance, stability
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a single iterator tree references the same RFile blocks in different branches, we
> sometimes get cache misses for one iterator even though the requested block is held in memory
> by another iterator. This is particularly important when using something like the
> IntersectingIterator to intersect many deep copies. Instead of evicting completely, keeping
> evicted blocks in a WeakReference value map can avoid re-reading blocks that are currently
> referenced by another deep-copied source iterator.
> We've seen this in the field for some of Sqrrl's queries against very large tablets.
> When cache miss rates are high, the total memory usage for these queries can be equal to
> the size of all the iterator block reads times the number of readahead threads times the
> number of files times the number of IntersectingIterator children. This might work out to
> something like:
> {code}
> 16 readahead threads * 200 deep copied children * 99% cache miss rate * 20 files
>   * 252KB per reader = ~16GB of memory
> {code}
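
The estimate above can be checked with a few lines of Java. This is only a sketch of the arithmetic: the class and method names are hypothetical, and it assumes "252KB" means 252 * 1024 bytes, which the original does not specify.

```java
// Hypothetical helper that reproduces the back-of-the-envelope estimate above.
final class ReadaheadMemoryEstimate {

    // Multiplies the factors from the estimate and returns a byte count.
    static long estimateBytes(int readaheadThreads, int deepCopiedChildren,
                              double cacheMissRate, int files, long bytesPerReader) {
        double bytes = (double) readaheadThreads * deepCopiedChildren
                * cacheMissRate * files * bytesPerReader;
        return Math.round(bytes);
    }

    // Converts bytes to decimal gigabytes (10^9 bytes).
    static double toGigabytes(long bytes) {
        return bytes / 1e9;
    }
}
```

With the figures from the description, `estimateBytes(16, 200, 0.99, 20, 252L * 1024)` comes to roughly 16.3 * 10^9 bytes, consistent with the "~16GB" quoted above.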
> In most cases, evicting to a weak reference value map changes the cache miss rate from
> very high to very low and has a dramatic effect on total memory usage.
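
The idea of evicting into a weak reference value map can be illustrated with a minimal Java sketch. This is not the Accumulo block cache or the patch attached to this issue: the class `WeakFallbackBlockCache`, its methods, the string block IDs, and the simple entry-count LRU policy are all assumptions made for illustration. Blocks evicted from the strong LRU map are kept behind a `WeakReference`, so a later lookup still hits as long as some other holder (for example, another deep-copied iterator) keeps the block reachable.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cache: a small strong LRU map backed by a weak-value map of evicted blocks.
final class WeakFallbackBlockCache {
    private final Map<String, WeakReference<byte[]>> evicted = new HashMap<>();
    private final LinkedHashMap<String, byte[]> strong;

    WeakFallbackBlockCache(final int maxStrongEntries) {
        if (maxStrongEntries < 1) {
            throw new IllegalArgumentException("maxStrongEntries must be >= 1");
        }
        // Access-ordered LinkedHashMap gives simple LRU behavior.
        this.strong = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                if (size() > maxStrongEntries) {
                    // Instead of dropping the block outright, remember it weakly.
                    evicted.put(eldest.getKey(), new WeakReference<>(eldest.getValue()));
                    return true;
                }
                return false;
            }
        };
    }

    synchronized void put(String blockId, byte[] block) {
        strong.put(blockId, block);
        evicted.remove(blockId);
        // Drop weak entries whose blocks have already been garbage collected.
        evicted.values().removeIf(ref -> ref.get() == null);
    }

    // Returns the cached block, or null if it is neither cached nor still reachable.
    synchronized byte[] get(String blockId) {
        byte[] block = strong.get(blockId);
        if (block != null) {
            return block;
        }
        WeakReference<byte[]> ref = evicted.remove(blockId);
        if (ref == null) {
            return null;
        }
        block = ref.get();
        if (block != null) {
            // Still referenced elsewhere: promote back into the strong LRU map.
            strong.put(blockId, block);
        }
        return block;
    }

    synchronized int strongSize() {
        return strong.size();
    }
}
```

In this sketch a block that nothing else references becomes collectable as soon as it leaves the strong map, so the weak map should not add to retained memory on its own; it only turns would-be misses into hits for blocks that are already held by another reader.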



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
