cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <>
Subject [jira] [Resolved] (CASSANDRA-6709) Changes to KeyCache
Date Thu, 14 Aug 2014 10:00:23 GMT


Benedict resolved CASSANDRA-6709.

    Resolution: Later

Closing as the new sstable format most likely makes this unnecessary by eliminating the need
for a separate key cache, although we *may* want to revisit this at some point afterwards
since a separate cache could still be beneficial by improving memory occupancy rate, so closing
as later instead of duplicate.

> Changes to KeyCache
> -------------------
>                 Key: CASSANDRA-6709
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Priority: Minor
> It seems to me that KeyCache can be improved in a number of ways, but first let's state
the basic goal of KeyCache: to reduce the average query response time by providing an exact
seek position in a file for a given key.
> As it stands, KeyCache is both 100% accurate, but requires a lot of overhead per entry.
> I propose to make KeyCache *mostly* accurate (say 99.9999%), by which I means it will
always fail accurately, but may rarely return an incorrect address, and code the end users
of it to be able to retry to confirm they seeked to the correct position in the file, and
to retry without the cache if they did not.
> The advantage of this is that we can both take the cache off-heap easily, and pack a
lot more items into the cache. If we permit collisions across files and simply use the (full
128-bit) murmur hash of the key + cfId + file generation, we should get enough uniqueness
to rarely have an erroneuous collision, however we will be using only 20 bytes per key, instead
of the current 100 + <key length> bytes. This should allow us to answer far more queries
from the key cache than before, so the positive improvement to performance should be greater
than the negative drain.
> For the structure I propose an associative cache, where a single contiguous address space
is broken up into regions of, say, 8 entries, plus one counter. The counter tracks the recency
of access of each of the entries, so that on write the least recently accessed/written can
be replaced. A linear probe within the region is used to determine if the entry we're looking
for is present. This should be very quick, as the entire region should fit into one or two
lines of L1.
> Advantage: we may see 5x bump in cache hit-rate, or even more for large keys.

This message was sent by Atlassian JIRA

View raw message