cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5863) In process (uncompressed) page cache
Date Tue, 15 Apr 2014 19:12:17 GMT


Benedict commented on CASSANDRA-5863:

I think there are at least three issues we're contending with here, and each needs its own
ticket (eventually). Putting historic data on slow drives is, I think, a different problem
from putting a cache on some fast disks. Both will be helpful. Ideally I think we want the following:

# Uncompressed Memory Cache
# Compressed Memory Cache (disjoint set from 1)
# Compressed SSD cache
# Regular Data
# Archived/Cold/Historic Data

The main distinction is the added "regular data" layer: any special "fast disk" cache should
not store the full sstable hierarchy and its related files; it should just store the most
popular blocks (or portions of blocks).
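Roughly, a sketch of the fall-through read path this layering implies (purely illustrative names, none of this exists in the codebase; each tier answers from its own storage or returns null, and only the cache tiers hold individual hot 4KB blocks):

{code:java}
import java.nio.ByteBuffer;

interface Tier
{
    ByteBuffer read(long chunkId);    // null on a miss
}

final class TieredReadPath
{
    private final Tier[] tiers;       // e.g. uncompressed cache, compressed cache, SSD cache, data, archive

    TieredReadPath(Tier... tiersInLookupOrder)
    {
        this.tiers = tiersInLookupOrder;
    }

    ByteBuffer read(long chunkId)
    {
        for (Tier tier : tiers)
        {
            ByteBuffer hit = tier.read(chunkId);
            if (hit != null)
                return hit;           // a real implementation might promote the chunk into the tiers above
        }
        throw new IllegalStateException("chunk " + chunkId + " not found in any tier");
    }
}
{code}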

bq. Benedict you are describing building a custom page cache impl off heap which is pretty
ambitious. Don't you think a baby step would be to rely on the OS page cache to start and
build a custom one as a phase II?

People get very worried when they think they're competing with the kernel developers, often
for good reason; but since we don't have to be all things to all people, we get the opportunity
to make economies that aren't always as easily available to them. Besides, we only need to
get roughly the same performance, so we can build on this to make inroads elsewhere. What we're
talking about here is pretty straightforward - it's one of the less challenging problems.
A compressed page cache is more challenging, since we don't have a uniform size, but it is
still probably not too difficult. Take a look at my suggestion for a key cache in CASSANDRA-6709
for a detailed description of how I would build the offheap structure.

The basic approach I would probably take is this: deal with 4KB blocks. Any blocks we read
from disk larger than this we split up into 4KB chunks and insert each into the cache separately*.
The cache itself is 8- or 16-way associative, with three components: a long storing the LRU information
for the bucket, 16 longs storing identity information for lookup within the bucket, and
corresponding positions in a large address space storing each of the 4KB data chunks. Readers
always hit the cache first, and if they miss they populate it using the appropriate reader
before continuing. Regrettably we don't have access to SIMD instructions, or we could do a
lot of this tremendously efficiently; but even without that it should be pretty nippy.
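To make that concrete, here is a minimal single-threaded sketch of a 16-way bucket layout, using on-heap arrays and a direct ByteBuffer rather than a real off-heap allocation. The names, the identity hashing and the nibble-packed LRU encoding are all illustrative assumptions, not anything in the codebase; a real off-heap version would also need CAS-based metadata updates.

{code:java}
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ChunkCacheSketch
{
    static final int CHUNK_SIZE = 4096;          // 4KB payload per slot
    static final int WAYS = 16;                  // 16-way associative buckets
    static final long EMPTY = Long.MIN_VALUE;    // sentinel identity for an unused slot

    final int bucketCount;                       // power of two, so we can mask instead of mod
    final long[] lru;                            // one long per bucket: 16 nibbles, MRU..LRU way order
    final long[] keys;                           // WAYS identity longs per bucket
    final ByteBuffer data;                       // bucketCount * WAYS slots of CHUNK_SIZE bytes

    ChunkCacheSketch(int bucketCount)
    {
        this.bucketCount = bucketCount;
        this.lru = new long[bucketCount];
        Arrays.fill(lru, 0xFEDCBA9876543210L);   // initial order: way 0 is MRU, way 15 is LRU
        this.keys = new long[bucketCount * WAYS];
        Arrays.fill(keys, EMPTY);
        this.data = ByteBuffer.allocateDirect(bucketCount * WAYS * CHUNK_SIZE);
    }

    /** Copy a cached 4KB chunk into dst; false on a miss, so the caller reads from disk and calls put(). */
    boolean read(long id, ByteBuffer dst)
    {
        int bucket = bucketOf(id);
        for (int way = 0; way < WAYS; way++)
        {
            if (keys[bucket * WAYS + way] == id)
            {
                dst.put(slot(bucket, way));
                touch(bucket, way);
                return true;
            }
        }
        return false;
    }

    /** Insert a chunk of at most CHUNK_SIZE bytes, evicting the least-recently-used way of its bucket. */
    void put(long id, ByteBuffer chunk)
    {
        int bucket = bucketOf(id);
        int victim = (int) ((lru[bucket] >>> 60) & 0xF);   // way at the LRU end of the nibble order
        keys[bucket * WAYS + victim] = id;
        slot(bucket, victim).put(chunk);
        touch(bucket, victim);
    }

    private int bucketOf(long id)
    {
        long h = (id ^ (id >>> 33)) * 0xff51afd7ed558ccdL; // cheap mix so related ids spread across buckets
        return (int) ((h ^ (h >>> 33)) & (bucketCount - 1));
    }

    private ByteBuffer slot(int bucket, int way)           // a 4KB window onto the backing buffer
    {
        ByteBuffer view = data.duplicate();
        int pos = (bucket * WAYS + way) * CHUNK_SIZE;
        view.position(pos);
        view.limit(pos + CHUNK_SIZE);
        return view;
    }

    private void touch(int bucket, int way)                // move 'way' to the MRU end of the order
    {
        long order = lru[bucket];
        long rebuilt = way;
        int shift = 4;
        for (int i = 0; i < WAYS; i++)
        {
            long w = (order >>> (i * 4)) & 0xF;
            if (w != way)
            {
                rebuilt |= w << shift;
                shift += 4;
            }
        }
        lru[bucket] = rebuilt;
    }
}
{code}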

*This gives us finer granularity for eviction and keeps cpu-cache traffic to a minimum when
reading from the cache. It's also a pretty optimal size for reading/writing to SSD if we overflow
to disk, and is large enough to get good compression for an in-memory compressed cache, whilst
still being small enough to stream and decompress from main memory without a major penalty
to look up a small part of it.
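For illustration, splitting a larger decompressed block into separate 4KB insertions might look something like this; the sink interface and chunk-id scheme are stand-ins for whatever cache put we end up with:

{code:java}
import java.nio.ByteBuffer;

final class ChunkSplitter
{
    static final int CHUNK_SIZE = 4096;

    interface ChunkSink
    {
        void put(long chunkId, ByteBuffer chunk);          // at most CHUNK_SIZE bytes per call
    }

    /** Hand each 4KB piece of 'block' to 'sink' under consecutive chunk ids starting at firstChunkId. */
    static void splitAndInsert(ByteBuffer block, long firstChunkId, ChunkSink sink)
    {
        long id = firstChunkId;
        while (block.hasRemaining())
        {
            ByteBuffer chunk = block.slice();              // window from current position to limit
            int len = Math.min(CHUNK_SIZE, chunk.remaining());
            chunk.limit(len);                              // trim the window to at most 4KB
            sink.put(id++, chunk);
            block.position(block.position() + len);        // advance past the piece just handed off
        }
    }
}
{code}

So a 64KB block read from disk becomes sixteen independent cache entries, each evictable on its own.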

As to having a fast disk cache, I also think this is a great idea. But I think it fits in
as an extension of this and any compressed in-memory cache, as we build a tiered-cache architecture.

> In process (uncompressed) page cache
> ------------------------------------
>                 Key: CASSANDRA-5863
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: T Jake Luciani
>            Assignee: Pavel Yaskevich
>              Labels: performance
>             Fix For: 2.1 beta2
> Currently, for every read, the CRAR reads each compressed chunk into a byte[], sends
> it to ICompressor, gets back another byte[] and verifies a checksum.
> This process is where the majority of time is spent in a read request.
> Before compression, we would have zero-copy of data and could respond directly from the
> It would be useful to have some kind of Chunk cache that could speed up this process
> for hot data. Initially this could be a off heap cache but it would be great to put these
> decompressed chunks onto a SSD so the hot data lives on a fast disk similar to
This message was sent by Atlassian JIRA
