incubator-cassandra-user mailing list archives

From Robert Coli <rc...@eventbrite.com>
Subject Re: Throughput and RAM
Date Tue, 10 Sep 2013 17:37:16 GMT
On Tue, Sep 10, 2013 at 2:30 AM, Jan Algermissen <jan.algermissen@nordsc.com> wrote:

> So in a sense, C* is designed to maximize IO write efficiency by
> pre-organizing write queries in memory. The more memory, the better the
> organization works (caveat GC).
>

http://en.wikipedia.org/wiki/Log-structured_merge-tree
"
The LSM-tree is a hybrid data structure. It is composed of two
tree-like <http://en.wikipedia.org/wiki/Tree_(data_structure)> structures,
known as the C0 and C1 components. C0 is smaller and entirely resident in
memory, whereas C1 is resident on disk. New records are inserted into the
memory-resident C0 component. If the insertion causes the C0 component to
exceed a certain size threshold, a contiguous segment of entries is removed
from C0 and merged into C1 on disk. The performance characteristics of
LSM-trees stem from the fact that each component is tuned to the
characteristics of its underlying storage medium, and that data is
efficiently migrated across media in rolling batches, using an algorithm
reminiscent of merge sort <http://en.wikipedia.org/wiki/Merge_sort>.
"
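
To make the quoted description concrete, here is a toy sketch of the C0/C1 flow. It is only an illustration, not Cassandra's memtable/SSTable code: the class name, the c0_limit threshold, and the in-memory list standing in for disk are all made up, and for simplicity it flushes all of C0 at once rather than a contiguous segment.

```python
import bisect


class ToyLSM:
    """Toy two-component LSM: C0 is a dict in memory, C1 a sorted list standing in for disk."""

    def __init__(self, c0_limit=4):
        self.c0_limit = c0_limit  # hypothetical size threshold for C0
        self.c0 = {}              # memory-resident component
        self.c1 = []              # "on-disk" component: sorted (key, value) pairs
        self.flushes = 0

    def put(self, key, value):
        # New records always land in C0 first.
        self.c0[key] = value
        if len(self.c0) > self.c0_limit:
            self._flush()

    def _flush(self):
        # Sort C0, then merge it into C1 in one sequential pass (merge-sort style).
        batch = sorted(self.c0.items())
        merged, i, j = [], 0, 0
        while i < len(batch) and j < len(self.c1):
            if batch[i][0] < self.c1[j][0]:
                merged.append(batch[i])
                i += 1
            elif batch[i][0] > self.c1[j][0]:
                merged.append(self.c1[j])
                j += 1
            else:
                # Same key in both components: the newer value from C0 wins.
                merged.append(batch[i])
                i += 1
                j += 1
        merged.extend(batch[i:])
        merged.extend(self.c1[j:])
        self.c1 = merged
        self.c0.clear()
        self.flushes += 1

    def get(self, key):
        # Check memory first, then binary-search the sorted "disk" component.
        if key in self.c0:
            return self.c0[key]
        idx = bisect.bisect_left(self.c1, (key,))
        if idx < len(self.c1) and self.c1[idx][0] == key:
            return self.c1[idx][1]
        return None
```

For example, with c0_limit=2 the third distinct put triggers a flush, after which reads for those keys are served from the sorted C1 list.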

> Cassandra takes this eagerness for consuming writes and organizing the
> writes in memory to such an extreme, that any given node will rather die
> than stop consuming writes.
>

Perhaps more simply: "RAM is faster than disk" and "Cassandra does not
prevent a given node from writing to RAM faster than it can flush to disk"?
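
A back-of-the-envelope sketch of that second point, assuming a constant write rate and a constant flush rate. The function name and every number are hypothetical, not measurements of any Cassandra node, and real behavior also depends on memtable thresholds and GC.

```python
def simulate_backlog(write_rate_mb_s, flush_rate_mb_s, heap_mb, seconds):
    """Return (second, unflushed_mb) pairs, stopping once the backlog exceeds heap_mb."""
    unflushed = 0.0
    history = []
    for t in range(1, seconds + 1):
        unflushed += write_rate_mb_s                  # writes accepted into RAM
        unflushed -= min(unflushed, flush_rate_mb_s)  # what the disk drained this second
        history.append((t, unflushed))
        if unflushed > heap_mb:
            break  # nothing throttled the writer, so memory ran out
    return history
```

If the write rate stays above the flush rate, the backlog grows linearly until it passes the memory budget; if it stays below, the backlog remains at zero.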

=Rob
