cassandra-commits mailing list archives

From "Kenneth Brotman (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-14268) The Architecture:Guarantees web page is empty
Date Thu, 01 Mar 2018 14:25:00 GMT


Kenneth Brotman commented on CASSANDRA-14268:

From DataStax at []
h1. Understanding the Guarantees, Limitations, and Tradeoffs of Cassandra and Materialized Views

By [Jake Luciani|] -  February 12, 2016

The new [Materialized Views feature|] in Cassandra 3.0 offers an easy way to accurately denormalize data so it can be efficiently queried. It's meant to be used on high cardinality columns where the use of [secondary indexes is not efficient|] due to fan-out across all nodes. An example would be creating a secondary index on a user_id. As the number of users in the system grows, it takes a secondary index longer to locate the data, since secondary indexes store data locally. With a materialized view you can *partition* the data on user_id, so finding a specific user becomes a direct lookup, with the added benefit of holding other denormalized data from the base table along with it, similar to a [DynamoDB global secondary index|].
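
To make the user_id example concrete, here is a minimal sketch using the DataStax Python driver; the contact point, the demo keyspace, and the exact schema are assumptions invented for this illustration, not something from the article:

{code:python}
import uuid
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo WITH replication =
        {'class': 'SimpleStrategy', 'replication_factor': 3}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.users (
        username text PRIMARY KEY,  -- base table is partitioned by username
        user_id  uuid,
        email    text
    )
""")
# The view re-partitions the same rows by user_id: a lookup by user_id is
# now a direct single-partition read instead of a cluster-wide index fan-out.
session.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS demo.users_by_id AS
        SELECT * FROM demo.users
        WHERE user_id IS NOT NULL AND username IS NOT NULL
        PRIMARY KEY (user_id, username)
""")

uid = uuid.uuid4()
session.execute(
    "INSERT INTO demo.users (username, user_id, email) VALUES (%s, %s, %s)",
    ("jake", uid, "jake@example.com"),
)
print(session.execute(
    "SELECT email FROM demo.users_by_id WHERE user_id = %s", (uid,)
).one())
{code}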

Materialized views are a very useful feature to have in Cassandra, but before you go jumping in head first, it helps to understand how this feature was designed and what the guarantees are.

Primarily, since materialized views live in Cassandra, they can offer at most what Cassandra offers: a highly available, eventually consistent version of materialized views.

A quick refresher of the Cassandra guarantees and tradeoffs:

C* Guarantees:
 * Writes to a single table are guaranteed to be eventually consistent across replicas - meaning
divergent versions of a row will be reconciled and reach the same end state.
 * [Lightweight transactions|] are guaranteed to be linearizable for table writes within a data center or globally, depending on the use of [LOCAL_SERIAL vs SERIAL|] consistency level respectively (see the sketch after this list).
 * +[Batched|]+ writes across
multiple tables are guaranteed to succeed completely or not at all (by using a durable log).
 * [Secondary indexes|] (once built) are guaranteed to be consistent with their local replica's data.
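
To make the lightweight-transaction guarantee concrete, here is a hedged sketch with the DataStax Python driver; the accounts table, its schema, and the contact point are assumptions for illustration:

{code:python}
# A sketch of a Paxos-backed lightweight transaction (assumed schema:
# demo.accounts with owner text PRIMARY KEY, balance int).
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("demo")

# IF NOT EXISTS turns this INSERT into a lightweight transaction.
# SERIAL gives global linearizability; LOCAL_SERIAL would confine the
# Paxos round to the local data center.
stmt = SimpleStatement(
    "INSERT INTO accounts (owner, balance) VALUES (%s, %s) IF NOT EXISTS",
    serial_consistency_level=ConsistencyLevel.SERIAL,
)
row = session.execute(stmt, ("alice", 100)).one()
print(row[0])  # first column is [applied]: True only if the row was new
{code}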

C* Limitations:
 * Cassandra provides read uncommitted isolation by default.  (Lightweight transactions provide
linearizable isolation)

C* Tradeoffs:
 * Using lower consistency levels yields higher availability and better latency at the price of weaker consistency.
 * Using higher consistency levels yields lower availability and higher request latency, with the benefit of stronger consistency (both settings are shown in the sketch after this list).
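
A minimal sketch of tuning that dial per request with the DataStax Python driver; the users table and contact point are carried over as assumptions from the earlier sketch:

{code:python}
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("demo")

# CL.ONE: one replica must respond -- fastest, most available, weakest.
fast = SimpleStatement(
    "SELECT email FROM users WHERE username = %s",
    consistency_level=ConsistencyLevel.ONE,
)
# CL.QUORUM: a majority of replicas must respond -- slower, stronger.
strong = SimpleStatement(
    "SELECT email FROM users WHERE username = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
print(session.execute(fast, ("jake",)).one())
print(session.execute(strong, ("jake",)).one())
{code}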

Another tradeoff to consider is how Cassandra deals with data safety in the face of hardware failures. Say your disk dies or your datacenter has a fire and you lose machines; how safe is your data? Well, it depends on a few factors, mainly the replication factor and the consistency level used for the write. With consistency level QUORUM and RF=3 your data is safe on at least two nodes, so if you lose one node you still have a copy. However, if you have only RF=1 and permanently lose a node, you've lost that data forever.

An extreme example of this is if you have RF=3 but write at CL.ONE and the write only succeeds on a single node, followed directly by the death of that node. Unless the coordinator was a different node, you probably just lost data.
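
The arithmetic behind this can be sanity-checked with a tiny illustrative helper; this is back-of-the-envelope reasoning in plain Python, not a Cassandra API:

{code:python}
# Back-of-the-envelope helper (illustrative only, not a Cassandra API).
# It ignores hinted handoff and any extra copy the coordinator may hold.
def acked_replicas(rf, cl):
    """Minimum replicas guaranteed to hold a write once the client is acked."""
    return {"ONE": 1, "TWO": 2, "THREE": 3,
            "QUORUM": rf // 2 + 1, "ALL": rf}[cl]

def survivable_losses(rf, cl):
    return acked_replicas(rf, cl) - 1

print(survivable_losses(3, "QUORUM"))  # 1: lose one node, a copy remains
print(survivable_losses(3, "ONE"))     # 0: lose that node, data may be gone
print(survivable_losses(1, "ONE"))     # 0: RF=1 has no redundancy at all
{code}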


Given Cassandra's system properties, maintaining Materialized Views +manually+ in your application is likely to create permanent inconsistencies between views, since your application will need to read the existing state from Cassandra and then modify the views to clean up any updates to existing rows. Besides the added latency, if there are other updates going to the same rows, your reads will end up in a race condition and fail to clean up all the state changes, as the sketch below illustrates. This is the scenario the [mvbench tool|] compares against.
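
To see where the race comes from, here is an illustrative (and deliberately broken) read-then-write pattern for a hand-maintained users_by_email view table; all names here are assumptions invented for this sketch:

{code:python}
# Illustrative only: manual view maintenance with a read-modify-write.
def update_user_email(session, username, new_email):
    # 1. Read current state to learn which view row must be removed.
    old = session.execute(
        "SELECT email FROM users WHERE username = %s", (username,)
    ).one()
    # 2. Update the base table.
    session.execute(
        "UPDATE users SET email = %s WHERE username = %s",
        (new_email, username),
    )
    # 3. Clean up the hand-maintained view. If a concurrent writer changed
    #    the same row between steps 1 and 3, 'old' is stale, the wrong view
    #    row is deleted, and the base and view diverge permanently.
    session.execute(
        "DELETE FROM users_by_email WHERE email = %s AND username = %s",
        (old.email, username),
    )
    session.execute(
        "INSERT INTO users_by_email (email, username) VALUES (%s, %s)",
        (new_email, username),
    )
{code}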

The Materialized Views feature in Cassandra 3.0 was written to address these and other complexities surrounding manual denormalization, but that is not to say it comes without its own set of guarantees and tradeoffs to consider. To understand the internal design of Materialized Views please read the [design document|]. At a high level, though, we chose correctness over raw performance for writes, but did our best to avoid needless write amplification. A simple way to think about this write amplification problem is: if I have a *base* table with RF=3 and a *view* table with RF=3, a naive approach would send a write to each base replica, and each base replica would send a view update to each view replica; RF+RF^2 writes per mutation! C* Materialized Views instead pairs each base replica with a *single* view replica. This simplifies to RF+RF writes per mutation while still guaranteeing convergence.
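
A quick sanity check of that arithmetic, in plain Python for illustration:

{code:python}
# Sanity-checking the write amplification arithmetic described above.
def naive_writes(rf):
    # every base replica forwards the update to every view replica
    return rf + rf * rf   # RF + RF^2

def paired_writes(rf):
    # each base replica forwards to exactly one paired view replica
    return rf + rf        # RF + RF

print(naive_writes(3))   # 12 writes per mutation
print(paired_writes(3))  # 6 writes per mutation
{code}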

Materialized View Guarantees:
 * All changes to the base table will eventually be reflected in the view tables unless there is total data loss in the base table (as described in the previous section).

Materialized View Limitations:
 * All updates to the view happen asynchronously unless the corresponding view replica is the same node. We must do this to ensure availability is not compromised. It's easy to imagine a worst case scenario of 10 Materialized Views for which each update to the base table requires writing to 10 separate nodes. Under normal operation views will see the data quickly, and there are new metrics to track it (ViewWriteMetrics).
 * There is no read repair between the views and the base table, meaning a read repair on the view will only correct that view's data, not the base table's data. If you are reading from the base table though, read repair _will_ send updates to the base and the view.
 * Mutations on a base table +partition+ must happen sequentially per replica if the mutation touches a column in a view (this will improve after ticket CASSANDRA-10307).

Materialized View Tradeoffs:
 * With materialized views you are trading performance for correctness. It takes more work to ensure the views will see all the state changes to a given row; local locks and local reads are required. If you don't need consistency, or never update/delete data, you can bypass materialized views and simply write to many tables from your client. There is also a ticket, CASSANDRA-9779, that will offer a way to bypass the performance hit in the case of insert-only workloads.
 * The data loss scenario described in the section above (there exists only a single copy on a single node that dies) has different effects depending on whether the base or the view was affected. If view data was lost from all replicas, you would need to drop and re-create the view. If the base table lost data though, there would be an inconsistency between the base and the view, with the view having data the base doesn't. Currently there is no way to fix the base from the view; ticket CASSANDRA-10346 was added to address this.

One final point on repair. As described in the [design document|], repairs mean different things depending on whether you are repairing the base or the view. If you repair *only* the view, you will see a consistent state across the view replicas (not the base). If you repair the base, you will repair *both* the base and the view. This is accomplished by passing streamed base data through the regular write path, which in turn updates the views. This is also how bootstrapping new nodes and SSTable loading work, to provide consistent materialized views.



> The Architecture:Guarantees web page is empty
> ---------------------------------------------
>                 Key: CASSANDRA-14268
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Documentation and Website
>            Reporter: Kenneth Brotman
>            Priority: Major
> []
> Please submit content and I or someone else will take it from there.
