cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
Date Mon, 20 Jul 2015 14:47:09 GMT


Benedict commented on CASSANDRA-6477:

bq. If you loose any 2 nodes with vnodes you can't achieve quorum anyway. I

You're right, of course. Unfortunately, this only makes the situation worse. I guess since,
as you say, we are physically incapable of reaching QUORUM for a portion, it perhaps doesn't
matter if we significantly increase that portion for MVs, since there is always a portion
for which that property holds. However it may be significantly worse, and in fact we will
not deliver some MV updates to _any_ node with only two nodes failing.

Let's say we have a cluster:

* With N nodes
* With R replication factor
* With 2 failing nodes
* With infinite vnodes (to simplify calculations)
* With F(1) ratio of token ranges overlapping between failed nodes, i.e. that cannot reach
* With F(2) ratio of token ranges involving exactly one failed node in the base table

Now, when serving a write there are multiple scenarios:

* Of the F(1) writes, only one base node receives an update; 2/N of any update generated by
this node would be routed to one of the now gone nodes. So NO nodes receive ~ 2F(1)/N MV updates.
* Of the F(2) writes, 2/N MV updates will target a dead node
* Of the remaining 1 - F(1) - F(2) writes, F(1) will be incapable of reaching QUORUM for the
same reason the base table could not

Now, to derive F(1) and F(2) approximately, let's fix some basic cluster numbers: N=6, R=3.
In this scenario F(1) is somewhere between 1/5 and 1/4. F(2) is approximately 4/9.

So, using the lower bound of F(1), we have:
* 1/15 of writes reach no MV node whatsoever
* 4/27 of do not reach QUORUM within the MV because they target a dead node (there are two
such writes, so ~27% of all writes)
* (16/45)*(1/5)=16/225 cannot reach QUORUM at the MV (there are two such writes, so ~17% of
all writes)

There are two of each MV update (delete + insert), so we have 27%+17%=44% of writes failing
to reach quorum, and 13% of writes failing to reach anyone. Vs 20% of writes we would expect
to not reach QUORUM, and 0% of writes to fail to reach anyone.

Either way, at the very least we need to ensure we repair our MV portion from our base replicas
before we exit the JOINING state (or whatever the state is where we do not serve reads). Otherwise
QUORUM will move backwards in time once a node comes back online.

Now, I'll grant I'm very rusty on these maths, so there are no doubt errors even in my simplification,
but I think they represent the main thrust of the problem.

> Materialized Views (was: Global Indexes)
> ----------------------------------------
>                 Key: CASSANDRA-6477
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.0 beta 1
>         Attachments:, users.yaml
> Local indexes are suitable for low-cardinality data, where spreading the index across
the cluster is a Good Thing.  However, for high-cardinality data, local indexes require querying
most nodes in the cluster even if only a handful of rows is returned.

This message was sent by Atlassian JIRA

View raw message