cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (CASSANDRA-833) fix consistencylevel during bootstrap
Date Tue, 03 May 2011 13:58:03 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-833:
----------------------------------------

    Assignee: Sylvain Lebresne

Consider the case of CL=1 and RF=3, with replicas A, B, and C. We begin bootstrapping node D and write
a row K to the range being moved from C to D.

If the cluster is heavily loaded, it's possible that only the copy sent to C is written and all the
other writes are dropped; once bootstrap completes and C no longer owns the range, we lose the row.
Likewise, if only the copy sent to D is written and bootstrap is cancelled, we again lose the row.

As described in the issue, we want to satisfy CL for both the pre- and post-bootstrap nodes (in case
bootstrap aborts). This requires treating the old and new range owners as a unit: both C *and* D need
to accept the write for it to count towards CL. So rather than considering {A, B, C, D}, we should
consider {A, B, (C, D)}.

This is a lot of complexity to introduce. A simplification that preserves correctness is to
continue treating nodes independently but require *one more node* than the normal CL. So CL=1
would actually require 2 nodes, CL=Q would require 3 (for RF=3), and so forth. (Note that
Q(3) + 1 is the same as Q(4), which is what the existing code computes; that is one reason
I chose a CL=1 example to start with, since for CL=1 the two (1 + 1 = 2 versus 1) are *not*
the same even for the simple case of RF=3.)
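The arithmetic in the parenthetical can be checked directly. This sketch assumes quorum means a strict majority (`n // 2 + 1`) and that CL=ONE always needs a single ack; `quorum` and `one` are illustrative helpers, not Cassandra functions.

```python
def quorum(n):
    """Strict majority of n replicas."""
    return n // 2 + 1


def one(n):
    """CL=ONE needs a single ack regardless of replica count."""
    return 1


RF = 3
for name, block_for in (("ONE", one), ("QUORUM", quorum)):
    simplified = block_for(RF) + 1     # normal CL plus one extra node
    over_all_four = block_for(RF + 1)  # CL computed over {A, B, C, D}
    print(f"{name}: CL(3)+1 = {simplified}, CL(4) = {over_all_four}")

# ONE: CL(3)+1 = 2, CL(4) = 1
# QUORUM: CL(3)+1 = 3, CL(4) = 3
```

For QUORUM the two calculations coincide, which hides the difference; for ONE they diverge, which is why CL=1 makes the better example.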

This means we may fail a few writes unnecessarily (a write to A or B alone actually satisfies
CL=1, but this scheme would time it out), but we never allow a write to succeed that would
leave CL unsatisfied post-bootstrap (or if bootstrap is cancelled).
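The safety claim can be verified by brute force over every possible set of acknowledgements. This sketch assumes exactly one bootstrapping node (D taking C's range), so the replica set is {A, B, C} if bootstrap is cancelled and {A, B, D} once it completes; all names are illustrative.

```python
from itertools import combinations

NODES = ["A", "B", "C", "D"]
BEFORE = {"A", "B", "C"}  # replicas if bootstrap is cancelled
AFTER = {"A", "B", "D"}   # replicas once D owns the range


def safe(acks, cl):
    """CL holds for both the pre- and post-bootstrap replica sets."""
    return len(acks & BEFORE) >= cl and len(acks & AFTER) >= cl


def simplified_accepts(acks, cl):
    """Treat nodes independently but require one more ack than CL."""
    return len(acks) >= cl + 1


def check(cl):
    """Assert the simplified rule never accepts an unsafe write and
    return the safe writes it rejects anyway."""
    unnecessary_failures = []
    for k in range(len(NODES) + 1):
        for combo in combinations(NODES, k):
            acks = set(combo)
            if simplified_accepts(acks, cl):
                assert safe(acks, cl), acks
            elif safe(acks, cl):
                unnecessary_failures.append(sorted(acks))
    return unnecessary_failures


print(check(1))  # [['A'], ['B']]
print(check(2))  # [['A', 'B']]
```

The rule is safe because each replica view is the full node set minus one node: with at least CL+1 acks, removing any single node still leaves at least CL acks in each view. The writes it rejects unnecessarily are the lone A or B acks for CL=1 mentioned above.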

> fix consistencylevel during bootstrap
> -------------------------------------
>
>                 Key: CASSANDRA-833
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-833
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5
>            Reporter: Jonathan Ellis
>            Assignee: Sylvain Lebresne
>             Fix For: 0.8.1
>
>
> As originally designed, bootstrap nodes should *always* get *all* writes under any consistencylevel, so when bootstrap finishes the operator can run cleanup on the old nodes without fear of losing data.
> But if a bootstrap operation fails or is aborted, all writes will fail until the ex-bootstrapping node is decommissioned. So starting in CASSANDRA-722, we just ignore dead nodes in consistencylevel calculations.
> But this breaks the original design. CASSANDRA-822 adds a partial fix for this (just adding bootstrap targets into the RF targets and hinting normally), but this is still broken under certain conditions. The real fix is to consider consistencylevel for two sets of nodes:
>   1. the RF targets as they currently exist (no pending ranges)
>   2. the RF targets as they will exist after all movement ops are done
> If we satisfy CL for both sets then we will always be in good shape.
> I'm not sure if we can easily calculate set 2 from the current TokenMetadata, though.
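The two-set check from the description can be sketched on a toy token ring. This is not Cassandra's TokenMetadata API: `natural_endpoints` is a hypothetical SimpleStrategy-like walk (the first RF distinct nodes clockwise from the key's token), and the tokens 10/40/70/90 and key token 80 are made-up values chosen so that D's arrival pushes C out of K's replica set, matching the scenario in the comment.

```python
import bisect


def natural_endpoints(ring, key_token, rf):
    """Return rf distinct nodes walking clockwise from key_token.

    ring maps token -> node name (toy model, not TokenMetadata).
    """
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, key_token) % len(tokens)
    replicas = []
    for i in range(len(tokens)):
        node = ring[tokens[(start + i) % len(tokens)]]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas


def satisfies_both(acks, current, future, cl):
    """Set 1 = replicas today, set 2 = replicas after pending moves."""
    return (sum(n in acks for n in current) >= cl
            and sum(n in acks for n in future) >= cl)


current_ring = {10: "A", 40: "B", 70: "C"}
future_ring = {**current_ring, 90: "D"}  # D bootstraps at token 90
before = natural_endpoints(current_ring, 80, 3)  # ['A', 'B', 'C']
after = natural_endpoints(future_ring, 80, 3)    # ['D', 'A', 'B']
print(satisfies_both({"C"}, before, after, 1))       # False
print(satisfies_both({"C", "D"}, before, after, 1))  # True
```

In this model, set 2 comes from re-running the replica walk on a copy of the ring that already contains the pending token. Whether the real TokenMetadata makes that easy is the open question raised above.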

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
