cassandra-commits mailing list archives

From "Peter Schuller (JIRA)" <j...@apache.org>
Subject [jira] Created: (CASSANDRA-2015) Propagation of schema changes got out of sync with node's notion of ring
Date Thu, 20 Jan 2011 12:28:44 GMT
Propagation of schema changes got out of sync with node's notion of ring
------------------------------------------------------------------------

                 Key: CASSANDRA-2015
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2015
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Peter Schuller


I have a three-node 0.7.0 test cluster: nodes 1, 2 and 3. Nodes 1 and 2 are seeds; node 3 is not.

I had a situation where the following was observed:

* Schema changes submitted to node 1 would not propagate to any other node (observational
method: tail syslog and don't see any flushing of system memtables/etc except on node 1).
* Schema changes submitted to node 2 or 3 would propagate between them, or to all (not sure
which).
* Mutations submitted on node 1 *would* get propagated to node 3.
* All nodes knew of each other and considered themselves up according to 'nodetool ring'.
* Because node 3 never got schema migrations, writes submitted to node 1 that got sent to
node 3 blocked for extended periods of time on node 1, while triggering an exception on node
3 because of an invalid cfid in the row mutation.
* I cannot be entirely sure whether just a regular restart would have fixed the problem.
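
One way to confirm this kind of divergence without tailing syslog would be to compare the schema version each node reports. The sketch below is only an illustration: it assumes a mapping of schema version to host list, which is the shape I understand the Thrift call describe_schema_versions to return (with unreachable hosts grouped under an "UNREACHABLE" key). The function name, the sample version strings and the host addresses are all made up, and no connection to a cluster is made here.

```python
from typing import Dict, List

# Assumption: unreachable hosts are reported under this key rather than
# under a real schema version.
UNREACHABLE = "UNREACHABLE"


def schema_disagreements(versions: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the version -> hosts groups if reachable nodes disagree on the
    schema version, or an empty dict if they all agree."""
    live = {
        version: sorted(hosts)
        for version, hosts in versions.items()
        if version != UNREACHABLE and hosts
    }
    # A single distinct version (or none at all) means there is nothing to report.
    return live if len(live) > 1 else {}


if __name__ == "__main__":
    # Hypothetical snapshot resembling the situation described above:
    # node 1 applied a migration that nodes 2 and 3 never received.
    snapshot = {
        "version-a": ["10.0.0.1"],
        "version-b": ["10.0.0.2", "10.0.0.3"],
    }
    for version, hosts in schema_disagreements(snapshot).items():
        print(version, hosts)
```

If the result is non-empty while 'nodetool ring' shows every node as up, that would match the state described above: the ring view is consistent but the schema is not.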

Unfortunately, I was not aware of the problem until running some unit tests against the cluster
and I cannot say for sure which order the machines were bootstrapped in.

After initial discovery I switched to manually submitting 'create keyspace x;' via cassandra-cli
on each node (for different ks:es or interleaving create/drop), and observing results in syslog.

The observations w.r.t. row mutations did not come from the manual test, but rather from the
unit test that failed so there is some chance that there was a different mode of failure than
during my cassandra-cli tests.

Stopping all nodes, wiping the data directories and restarting fixed the problem, and so far
I have not been able to trigger it again. I am not sure whether just restarting the nodes
would have fixed it.

It definitely seems like a problem to me that schema changes did not propagate even though
node 1 was apparently sufficiently aware of the other node (3) to send mutations
to it, even if the original problem may have been due to some kind of operational error.

I'd be interested in hearing speculation about what the likely triggers may be.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

