cassandra-commits mailing list archives

From "Peter Schuller (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-2015) Propagation of schema changes got out of sync with node's notion of ring
Date Thu, 20 Jan 2011 14:00:50 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984194#action_12984194 ]

Peter Schuller commented on CASSANDRA-2015:
-------------------------------------------

I now believe this was triggered by concurrent migrations (due to a test doing drop/create
quickly, and using a hector client pointed to a cluster instead of a single machine).

I have not confirmed this, but I am definitely able to trigger a similar symptom by submitting
'create keyspace conflict;' more or less concurrently on two nodes using cassandra-cli. One of
the nodes then permanently stops receiving schema migrations, even after a restart.

So, until I know otherwise, I'll consider this operator error; since concurrent migrations are
known not to be supported/allowed, it is not a bug.

(Diagnosis mechanisms may be improved, and recovery instructions written, but that is a separate
concern.)
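
As a rough illustration of the kind of diagnosis check meant here, the sketch below takes a
mapping of schema version to the hosts reporting it and returns the hosts that disagree with the
majority. It assumes such a mapping can be obtained from the cluster (for example in the shape
returned by the Thrift describe_schema_versions call) and that unreachable hosts are grouped under
an "UNREACHABLE" key. The function name and the host addresses are hypothetical, and this is not
an existing Cassandra tool.

```python
from typing import Dict, List


def find_schema_disagreement(versions: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return the minority schema versions and their hosts, or {} if all reachable hosts agree.

    `versions` maps a schema version id to the hosts reporting it.
    Assumption: unreachable hosts are listed under the key "UNREACHABLE".
    """
    # Unreachable hosts cannot be judged either way, so leave them out.
    reachable = {
        version: sorted(hosts)
        for version, hosts in versions.items()
        if version != "UNREACHABLE" and hosts
    }
    if len(reachable) <= 1:
        return {}
    # Treat the version held by the most hosts as the reference schema.
    majority = max(reachable, key=lambda version: len(reachable[version]))
    return {v: hosts for v, hosts in reachable.items() if v != majority}


if __name__ == "__main__":
    # Hypothetical three-node cluster in which one node missed a migration.
    sample = {"v1": ["10.0.0.2", "10.0.0.3"], "v2": ["10.0.0.1"]}
    print(find_schema_disagreement(sample))
```

A non-empty result would indicate that at least one node is on a different schema version and
that further schema changes should wait until the cluster has been brought back into agreement.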


> Propagation of schema changes got out of sync with node's notion of ring
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2015
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2015
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Peter Schuller
>
> I have a three-node 0.7.0 test cluster with nodes 1, 2 and 3. Nodes 1 and 2 are seeds (3 is not).
> I had a situation where the following was observed:
> * Schema changes submitted to node 1 would not propagate to any other node (observational
method: tail syslog and don't see any flushing of system memtables/etc except on node 1).
> * Schema changes submitted to node 2 or 3 would propagate between them, or to all (not
sure which).
> * Mutations submitted on node 1 *would* get propagated to node 3.
> * All nodes knew of each other and considered themselves up according to 'nodetool ring'.
> * Because node 3 never got schema migrations, writes submitted to node 1 that got sent
to node 3 blocked for extended periods of time on node 1, while triggering an exception on
node 3 because of an invalid cfid in the row mutation.
> * I can not be entirely sure whether just a regular restart would have fixed the problem.
> Unfortunately, I was not aware of the problem until running some unit tests against the
cluster and I cannot say for sure which order the machines were bootstrapped in.
> After initial discovery I switched to manually submitting 'create keyspace x;' via cassandra-cli
on each node (for different keyspaces, or interleaving create/drop), and observing results in syslog.
> The observations w.r.t. row mutations did not come from the manual test, but rather from
the unit test that failed so there is some chance that there was a different mode of failure
than during my cassandra-cli tests.
> Stopping all nodes, wiping the data directories, and restarting fixed the problem, and
so far I have not been able to trigger it again. I am not sure whether just restarting the
nodes would have fixed it.
> It definitely seems like a problem to me that schema changes did not propagate even though
node 1 was apparently sufficiently aware of the other node (3) to send mutations
to it, even if the original problem may have been due to some kind of operational error.
> I'd be interested in hearing speculation about what the likely triggers may be.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

