cassandra-user mailing list archives

From Jeremy Stribling <>
Subject Re: incomplete schema sync for new node
Date Sat, 02 Jul 2011 00:58:46 GMT
Oops, forgot to mention that we're using Cassandra 0.7.2.

On 07/01/2011 05:46 PM, Jeremy Stribling wrote:
> Hi all,
> I'm running into a problem with Cassandra, where a new node coming up 
> seems to only get an incomplete set of schema mutations when 
> bootstrapping, and as a result hits an "IllegalStateException: 
> replication factor (3) exceeds number of endpoints (2)" error.
> I will describe the sequence of events below as I see them, but first 
> I need to warn you that I run Cassandra in a very non-standard way.  I 
> embed it in a JVM, along with Zookeeper, and other classes for a 
> product we are working on.  We need to bring nodes up and down 
> dynamically in our product, including going from one node to three 
> nodes, and back down to one, at any time.  If we ever drop below three 
> nodes, we have code that sets the replication factor of our keyspaces 
> to 1; similarly, whenever we have three or more nodes, we change the 
> replication factor to 3.  I know this is frowned upon by the 
> community, but we're stuck with doing it this way for now.
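The replication-factor policy described above is just a mapping from the number of live nodes to a target RF. Here is a minimal sketch, assuming a hypothetical `ReplicationPolicy` class that is not part of Cassandra or of the product code. It also shows the invariant behind the IllegalStateException mentioned above: the chosen RF never exceeds the node count.

```java
// Hypothetical sketch of the RF policy described above; not a Cassandra API.
public final class ReplicationPolicy {
    static final int TARGET_RF = 3;

    /** Target replication factor for our keyspaces, given the number of live nodes. */
    public static int desiredReplicationFactor(int liveNodes) {
        if (liveNodes < 1) {
            throw new IllegalArgumentException("need at least one live node");
        }
        // Below three nodes fall back to RF 1; with three or more, use RF 3.
        return liveNodes >= TARGET_RF ? TARGET_RF : 1;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " node(s) -> RF " + desiredReplicationFactor(n));
        }
    }
}
```

Each RF change made this way is itself a schema migration. Every node, including one that is still bootstrapping, has to see the whole sequence of those changes to end up with the right value.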
> Ok, here's the scenario:
> 1) Node bootstraps into a cluster consisting of nodes 
> and
> 2) Once is fully bootstrapped, we change the replication 
> factor for our two keyspaces to 3.
> 3) Then node is taken down permanently, and we change the 
> replication factor back down to 1.
> 4) We then remove node's tokens using the removeToken call on 
> node
> 5) Then we start node, and have it join the cluster using 
> and as seeds.
> 6) starts receiving schema mutations to get it up to speed; 
> the last one it receives (7d51e757-a40b-11e0-a98d-65ed1eced995) has 
> the replication factor at 3.  However, there should be more schema 
> updates after this that never arrive (you can see them arrive at 
> while it is bootstrapping).
> 7) Minutes after receiving this last mutation, node hits the 
> IllegalStateException I've listed above, and I think for that reason 
> never successfully joins the cluster.
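The error in step 7 comes from a guard that refuses to place more replicas than there are distinct endpoints in the ring. The sketch below illustrates that kind of check, using hypothetical names (`ReplicaCheck`, `checkReplicationFactor`) and placeholder endpoint strings. It is not the actual Cassandra 0.7.2 source. It only shows why a node whose latest known schema still says RF 3, while it sees two endpoints, fails where the later RF-1 schema would pass.

```java
import java.util.List;

// Illustrative reconstruction of an RF-vs-endpoints guard; names are hypothetical.
public final class ReplicaCheck {
    /** Throws if the replication factor cannot be satisfied by the given endpoints. */
    public static void checkReplicationFactor(int replicationFactor, List<String> endpoints) {
        if (replicationFactor > endpoints.size()) {
            throw new IllegalStateException(String.format(
                "replication factor (%d) exceeds number of endpoints (%d)",
                replicationFactor, endpoints.size()));
        }
    }

    public static void main(String[] args) {
        // Placeholder endpoint names; the real node addresses are not given here.
        List<String> ring = List.of("node-a", "node-b");
        try {
            checkReplicationFactor(3, ring); // stale schema: RF 3 with two endpoints
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        checkReplicationFactor(1, ring); // the later RF-1 schema passes the check
    }
}
```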
> My question is: why doesn't node receive the schema updates 
> that follow 7d51e757-a40b-11e0-a98d-65ed1eced995?  (For example, 
> 8fc8820d-a40c-11e0-9eaf-6720e49624c2 is present in's log and 
> sets the replication factor back down to 1.)
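The schema migration IDs quoted above are version-1 (time-based) UUIDs, so their embedded timestamps indicate the order in which the migrations were created. The following sketch uses only `java.util.UUID` (`version()` and `timestamp()`). `SchemaVersionOrder` is a hypothetical helper name. It assumes the clocks of the nodes that generated the IDs are roughly in sync, which is not fully true here given the clock skew mentioned in this message.

```java
import java.util.UUID;

// Hypothetical helper for ordering time-based schema migration IDs.
public final class SchemaVersionOrder {
    /** True if time-based UUID b carries a later timestamp than time-based UUID a. */
    public static boolean isAfter(UUID a, UUID b) {
        // UUID.timestamp() is only defined for version-1 UUIDs and throws
        // UnsupportedOperationException for any other version.
        return b.timestamp() > a.timestamp();
    }

    public static void main(String[] args) {
        UUID rf3 = UUID.fromString("7d51e757-a40b-11e0-a98d-65ed1eced995");
        UUID rf1 = UUID.fromString("8fc8820d-a40c-11e0-9eaf-6720e49624c2");
        // UUID timestamps count 100 ns intervals since 1582-10-15.
        long gapMillis = (rf1.timestamp() - rf3.timestamp()) / 10_000;
        System.out.println("version=" + rf3.version()
            + ", RF-1 migration is later: " + isAfter(rf3, rf1)
            + ", gap ~" + gapMillis + " ms");
    }
}
```

By this measure, the RF-1 migration was created several minutes after the RF-3 one. A node that only reached the earlier ID is therefore missing at least that later change.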
> I've put logs for nodes at 
> .  The logs are 
> pretty messy because they include log messages from both Zookeeper 
> and our product code -- sorry about that.  Also, I think the clock on 
> node is a few minutes ahead of the other nodes' clocks.
> I also noticed in's log the following exceptions:
> 2011-07-01 18:00:49,832 76315 [HintedHandoff:1] ERROR org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor  - Error in ThreadPoolExecutor
> java.lang.RuntimeException: java.lang.RuntimeException: Could not reach schema agreement with / in 60000ms
>         at 
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask( 
>         at java.util.concurrent.ThreadPoolExecutor$ 
>         at
> I don't know if that's related or not.
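The "Could not reach schema agreement" error is logged from the HintedHandoff thread. This suggests that hint delivery waits for the target node to report the same schema version as the local node and gives up after 60000 ms. That reading is an assumption, since the relevant stack frames are cut off above. Below is a minimal sketch of such a wait-with-timeout loop. The `SchemaAgreement` class and `waitForSchemaAgreement` method are hypothetical, and the remote version is supplied by a callback instead of a real gossip lookup.

```java
import java.util.UUID;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of waiting for schema agreement; not Cassandra's actual code.
public final class SchemaAgreement {
    /** Polls until the remote schema version equals ours, or throws after the timeout. */
    public static void waitForSchemaAgreement(UUID localVersion, Supplier<UUID> remoteVersion,
                                              long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        // equals(null) is false, so an unknown remote version counts as disagreement.
        while (!localVersion.equals(remoteVersion.get())) {
            if (System.nanoTime() - deadline >= 0) {
                throw new RuntimeException(
                    "Could not reach schema agreement in " + timeoutMillis + "ms");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) {
        UUID local = UUID.fromString("8fc8820d-a40c-11e0-9eaf-6720e49624c2");
        UUID stale = UUID.fromString("7d51e757-a40b-11e0-a98d-65ed1eced995");
        try {
            // A peer stuck on the older migration never agrees, so this times out.
            waitForSchemaAgreement(local, () -> stale, 200, 50);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that reading is right, the new node stalling on an older migration would explain both symptoms: its own IllegalStateException, and the other nodes timing out while waiting to agree with it.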
> Thanks in advance,
> Jeremy
