incubator-cassandra-user mailing list archives

From Arindam Barua <aba...@247-inc.com>
Subject Problems with node rejoining cluster
Date Tue, 25 Jun 2013 06:19:08 GMT

We need to do a rolling upgrade of our Cassandra cluster in production, since we are moving
from Cassandra on Solaris to Cassandra on CentOS.
(We went with Solaris initially since most of our other production hosts run Solaris, but
we ran into some lockup issues during perf tests and decided to switch to Linux.)

Here are the steps we are following to take a node out of service and bring it back. Can
someone comment on whether we are missing anything (e.g., is it recommended to specify tokens
in cassandra.yaml, or to do something different with the seed hosts than described below)?

1. Run nodetool decommission and wait for the data to be streamed out.

2. Re-image the host to CentOS (everything is wiped off the disks), with the same Cassandra version.

3. Get Cassandra back up.
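
Step 1 above can be sketched as a small script. This is only an illustration, not something we run in production: it assumes nodetool is on the PATH, that the first column of `nodetool ring` output is the node address, and the helper names (`run_nodetool`, `node_in_ring`, `decommission_and_wait`) are made up for this example. The re-image and restart in steps 2 and 3 happen outside the script.

```python
import subprocess
import time


def run_nodetool(host, *args):
    """Run nodetool against `host` and return its stdout as text."""
    result = subprocess.run(
        ["nodetool", "-h", host, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def node_in_ring(ring_output, address):
    """True if `address` is the first field of any line of `nodetool ring` output.

    Assumption: the address is the first whitespace-separated column.
    """
    return any(line.split()[:1] == [address] for line in ring_output.splitlines())


def decommission_and_wait(node_host, node_address, seed_host,
                          run=run_nodetool, poll_seconds=30, max_polls=120):
    """Decommission a node, then poll the seed until it no longer lists the node.

    Returns True once the node has left the ring, False if it is still
    listed after `max_polls` checks.
    """
    # Ask the node to stream its data out and leave the ring.
    run(node_host, "decommission")
    for _ in range(max_polls):
        # Check the ring from the seed's point of view, not the leaving node's.
        if not node_in_ring(run(seed_host, "ring"), node_address):
            return True
        time.sleep(poll_seconds)
    return False
```

The `run` parameter is injectable so the polling logic can be exercised without a live cluster.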

Other details:

- Using Cassandra 1.1.5

- We do not specify any tokens in cassandra.yaml, relying on bootstrap to assign the
tokens automatically.

- We are testing with a 4-node cluster, with only one seed host. The seed host is
specified in the cassandra.yaml of each node and is not changed at any point.

While testing the Solaris to Linux upgrade path, things seem to work smoothly. The data streams
out fine, and streams back in when the node comes back up. However, when testing the Linux to
Solaris path (in case we need to roll back), we are facing some issues with nodes rejoining
the ring. nodetool indicates that the node has rejoined the ring, but no data streams in, and
the node doesn't know about the keyspaces or column families. We see some errors, pasted below,
in the logs of the newly added nodes.

[17/06/2013:14:10:17 PDT] MutationStage:1: ERROR RowMutationVerbHandler.java (line 61) Error in row mutation
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1020
        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:126)
        at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:439)
        at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:447)
        at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:395)
        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
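
The exception above means the node received a write for a column family id it has no schema for, which matches the node not knowing about the keyspaces. To see which ids are affected and how often, a log can be scanned with something like the following. This is a rough sketch: the function name `missing_cf_ids` is made up for illustration, and the pattern only matches the message format shown in the trace above.

```python
import re
from collections import Counter

# Matches the exception line as it appears in the log excerpt above.
CFID_RE = re.compile(r"UnknownColumnFamilyException: Couldn't find cfId=(\d+)")


def missing_cf_ids(log_lines):
    """Count how often each unknown column family id appears in the log lines."""
    counts = Counter()
    for line in log_lines:
        match = CFID_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

For example, passing an open log file to `missing_cf_ids` yields a mapping from each cfId to the number of failed mutations seen for it.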

Thanks,
Arindam
