incubator-cassandra-user mailing list archives

From Reverend Chip <rev.c...@gmail.com>
Subject Re: node won't leave
Date Sat, 06 Nov 2010 21:51:56 GMT
On 11/6/2010 1:48 PM, Jonathan Ellis wrote:
> On Fri, Nov 5, 2010 at 8:03 PM, Chip Salzenberg <rev.chip@gmail.com> wrote:
>> In the below "nodetool ring" output, machine 18 was told to loadbalance over
>> an hour ago.  It won't actually leave the ring.  When I first told it to
>> loadbalance, the cluster was under heavy write load; I've turned off the
>> write load, but the node won't actually leave, still.  Help?
> What version is the cluster on?

You mean, the Cassandra version?  0.7 beta3.

>   Did any of the nodes log any dropped messages?

I didn't keep timestamps of the maintenance steps, so I can't be sure
which log entries correspond to which failure states.  I did find
dropped-message log entries on node X.22, though.  Here's the batch that
happened at more or less the time things went wrong:

 WARN [ScheduledTasks:1] 2010-11-05 17:15:03,294 MessagingService.java (line 515) Dropped 9122 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:05,434 MessagingService.java (line 515) Dropped 16658 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:07,084 MessagingService.java (line 515) Dropped 2167 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:09,371 MessagingService.java (line 515) Dropped 28011 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:11,111 MessagingService.java (line 515) Dropped 1139 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:13,330 MessagingService.java (line 515) Dropped 1203 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:15,241 MessagingService.java (line 515) Dropped 4494 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:16,925 MessagingService.java (line 515) Dropped 2277 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:18,839 MessagingService.java (line 515) Dropped 17376 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:23,385 MessagingService.java (line 515) Dropped 18714 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:25,261 MessagingService.java (line 515) Dropped 18952 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:29,006 MessagingService.java (line 515) Dropped 25137 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:30,859 MessagingService.java (line 515) Dropped 1 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:34,418 MessagingService.java (line 515) Dropped 2580 messages in the last 1000ms
 WARN [ScheduledTasks:1] 2010-11-05 17:15:35,816 MessagingService.java (line 515) Dropped 4317 messages in the last 1000ms
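
Lining these warnings up against maintenance steps comes down to pulling out their timestamps and counts.  Below is a minimal Python sketch, assuming each warning sits on a single line in exactly the format quoted above (the mail client wrapped them); `parse_dropped` and `summarize` are hypothetical helpers, not part of Cassandra or nodetool.

```python
import re
from datetime import datetime

# Matches the MessagingService "Dropped N messages" warnings quoted above.
# Assumption: one warning per line, as written to the log before mail wrapping.
DROPPED_RE = re.compile(
    r"WARN \[(?P<thread>[^\]]+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"MessagingService\.java \(line \d+\) "
    r"Dropped (?P<count>\d+) messages in the last (?P<window>\d+)ms"
)


def parse_dropped(lines):
    """Yield (timestamp, dropped_count) for each dropped-message warning."""
    for line in lines:
        m = DROPPED_RE.search(line)
        if m:
            # "%f" accepts the 3-digit millisecond field after the comma.
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            yield ts, int(m.group("count"))


def summarize(lines):
    """Return first/last timestamps, totals and the worst second, or None."""
    events = list(parse_dropped(lines))
    if not events:
        return None
    peak_ts, peak = max(events, key=lambda e: e[1])
    return {
        "first": events[0][0],
        "last": events[-1][0],
        "warnings": len(events),
        "total_dropped": sum(count for _, count in events),
        "peak": peak,
        "peak_at": peak_ts,
    }
```

To run it against a node's log, pass an open file, e.g. `summarize(open("/var/log/cassandra/system.log"))`; that path is an assumption about where the node writes its log.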

I looked for similar messages on node X.21 but didn't find any.

It seems that node states can become weird or wedged (bordering on
internally inconsistent), and cleanup operations along the lines of
"shut down the node manually and force-remove it from the ring" are
commonplace.  I hope I'm missing something.  Am I to understand that
ring maintenance requests can simply fail when partially complete, in
the same way a regular insert might fail, perhaps because of inter-node
RPC overflow?

> Any other error or warning messages?

"Cannot provide an optimal BloomFilter" several times, and "Schema
definitions were defined both locally and in cassandra.yaml" on startup.

>> (It also collected 3.6G of load even though automatic bootstrapping is
>> disabled -- but this node had belonged to the cluster before, so maybe
>> cleaning out /var/lib/cassandra/* wasn't enough to prevent the node from
>> rejoining and taking data responsibility?)
> Assuming that contains both commitlog and data directories, that
> should do it.  You can tell by what it logs when it first starts up,
> if it's asking other nodes to send it data.

It would appear, then, that Cassandra isn't designed to be operated and
understood without constantly watching the logs on every node.

