cassandra-user mailing list archives

From Bram Avontuur <b...@longtailvideo.com>
Subject Re: nodetool repairs spawns many "invalid remote counter shard detected" errors on new node
Date Wed, 03 Sep 2014 20:36:14 GMT
Ok, that seems right. Doesn't look too bad, but I'll keep an eye on it.
Thanks.


On Tue, Sep 2, 2014 at 5:43 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:

> Hello Bram
>
> You're probably running into this:
> https://issues.apache.org/jira/browse/CASSANDRA-4417
>
> It's marked as "Won't Fix" because it is related to the current counter
> design. Fortunately, C* 2.1 will fix this.
>
> Worth reading:
> www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
> On 2 Sep 2014 at 21:24, "Bram Avontuur" <bram@longtailvideo.com> wrote:
>
> Hi,
>>
>> Cassandra setup:
>>
>>  * 2 nodes on EC2, m1.large
>>  * Cassandra version 2.0.10
>>
>> One node died over the weekend, and I couldn't revive it. I deleted it
>> with nodetool removenode, and added a new node with a copy of the
>> cassandra.yaml config with the IP addresses changed.
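
For reference, a rough sketch of that replacement sequence, assuming a packaged install with the usual defaults; the host ID, IP addresses, and service command below are placeholders rather than the actual values used:

  # On a surviving node: find the Host ID of the dead node (listed as DN)
  nodetool status

  # Remove the dead node from the ring
  nodetool removenode <host-id-of-dead-node>

  # On the new node: copy cassandra.yaml from the old node, then update the
  # addresses before starting Cassandra, e.g.
  #   listen_address: <new-node-private-ip>
  #   rpc_address:    <new-node-private-ip>
  #   - seeds: "<surviving-node-ip>"
  sudo service cassandra start

  # Confirm the new node shows up as UN (Up/Normal) in the cluster
  nodetool status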
>>
>> Once reconfigured and started, nodetool status listed it as part of the
>> 2-node cluster. I then ran nodetool repair on the new node to get it to
>> take replication data from a keyspace with replication factor 2. The first
>> 600-ish MB (of 14GB) synced pretty fast, but then the system.log starts
>> spawning "invalid remote counter shard detected" warnings at a rapid rate
>> (too fast to follow with tail -f). Example log line:
>>
>>  WARN [CompactionExecutor:6] 2014-09-02 19:18:49,109 CounterContext.java
>> (line 467) invalid remote counter shard detected;
>> (03afc080-2f01-11e4-948b-15a04b0b4bd9, 1, 158) and
>> (03afc080-2f01-11e4-948b-15a04b0b4bd9, 1, 79) differ only in count; will
>> pick highest to self-heal on compaction
>>
>> Transfer speed from that point on was quite slow, a couple hundred MB per
>> 10 minutes.
>>
>> After a while, nodetool netstats stops listing transfers and the warnings
>> also calm down. There are still a handful of them per minute, even though
>> the cluster is not being used.
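
A minimal sketch of the repair-and-monitoring steps described above, assuming a hypothetical keyspace name and the default system.log location; adjust both to the actual setup:

  # On the new node: repair the keyspace so it pulls its replicas (RF=2)
  nodetool repair <keyspace_name>

  # Watch stream progress while the repair runs
  nodetool netstats

  # Count the counter-shard warnings instead of tailing them
  grep -c 'invalid remote counter shard detected' /var/log/cassandra/system.log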
>>
>> Any idea what could be going on here?
>>
>> Bram
>>
>
