cassandra-commits mailing list archives

From "Dikang Gu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-11432) Counter values become under-counted when running repair.
Date Mon, 25 Apr 2016 21:56:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257121#comment-15257121 ]

Dikang Gu edited comment on CASSANDRA-11432 at 4/25/16 9:55 PM:
----------------------------------------------------------------

[~iamaleksey], yes, I'm trying to figure out why the repair is causing problems. What I observed:
1. Repair generates thousands of small SSTables within seconds, all waiting for compaction:
SSTables in each level: [966/4, 20/10, 152/100, 33, 0, 0, 0, 0, 0] 

2. dropped messages in the log:
2016-04-25_21:35:51.21671 INFO  21:35:51 [ScheduledTasks:1]: MUTATION messages were dropped in last 5000 ms: 0 for internal timeout and 358 for cross node timeout
2016-04-25_21:35:51.21674 INFO  21:35:51 [ScheduledTasks:1]: READ messages were dropped in last 5000 ms: 0 for internal timeout and 90 for cross node timeout
2016-04-25_21:35:51.21674 INFO  21:35:51 [ScheduledTasks:1]: COUNTER_MUTATION messages were dropped in last 5000 ms: 0 for internal timeout and 21 for cross node timeout
2016-04-25_21:35:51.21674 INFO  21:35:51 [ScheduledTasks:1]: Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
2016-04-25_21:35:51.21798 INFO  21:35:51 [ScheduledTasks:1]: MutationStage                     0         0     1009884950         0                 0
2016-04-25_21:35:51.21799
2016-04-25_21:35:51.21810 INFO  21:35:51 [ScheduledTasks:1]: ReadStage                         0         0      347247977         0                 0
2016-04-25_21:35:51.21811
2016-04-25_21:35:51.21828 INFO  21:35:51 [ScheduledTasks:1]: RequestResponseStage              0         0     1070811306         0                 0
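
For anyone reading the level summary in item 1, here is a minimal Python sketch that parses that line and lists the levels holding more SSTables than their target. It assumes the `N/M` notation means "N SSTables in a level whose target is M" and that a bare number means the level is within its target; the function names are made up for illustration and are not part of Cassandra or nodetool.

```python
import re

def parse_level_counts(line):
    """Turn 'SSTables in each level: [966/4, 20/10, 33, ...]' into (count, limit) pairs.

    Assumption: 'N/M' means N SSTables against a target of M; a bare number
    carries no target, so its limit is recorded as None.
    """
    body = re.search(r"\[(.*)\]", line).group(1)
    levels = []
    for entry in body.split(","):
        entry = entry.strip()
        if "/" in entry:
            count, limit = entry.split("/")
            levels.append((int(count), int(limit)))
        else:
            levels.append((int(entry), None))
    return levels

def overfull_levels(levels):
    """Return (level, count, limit) for every level above its target."""
    return [
        (level, count, limit)
        for level, (count, limit) in enumerate(levels)
        if limit is not None and count > limit
    ]

line = "SSTables in each level: [966/4, 20/10, 152/100, 33, 0, 0, 0, 0, 0]"
levels = parse_level_counts(line)
for level, count, limit in overfull_levels(levels):
    print(f"L{level}: {count} SSTables (target {limit})")
```

Under that reading, levels 0, 1 and 2 are all above target, with level 0 furthest behind.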

Do you have any advice on which part of the code I should look at?

Thanks!



> Counter values become under-counted when running repair.
> --------------------------------------------------------
>
>                 Key: CASSANDRA-11432
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11432
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Dikang Gu
>            Assignee: Aleksey Yeschenko
>
> We are experimenting with counters in Cassandra 2.2.5. Our setup has 6 nodes across three different regions, and the replication factor in each region is 2. Basically, each node holds a full copy of the data.
> We are writing to the cluster with CL = 2 and reading with CL = 1.
> We are doing 30k/s counter increments/decrements per node, and at the same time we are double-writing to our MySQL tier so that we can measure the accuracy of the C* counters compared to MySQL.
> The experiment results were great at the beginning: the counter values in C* and MySQL were very close, with a difference of less than 0.1%.
> But when we start running repair on one node, the counter value in C* becomes much lower than the value in MySQL, and the difference grows to more than 1%.
> My question: is it a known problem that counter values become under-counted while repair is running? Should we avoid running repair on counter tables?
> Thanks.
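
The accuracy check described in the report can be sketched in Python as a relative difference against the MySQL value, which the report treats as the reference. This is only an illustration: the function name and the sample numbers are hypothetical and are not taken from the actual experiment.

```python
def relative_undercount(cassandra_value, mysql_value):
    """Fraction by which the Cassandra counter falls short of the MySQL value.

    Positive means Cassandra is under-counting; negative means over-counting.
    """
    if mysql_value == 0:
        raise ValueError("reference value is zero; relative difference is undefined")
    return (mysql_value - cassandra_value) / abs(mysql_value)

# Hypothetical sample values, chosen only to illustrate the two regimes
# mentioned in the report (below 0.1% and above 1%).
before_repair = relative_undercount(cassandra_value=999_200, mysql_value=1_000_000)
during_repair = relative_undercount(cassandra_value=988_000, mysql_value=1_000_000)

print(f"before repair: {before_repair:.2%}")
print(f"during repair: {during_repair:.2%}")
```

Tracking this ratio per counter over time, rather than as a single snapshot, would show whether the gap opens exactly when repair starts.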



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
