cassandra-commits mailing list archives

From "ivan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3070) counter repair
Date Thu, 01 Sep 2011 00:09:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095012#comment-13095012 ]

ivan commented on CASSANDRA-3070:
---------------------------------

Hi Sylvain,

Our SSTables contain sensitive information, so I can't provide them. Sorry.

I reloaded the SSTables in our test environment and captured a new output log ().
This new log contains two new debug messages:
1. lines containing the string "CF resolve" (printed at the beginning of the resolve method
in src/java/org/apache/cassandra/db/ColumnFamily.java)
2. lines containing the string "CF addAll" (printed at the beginning of the addAll method in
src/java/org/apache/cassandra/db/ColumnFamily.java)

We have a backup of the SSTables with these counters, so I can run any test on them.
We have a 6-node cluster using RF=3.

When we ran into problems with some counters, I started debugging.

Using CL LOCAL_QUORUM we get the same answer from all servers, but using CL ONE, 2 of the 6
servers return a lower number.
The results from those 2 servers were lower by 3 than the results from the other servers.

I found the following:
- server 10.20.255.55 notices a digest mismatch (using LOCAL_QUORUM)
- server 10.20.255.55 sends a repair (RowMutation) message to the related servers
- server 10.20.255.53 receives this mutation (which contains the same total() value received
  by the client)
- when the mutation is handled by Memtable.put(), ColumnFamily.resolve() produces a different
  result (the data contained in the Memtable has a delta, and the correct counter value is not
  applied in place of this delta)

I don't know whether the resolved value is correct or not (I suspect it's not, because the
total() value seems to be wrong), since I don't know the details of how counters work.
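To make the delta-versus-total question above concrete, here is a deliberately simplified, hypothetical model of counter reconciliation. It is not Cassandra's CounterContext code: the class, the Shard record, the node ids such as "node-a", and the merge rule are all assumptions for illustration. The model assumes each replica contributes a shard (clock, count, isDelta); two local deltas are summed, otherwise the shard with the higher clock wins; and a counter's total() is the sum of its shard counts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of counter shard reconciliation.
// NOT Cassandra's CounterContext implementation; all names are invented.
class CounterMergeSketch {

    // One shard per replica node id: a logical clock, a partial count,
    // and whether it is a local, not-yet-merged delta.
    record Shard(long clock, long count, boolean isDelta) {}

    // Assumed rule: two local deltas are summed (clocks and counts);
    // otherwise the shard with the higher clock replaces the other.
    static Shard merge(Shard a, Shard b) {
        if (a.isDelta() && b.isDelta()) {
            return new Shard(a.clock() + b.clock(), a.count() + b.count(), true);
        }
        return a.clock() >= b.clock() ? a : b;
    }

    // Merge two contexts (node id -> shard) shard by shard.
    static Map<String, Shard> mergeContexts(Map<String, Shard> x, Map<String, Shard> y) {
        Map<String, Shard> out = new HashMap<>(x);
        for (Map.Entry<String, Shard> e : y.entrySet()) {
            out.merge(e.getKey(), e.getValue(), CounterMergeSketch::merge);
        }
        return out;
    }

    // In this model, the value a client reads is the sum of all shard counts.
    static long total(Map<String, Shard> ctx) {
        long sum = 0;
        for (Shard s : ctx.values()) {
            sum += s.count();
        }
        return sum;
    }

    public static void main(String[] args) {
        // The receiving replica's memtable holds a local delta for one node.
        Map<String, Shard> memtable = new HashMap<>();
        memtable.put("node-a", new Shard(2, 3, true));

        // A repair mutation carrying a full (non-delta) shard with a newer clock.
        Map<String, Shard> repair = new HashMap<>();
        repair.put("node-a", new Shard(5, 10, false));
        System.out.println(total(mergeContexts(memtable, repair))); // 10: newer clock wins

        // If the same incoming shard were treated as a delta, it would be summed instead.
        Map<String, Shard> asDelta = new HashMap<>();
        asDelta.put("node-a", new Shard(5, 10, true));
        System.out.println(total(mergeContexts(memtable, asDelta))); // 13: deltas summed
    }
}
```

Under this model, the result of applying a repair mutation in the memtable depends on whether the incoming shard is treated as a delta (summed) or as a full value (replaced by clock), which is the kind of distinction the "CF resolve" debug output could help confirm or rule out.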



Regards,
ivan


> counter repair
> --------------
>
>                 Key: CASSANDRA-3070
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3070
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.4
>            Reporter: ivan
>            Assignee: Sylvain Lebresne
>         Attachments: counter_local_quroum_maybeschedulerepairs.txt, counter_local_quroum_maybeschedulerepairs_2.txt
>
>
> Hi!
> We have some counters out of sync, but repair doesn't sync the values.
> We tried nodetool repair.
> We use LOCAL_QUORUM for reads. A repair row mutation is sent to other nodes while reading
> a bad row, but the counters weren't repaired by the mutation.
> Output from two nodes was uploaded. (Some new debug messages were added.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
