cassandra-user mailing list archives

From Roland Otta <Roland.O...@willhaben.at>
Subject Re: nodes are always out of sync
Date Sat, 01 Apr 2017 20:25:01 GMT
Thank you both, Chris and Benjamin, for taking the time to clarify that.


On Sat, 2017-04-01 at 21:17 +0200, benjamin roth wrote:
TL;DR: there are race conditions in a repair and they are not trivial to fix, so we would
rather live with them. They don't really hurt: the worst case is that ranges get repaired
that don't actually need a repair.

On 01.04.2017 at 21:14, "Chris Lohfink" <clohfink85@gmail.com> wrote:
A repair cannot instantly build a perfect view of the data across your 3 nodes at one exact
point in time. When a piece of data is written, there is a delay between when it is applied
on each node, even if it is just 500ms. So if the request to read the data and build the
Merkle tree finishes on node1 at 12:01 while node2 finishes at 12:02, that delta of a minute
or so (or even a few seconds, and even when using snapshot repairs) means the partition/range
hashes in the Merkle trees can differ. On a moving data set it is almost impossible to have
the cluster perfectly in sync for a repair. I wouldn't worry about that log message. If you
are worried about consistency between your reads and writes, use EACH_QUORUM or LOCAL_QUORUM
for both.
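
To illustrate the timing effect described above, here is a minimal Python sketch. It is not Cassandra's repair code: the per-range SHA-256 hashes are only a stand-in for Merkle tree leaves, and `range_hashes`, `out_of_sync`, and the modulo-based range assignment are hypothetical names and simplifications chosen for the example. It shows two replicas that end up identical, yet report a mismatched range because their hashes were taken on either side of an in-flight write.

```python
import hashlib


def range_hashes(rows, num_ranges=4):
    """Hash the rows of each range; a simplified stand-in for Merkle tree leaves."""
    buckets = [hashlib.sha256() for _ in range(num_ranges)]
    for key in sorted(rows):
        # Assumption: keys map to ranges by simple modulo, not by real token ranges.
        buckets[key % num_ranges].update(f"{key}={rows[key]};".encode())
    return [b.hexdigest() for b in buckets]


def out_of_sync(hashes_a, hashes_b):
    """Return the indexes of the ranges whose hashes differ."""
    return [i for i, (a, b) in enumerate(zip(hashes_a, hashes_b)) if a != b]


# Both replicas start with identical data.
node1 = {k: "v0" for k in range(8)}
node2 = dict(node1)

# A write reaches node1 first; node2 will apply it slightly later.
node1[5] = "v1"
hashes_node1 = range_hashes(node1)  # node1 hashes its data after the write
hashes_node2 = range_hashes(node2)  # node2 hashes its data before the write arrives
node2[5] = "v1"                     # the write lands on node2 a moment later

mismatched = out_of_sync(hashes_node1, hashes_node2)
converged = out_of_sync(range_hashes(node1), range_hashes(node2))
print("ranges reported out of sync:", mismatched)
print("ranges actually different afterwards:", converged)
```

In this toy run one range is flagged even though both replicas hold the same data once the write has been applied everywhere, which is the harmless "repairing a range that did not need it" case.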

Chris

On Thu, Mar 30, 2017 at 1:22 AM, Roland Otta <Roland.Otta@willhaben.at> wrote:
hi,

we see the following behaviour in our environment:

the cluster consists of 6 nodes (cassandra version 3.0.7). the keyspace has a
replication factor of 3.
clients are writing data to the keyspace with consistency ONE.
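
For the setup described above (replication factor 3, writes at consistency ONE), the usual rule of thumb is that a read is guaranteed to see the latest acknowledged write only when the number of replicas read plus the number of replicas written exceeds the replication factor. The Python sketch below just works through that arithmetic. The helper names `quorum` and `overlaps` are hypothetical, and it assumes a single datacenter, so that a quorum is simply a majority of the 3 replicas.

```python
def quorum(replication_factor):
    """Majority of replicas, i.e. floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def overlaps(read_replicas, write_replicas, replication_factor):
    """True if every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


RF = 3
ONE = 1
QUORUM = quorum(RF)

one_one = overlaps(ONE, ONE, RF)            # read ONE, write ONE
quorum_both = overlaps(QUORUM, QUORUM, RF)  # quorum for both reads and writes

print(f"quorum for RF={RF}: {QUORUM}")
print(f"ONE/ONE overlap guaranteed: {one_one}")
print(f"QUORUM/QUORUM overlap guaranteed: {quorum_both}")
```

With writes at ONE, replicas are expected to lag each other briefly, so some divergence between them at any given instant is normal.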

we are doing parallel, incremental repairs with cassandra reaper.

even if a repair just finished and we are starting a new one
immediately, we can see the following entries in our logs:

INFO  [RepairJobTask:1] 2017-03-30 10:14:00,782 SyncTask.java:73 -
[repair #d0f651f6-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.188
and /192.168.0.191 have 1 range(s) out of sync for ad_event_history
INFO  [RepairJobTask:2] 2017-03-30 10:14:00,782 SyncTask.java:73 -
[repair #d0f651f6-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.188
and /192.168.0.189 have 1 range(s) out of sync for ad_event_history
INFO  [RepairJobTask:4] 2017-03-30 10:14:00,782 SyncTask.java:73 -
[repair #d0f651f6-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189
and /192.168.0.191 have 1 range(s) out of sync for ad_event_history
INFO  [RepairJobTask:2] 2017-03-30 10:14:03,997 SyncTask.java:73 -
[repair #d0fa70a1-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.26
and /192.168.0.189 have 2 range(s) out of sync for ad_event_history
INFO  [RepairJobTask:1] 2017-03-30 10:14:03,997 SyncTask.java:73 -
[repair #d0fa70a1-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.26
and /192.168.0.191 have 2 range(s) out of sync for ad_event_history
INFO  [RepairJobTask:4] 2017-03-30 10:14:03,997 SyncTask.java:73 -
[repair #d0fa70a1-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189
and /192.168.0.191 have 2 range(s) out of sync for ad_event_history
INFO  [RepairJobTask:1] 2017-03-30 10:14:05,375 SyncTask.java:73 -
[repair #d0fbd033-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189
and /192.168.0.191 have 1 range(s) out of sync for ad_event_history
INFO  [RepairJobTask:2] 2017-03-30 10:14:05,375 SyncTask.java:73 -
[repair #d0fbd033-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189
and /192.168.0.190 have 1 range(s) out of sync for ad_event_history
INFO  [RepairJobTask:4] 2017-03-30 10:14:05,375 SyncTask.java:73 -
[repair #d0fbd033-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.190
and /192.168.0.191 have 1 range(s) out of sync for ad_event_history

we can't see any hints on the systems ... so we assumed the writes are
running smoothly.

do we have to be concerned about the nodes always being out of sync, or
is this normal behaviour for a write-intensive table (as the replicas
will never be 100% in sync for the latest inserts)?

bg,
roland




