incubator-cassandra-user mailing list archives

From Arya Goudarzi <gouda...@gmail.com>
Subject Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10
Date Sat, 16 Mar 2013 01:31:20 GMT
Hi,

I have upgraded our test cluster from 1.1.6 to 1.1.10, followed by running
repairs. It appears that the repair task I executed after the upgrade
brought lots of deleted rows back to life. Here are some logistics:

- The cluster had previously been upgraded 1.1.1 -> 1.1.2 -> 1.1.5 -> 1.1.6;
- Old cluster: 4 nodes, C* 1.1.6 with RF 3 using NetworkTopologyStrategy;
- Upgraded to: 1.1.10 with all other settings the same;
- Successful repairs had been run on this cluster every night;
- Our clients use nanosecond-precision timestamps for Cassandra calls;
- After the upgrade, while running repair I saw log messages like this on
one node:

system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,847
AntiEntropyService.java (line 1022) [repair
#0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /
23.20.207.56 have 2223 range(s) out of sync for App
system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:54,877
AntiEntropyService.java (line 1022) [repair
#0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.250.43 and /
23.20.207.56 have 161 range(s) out of sync for App
system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:55,097
AntiEntropyService.java (line 1022) [repair
#0990f320-8da9-11e2-0000-e9b2bd8ea1bd] Endpoints /XX.194.60 and /
23.20.250.43 have 2294 range(s) out of sync for App
system.log.5: INFO [AntiEntropyStage:1] 2013-03-15 19:55:59,190
AntiEntropyService.java (line 789) [repair
#0990f320-8da9-11e2-0000-e9b2bd8ea1bd] App is fully synced (13 remaining
column family to sync for this session)

As you can see, this node thinks lots of ranges are out of sync, which
shouldn't be the case, since successful repairs were done every night prior
to the upgrade.
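For context on the nanosecond-precision point above: Cassandra's convention for cell timestamps is microseconds since the epoch, so a nanosecond timestamp is roughly 1000x larger, and any writer that mixes the two precisions will effectively always win (or always lose) against the other, regardless of actual wall-clock order. A minimal sketch of the magnitude difference (plain Python, purely illustrative, not Cassandra code):

```python
# Fixed wall-clock instant (seconds since epoch), for reproducibility.
now = 1363394400.0

micros = int(now * 1_000_000)      # Cassandra's conventional precision
nanos = int(now * 1_000_000_000)   # the precision our clients use

# A nanosecond timestamp is ~1000x a microsecond timestamp taken at the
# same instant, so a mixed-precision writer's deletes or updates can be
# silently shadowed by the higher-precision writer.
print(nanos // micros)
```

As long as every client consistently uses the same precision, comparisons between timestamps stay meaningful; the danger is only in mixing them.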

The App CF uses SizeTiered compaction with a gc_grace of 10 days. It has
caching = 'ALL', and it is fairly small (11 MB on each node).
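For reference, the usual resurrection mechanism involving gc_grace would be: a tombstone older than gc_grace (10 days here) may be purged at compaction, and if one replica has purged it while another replica never saw the delete, a subsequent repair streams the still-"live" row back. A small sketch of that purge rule (hypothetical helper, not actual Cassandra code):

```python
GC_GRACE_SECONDS = 10 * 24 * 3600  # gc_grace of 10 days, as on the App CF

def tombstone_purgeable(tombstone_written_at, now, gc_grace=GC_GRACE_SECONDS):
    """A tombstone may be dropped at compaction once gc_grace has elapsed
    since it was written; after that, a replica that never received the
    delete can re-introduce the row via repair."""
    return now - tombstone_written_at > gc_grace

# Nine days after the delete: tombstone still present, repair would
# propagate the delete rather than the row.
print(tombstone_purgeable(0, 9 * 24 * 3600))
# Eleven days after: the tombstone may already be gone on some replica.
print(tombstone_purgeable(0, 11 * 24 * 3600))
```

With nightly successful repairs well inside the 10-day window, this path shouldn't apply here, which is part of what makes the behavior surprising.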

I found this bug, which touches timestamps and tombstones and was fixed in
1.1.10, but I am not 100% sure whether it could be related to this issue:
https://issues.apache.org/jira/browse/CASSANDRA-5153

Any advice on how to dig deeper into this would be appreciated.

Thanks,
-Arya
