cassandra-user mailing list archives

From André Cruz <andre.c...@co.sapo.pt>
Subject Repair completes successfully but data is still inconsistent
Date Fri, 14 Nov 2014 16:36:15 GMT
Hello.

So, I have detected a data inconsistency between my nodes:

(Consistency level is ONE)
[disco@Disco] get NamespaceFile2['45fc8996-41bc-429b-a382-5da9294eb59c:/XXXDIRXXX']['XXXFILEXXX'];
Value was not found
Elapsed time: 48 msec(s).
[disco@Disco] get NamespaceFile2['45fc8996-41bc-429b-a382-5da9294eb59c:/XXXDIRXXX']['XXXFILEXXX'];
=> (name=XXXFILEXXX, value=a96a7f54-49d4-11e4-8185-e0db55018fa4, timestamp=1412213845215797)
Elapsed time: 7.45 msec(s).
[disco@Disco]

I have a replication factor of 3, and if I read with a consistency level of QUORUM the result
converges to the column being present. The strange thing is that I deleted this column on
2014-11-01 03:39:25 and my writes use QUORUM. My gc_grace_seconds is set to 10 days, so if I
had not been running repairs during that period this could be explained, but repairs have in
fact been running every day with no sign of problems. First of all, some data:

Cassandra version: 1.2.19 (we upgraded from 1.2.16 on 2014-10-22)
NamespaceFile2 compaction strategy: LeveledCompactionStrategy
CFStats:
	Read Count: 3376397
	Read Latency: 0.24978254956392865 ms.
	Write Count: 21254817
	Write Latency: 0.04713691710448507 ms.
	Pending Tasks: 0
		Column Family: NamespaceFile2
		SSTable count: 28
		SSTables in each level: [1, 10, 17, 0, 0, 0, 0, 0, 0]
		Space used (live): 4961751613
		Space used (total): 5021967044
		SSTable Compression Ratio: 0.5773464014899713
		Number of Keys (estimate): 9369856
		Memtable Columns Count: 82901
		Memtable Data Size: 19352636
		Memtable Switch Count: 283
		Read Count: 3376397
		Read Latency: NaN ms.
		Write Count: 21254817
		Write Latency: 0.049 ms.
		Pending Tasks: 0
		Bloom Filter False Positives: 21904
		Bloom Filter False Ratio: 0.00000
		Bloom Filter Space Used: 7405728
		Compacted row minimum size: 61
		Compacted row maximum size: 74975550
		Compacted row mean size: 795
		Average live cells per slice (last five minutes): 1025.0
		Average tombstones per slice (last five minutes): 0.0

This particular row has 1.1M columns, 70934390 bytes, and the offending row key, hex-encoded as
the partitioner sees it, is 001045fc899641bc429ba3825da9294eb59c0000072f646f7267656d00.
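
For reference, this is how I read that hex key back - a quick sketch, assuming the usual
CompositeType layout (2-byte big-endian length, component bytes, 1-byte end-of-component
marker); it decodes to the namespace UUID and the directory path above:

import uuid

hex_key = "001045fc899641bc429ba3825da9294eb59c0000072f646f7267656d00"
key = bytes.fromhex(hex_key)

# CompositeType layout: for each component, a 2-byte big-endian length,
# the component bytes, and a 1-byte end-of-component marker.
components = []
i = 0
while i < len(key):
    length = int.from_bytes(key[i:i + 2], "big")
    components.append(key[i + 2:i + 2 + length])
    i += 2 + length + 1  # skip the end-of-component byte

print(uuid.UUID(bytes=components[0]))  # 45fc8996-41bc-429b-a382-5da9294eb59c
print(components[1].decode("utf-8"))   # the /XXXDIRXXX directory path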

First I checked which nodes were responsible for this particular row:

$ nodetool -h XXX.XXX.XXX.112 getendpoints Disco NamespaceFile2 '45fc8996-41bc-429b-a382-5da9294eb59c:/XXXDIRXXX'
XXX.XXX.XXX.17
XXX.XXX.XXX.14
XXX.XXX.XXX.18
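
If I got the RandomPartitioner math right (a sketch, assuming the token is the absolute value
of the MD5 digest of the raw key bytes), the token of this key should fall in the range
(0,14178431955039102644307275309657008810] - judging by the getendpoints output above, that is
the range in the repair sessions below where exactly .14, .17 and .18 sync each other:

import hashlib

key = bytes.fromhex("001045fc899641bc429ba3825da9294eb59c0000072f646f7267656d00")

# As far as I know, RandomPartitioner tokens are abs(MD5(key)), with the
# digest read as a signed big-endian integer.
digest = hashlib.md5(key).digest()
token = abs(int.from_bytes(digest, "big", signed=True))
print(token)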


This is the repair log of these particular nodes:

----------XXX.XXX.XXX.14----------
INFO [AntiEntropySessions:366] 2014-11-14 04:34:55,740 AntiEntropyService.java (line 651)
[repair #956244b0-6bb7-11e4-8eec-f5962c02982e] new session: will sync /XXX.XXX.XXX.14, /XXX.XXX.XXX.12,
/XXX.XXX.XXX.18 on range (14178431955039102644307275309657008810,28356863910078205288614550619314017621]
for Disco.[NamespaceFile2]
...
INFO [AntiEntropyStage:1] 2014-11-14 04:36:51,125 AntiEntropyService.java (line 764) [repair
#956244b0-6bb7-11e4-8eec-f5962c02982e] NamespaceFile2 is fully synced
INFO [AntiEntropySessions:366] 2014-11-14 04:36:51,126 AntiEntropyService.java (line 698)
[repair #956244b0-6bb7-11e4-8eec-f5962c02982e] session completed successfully
...
INFO [AntiEntropySessions:367] 2014-11-14 04:36:51,103 AntiEntropyService.java (line 651)
[repair #da2543e0-6bb7-11e4-8eec-f5962c02982e] new session: will sync /XXX.XXX.XXX.14, /XXX.XXX.XXX.11,
/XXX.XXX.XXX.17 on range (155962751505430129087380028406227096910,0] for Disco.[NamespaceFile2]
...
INFO [AntiEntropyStage:1] 2014-11-14 04:38:20,949 AntiEntropyService.java (line 764) [repair
#da2543e0-6bb7-11e4-8eec-f5962c02982e] NamespaceFile2 is fully synced
INFO [AntiEntropySessions:367] 2014-11-14 04:38:20,949 AntiEntropyService.java (line 698)
[repair #da2543e0-6bb7-11e4-8eec-f5962c02982e] session completed successfully
...
INFO [AntiEntropySessions:368] 2014-11-14 04:38:20,930 AntiEntropyService.java (line 651)
[repair #0fafc710-6bb8-11e4-8eec-f5962c02982e] new session: will sync /XXX.XXX.XXX.14, /XXX.XXX.XXX.17,
/XXX.XXX.XXX.18 on range (0,14178431955039102644307275309657008810] for Disco.[NamespaceFile2]
...
INFO [AntiEntropyStage:1] 2014-11-14 04:40:34,237 AntiEntropyService.java (line 764) [repair
#0fafc710-6bb8-11e4-8eec-f5962c02982e] NamespaceFile2 is fully synced
INFO [AntiEntropySessions:368] 2014-11-14 04:40:34,237 AntiEntropyService.java (line 698)
[repair #0fafc710-6bb8-11e4-8eec-f5962c02982e] session completed successfully


----------XXX.XXX.XXX.17----------
INFO [AntiEntropySessions:362] 2014-11-13 04:33:01,974 AntiEntropyService.java (line 651)
[repair #27293450-6aee-11e4-aabc-9b69569c95c3] new session: will sync /XXX.XXX.XXX.17, /XXX.XXX.XXX.11,
/XXX.XXX.XXX.14 on range (155962751505430129087380028406227096910,0] for Disco.[NamespaceFile2]
...
INFO [AntiEntropyStage:1] 2014-11-13 04:34:35,810 AntiEntropyService.java (line 764) [repair
#27293450-6aee-11e4-aabc-9b69569c95c3] NamespaceFile2 is fully synced
INFO [AntiEntropySessions:362] 2014-11-13 04:34:35,810 AntiEntropyService.java (line 698)
[repair #27293450-6aee-11e4-aabc-9b69569c95c3] session completed successfully
...
INFO [AntiEntropySessions:363] 2014-11-13 04:34:35,790 AntiEntropyService.java (line 651)
[repair #5f1463d0-6aee-11e4-aabc-9b69569c95c3] new session: will sync /XXX.XXX.XXX.17, /XXX.XXX.XXX.11,
/XXX.XXX.XXX.112 on range (141784319550391026443072753096570088106,155962751505430129087380028406227096910]
for Disco.[NamespaceFile2]
 ...
INFO [AntiEntropyStage:1] 2014-11-13 04:36:44,042 AntiEntropyService.java (line 764) [repair
#5f1463d0-6aee-11e4-aabc-9b69569c95c3] NamespaceFile2 is fully synced
INFO [AntiEntropySessions:363] 2014-11-13 04:36:44,042 AntiEntropyService.java (line 698)
[repair #5f1463d0-6aee-11e4-aabc-9b69569c95c3] session completed successfully
...
INFO [AntiEntropySessions:364] 2014-11-13 04:36:43,996 AntiEntropyService.java (line 651)
[repair #ab7f12b0-6aee-11e4-aabc-9b69569c95c3] new session: will sync /XXX.XXX.XXX.17, /XXX.XXX.XXX.14,
/XXX.XXX.XXX.18 on range (0,14178431955039102644307275309657008810] for Disco.[NamespaceFile2]
 ...
INFO [Thread-209997] 2014-11-13 04:38:55,122 AntiEntropyService.java (line 764) [repair #ab7f12b0-6aee-11e4-aabc-9b69569c95c3]
NamespaceFile2 is fully synced
INFO [AntiEntropySessions:364] 2014-11-13 04:38:55,122 AntiEntropyService.java (line 698)
[repair #ab7f12b0-6aee-11e4-aabc-9b69569c95c3] session completed successfully


----------XXX.XXX.XXX.18----------
INFO [AntiEntropySessions:364] 2014-11-12 04:49:00,962 AntiEntropyService.java (line 651)
[repair #38594200-6a27-11e4-b912-e1a9f145c444] new session: will sync /XXX.XXX.XXX.18, /XXX.XXX.XXX.12,
/XXX.XXX.XXX.19 on range (28356863910078205288614550619314017621,42535295865117307932921825928971026430]
for Disco.[NamespaceFile2]
...
INFO [AntiEntropyStage:1] 2014-11-12 04:50:50,095 AntiEntropyService.java (line 764) [repair
#38594200-6a27-11e4-b912-e1a9f145c444] NamespaceFile2 is fully synced
INFO [AntiEntropySessions:364] 2014-11-12 04:50:50,095 AntiEntropyService.java (line 698)
[repair #38594200-6a27-11e4-b912-e1a9f145c444] session completed successfully
...
 INFO [AntiEntropySessions:365] 2014-11-12 04:50:50,040 AntiEntropyService.java (line 651)
[repair #795d3d60-6a27-11e4-b912-e1a9f145c444] new session: will sync /XXX.XXX.XXX.18, /XXX.XXX.XXX.12,
/XXX.XXX.XXX.14 on range (14178431955039102644307275309657008810,28356863910078205288614550619314017621]
for Disco.[NamespaceFile2]
...
INFO [AntiEntropyStage:1] 2014-11-12 04:52:31,657 AntiEntropyService.java (line 764) [repair
#795d3d60-6a27-11e4-b912-e1a9f145c444] NamespaceFile2 is fully synced
INFO [AntiEntropySessions:365] 2014-11-12 04:52:31,657 AntiEntropyService.java (line 698)
[repair #795d3d60-6a27-11e4-b912-e1a9f145c444] session completed successfully
...
INFO [AntiEntropySessions:366] 2014-11-12 04:52:31,627 AntiEntropyService.java (line 651)
[repair #b5ea2f90-6a27-11e4-b912-e1a9f145c444] new session: will sync /XXX.XXX.XXX.18, /XXX.XXX.XXX.14,
/XXX.XXX.XXX.17 on range (0,14178431955039102644307275309657008810] for Disco.[NamespaceFile2]
...
INFO [Thread-202021] 2014-11-12 04:54:36,914 AntiEntropyService.java (line 764) [repair #b5ea2f90-6a27-11e4-b912-e1a9f145c444]
NamespaceFile2 is fully synced
INFO [AntiEntropySessions:366] 2014-11-12 04:54:36,915 AntiEntropyService.java (line 698)
[repair #b5ea2f90-6a27-11e4-b912-e1a9f145c444] session completed successfully


So repairs have been running. I dumped the sstables to JSON and this is what I found:

----------XXX.XXX.XXX.14----------
Disco-NamespaceFile2-ic-4710-Data.db.json - Oct 29 04:48
      [
        "XXXFILEXXX",
        "a96a7f54-49d4-11e4-8185-e0db55018fa4",
        1412213845215797
      ],

Disco-NamespaceFile2-ic-5144-Data.db.json - Nov 12 04:58
NOT FOUND

Disco-NamespaceFile2-ic-5201-Data.db.json
NOT FOUND


----------XXX.XXX.XXX.17----------
Disco-NamespaceFile2-ic-5055-Data.db.json - Oct 30 04:36
      [
        "XXXFILEXXX",
        "a96a7f54-49d4-11e4-8185-e0db55018fa4",
        1412213845215797
      ],

Disco-NamespaceFile2-ic-5472-Data.db.json
NOT FOUND

Disco-NamespaceFile2-ic-5532-Data.db.json
NOT FOUND


----------XXX.XXX.XXX.18----------
Disco-NamespaceFile2-ic-5337-Data.db.json - Nov  2 04:55
      [
        "XXXFILEXXX",
        "545455ed",
        1414813165914138,
        "d"
      ],

Disco-NamespaceFile2-ic-5719-Data.db.json - Nov 13 04:43
NOT FOUND

Disco-NamespaceFile2-ic-5748-Data.db.json
      [
        "XXXFILEXXX",
        "a96a7f54-49d4-11e4-8185-e0db55018fa4",
        1412213845215797
      ],


So it seems both XXX.XXX.XXX.14 and XXX.XXX.XXX.17 had the column, and in the more recent
sstables it is completely gone, presumably because the tombstone has already been reclaimed.
On XXX.XXX.XXX.18, however, we can still see the tombstone in the first sstable, the column is
gone in the second, and then the undeleted value reappears in the last sstable. Since repairs
have been running, is there another possible explanation for this? And on another note, why
haven't repairs propagated this zombie column to the other nodes?
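
For what it's worth, here is a quick decode of that tombstone entry from ic-5337 on
XXX.XXX.XXX.18 (my understanding of the JSON dump is that the value of a "d" cell is the local
deletion time in hex seconds, and that the timestamps are in microseconds):

from datetime import datetime, timedelta, timezone

# Tombstone cell from Disco-NamespaceFile2-ic-5337 on XXX.XXX.XXX.18:
#   ["XXXFILEXXX", "545455ed", 1414813165914138, "d"]
deleted_at = datetime.fromtimestamp(int("545455ed", 16), tz=timezone.utc)
print(deleted_at)                        # 2014-11-01 03:39:25+00:00, my delete

# gc_grace is 10 days, so the tombstone was purgeable from this point on:
print(deleted_at + timedelta(days=10))   # 2014-11-11 03:39:25+00:00

# The reappearing value carries the original (older) write timestamp:
print(datetime.fromtimestamp(1412213845215797 / 1e6, tz=timezone.utc))  # 2014-10-02

The deletion time matches the delete I issued exactly, and from 2014-11-11 onward any
compaction was free to drop the tombstone, which fits what I see on .14 and .17 - but it does
not explain how the older value ends up back in ic-5748 on .18.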

Any help or pointers where to go next would be appreciated.

Best regards,
André Cruz

