cassandra-commits mailing list archives

From "Marcus Eriksson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11455) Re-executing incremental repair does not restore data on wiped node
Date Wed, 30 Mar 2016 07:39:25 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217595#comment-15217595 ]

Marcus Eriksson commented on CASSANDRA-11455:
---------------------------------------------

I think it might be hard to give any guarantees on this. Say we record that the last repair was
at timestamp X, and we then have 10 sstables with repairedAt=X. If we now lose one of those repaired
sstables, how would we know that during repair? And if we track which sstables we expect to exist
on the node in a system table, what happens if we lose that information too?
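
To make the scenario concrete, here is a minimal Python sketch of it. It is only a toy
model, not Cassandra code: the SSTable class, the missing_repaired helper, the sstable
names and the value of X are all hypothetical. It shows that repairedAt on the surviving
sstables reveals nothing, and that a tracked manifest only helps while it survives.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class SSTable:
    name: str
    repaired_at: int  # 0 would mean "unrepaired" in this toy model


def missing_repaired(on_disk: List[SSTable],
                     manifest: Optional[Set[str]]) -> Optional[Set[str]]:
    """Return the repaired sstables that are expected but absent.

    Returns None when the manifest itself has been lost, because then
    there is nothing to compare the on-disk set against.
    """
    if manifest is None:
        return None
    present = {s.name for s in on_disk if s.repaired_at > 0}
    return manifest - present


X = 1000  # hypothetical "last repair" timestamp

# Ten sstables were repaired at X; one of them (ma-10-big) has since been lost.
expected = {f"ma-{i}-big" for i in range(1, 11)}
on_disk = [SSTable(f"ma-{i}-big", X) for i in range(1, 10)]

# repairedAt alone: every surviving sstable looks fine, so nothing seems wrong.
all_marked = all(s.repaired_at == X for s in on_disk)

# With a surviving manifest the loss is visible; without it, it is not.
with_manifest = missing_repaired(on_disk, expected)
without_manifest = missing_repaired(on_disk, None)
```

In this model the manifest plays the role of the hypothetical system table: it detects
the loss, but it is itself state on the node and can be wiped along with the data.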

> Re-executing incremental repair does not restore data on wiped node
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-11455
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11455
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Paulo Motta
>
> Reproduction steps:
> {noformat}
> ccm create test -n 3 -s
> ccm node1 stress "write n=100K cl=QUORUM -rate threads=300 -schema replication(factor=3) compaction(strategy=org.apache.cassandra.db.compaction.LeveledCompactionStrategy,sstable_size_in_mb=1)"
> ccm flush
> ccm node1 nodetool repair keyspace1 standard1
> ccm flush
> ccm node2 stop
> rm -rf ~/.ccm/test/node2/commitlogs/*
> rm -rf ~/.ccm/test/node2/data0/keyspace1/*
> ccm node2 start
> ccm node1 nodetool repair keyspace1 standard1
> ccm node1 stress "read n=100k cl=ONE -rate threads=3"
> {noformat}
> This is the log on node1 (repair coordinator):
> {noformat}
> INFO  [Thread-8] 2016-03-29 13:01:16,990 RepairRunnable.java:125 - Starting repair command
#2, repairing keyspace keyspace1 with repair options (parallelism: parallel, primary range:
false, incremental: true, job threads: 1, ColumnFamilies: [standard1], dataCenters: [], hosts:
[], # of ranges: 3)
> INFO  [Thread-8] 2016-03-29 13:01:17,021 RepairSession.java:237 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f]
new session: will sync /127.0.0.1, /127.0.0.2, /127.0.0.3 on range [(3074457345618258602,-9223372036854775808],
(-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]] for
keyspace1.[standard1]
> INFO  [Repair#2:1] 2016-03-29 13:01:17,044 RepairJob.java:100 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f]
requesting merkle trees for standard1 (to [/127.0.0.2, /127.0.0.3, /127.0.0.1])
> INFO  [Repair#2:1] 2016-03-29 13:01:17,045 RepairJob.java:174 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f]
Requesting merkle trees for standard1 (to [/127.0.0.2, /127.0.0.3, /127.0.0.1])
> DEBUG [AntiEntropyStage:1] 2016-03-29 13:01:17,054 RepairMessageVerbHandler.java:118
- Validating ValidationRequest{gcBefore=1458403277} org.apache.cassandra.repair.messages.ValidationRequest@56ed77cd
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,062 StorageService.java:3100 - Forcing
flush on keyspace keyspace1, CF standard1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,066 CompactionManager.java:1290 - Created
3 merkle trees with merkle trees size 3, 0 partitions, 277 bytes
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,067 Validator.java:123 - Prepared AEService
trees of size 3 for [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f on keyspace1/standard1,
[(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603],
(-3074457345618258603,3074457345618258602]]]
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,067 Validator.java:233 - Validated 0
partitions for 784bf8d0-f5c7-11e5-9f80-d30f63ad009f.  Partitions per leaf are:
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,067 EstimatedHistogram.java:304 -  
  [0..0]: 1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,067 EstimatedHistogram.java:304 -  
  [0..0]: 1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,067 EstimatedHistogram.java:304 -  
  [0..0]: 1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,067 Validator.java:235 - Validated 0
partitions for 784bf8d0-f5c7-11e5-9f80-d30f63ad009f.  Partition sizes are:
> INFO  [AntiEntropyStage:1] 2016-03-29 13:01:17,070 RepairSession.java:181 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f]
Received merkle tree for standard1 from /127.0.0.1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,070 EstimatedHistogram.java:304 -  
  [0..0]: 1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,071 EstimatedHistogram.java:304 -  
  [0..0]: 1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,071 EstimatedHistogram.java:304 -  
  [0..0]: 1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,071 CompactionManager.java:1253 - Validation
finished in 4 msec, for [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f on keyspace1/standard1,
[(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603],
(-3074457345618258603,3074457345618258602]]]
> INFO  [AntiEntropyStage:1] 2016-03-29 13:01:17,077 RepairSession.java:181 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f]
Received merkle tree for standard1 from /127.0.0.2
> INFO  [AntiEntropyStage:1] 2016-03-29 13:01:17,077 RepairSession.java:181 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f]
Received merkle tree for standard1 from /127.0.0.3
> INFO  [RepairJobTask:1] 2016-03-29 13:01:17,078 SyncTask.java:66 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f]
Endpoints /127.0.0.2 and /127.0.0.3 are consistent for standard1
> INFO  [RepairJobTask:1] 2016-03-29 13:01:17,079 SyncTask.java:66 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f]
Endpoints /127.0.0.3 and /127.0.0.1 are consistent for standard1
> INFO  [RepairJobTask:3] 2016-03-29 13:01:17,079 SyncTask.java:66 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f]
Endpoints /127.0.0.2 and /127.0.0.1 are consistent for standard1
> INFO  [RepairJobTask:1] 2016-03-29 13:01:17,079 RepairJob.java:145 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f]
standard1 is fully synced
> INFO  [RepairJobTask:1] 2016-03-29 13:01:17,082 RepairSession.java:279 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f]
Session completed successfully
> INFO  [RepairJobTask:1] 2016-03-29 13:01:17,082 RepairRunnable.java:235 - Repair session
784bf8d0-f5c7-11e5-9f80-d30f63ad009f for range [(3074457345618258602,-9223372036854775808],
(-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]] finished
> INFO  [CompactionExecutor:4] 2016-03-29 13:01:17,087 CompactionManager.java:583 - Starting
anticompaction for keyspace1.standard1 on 0/[BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-43-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-42-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-40-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-38-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-36-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-34-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-33-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-32-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-31-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-29-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-27-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-25-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-24-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-23-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-19-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-16-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-21-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-22-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-15-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-20-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-17-big-Data.db'),
BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-18-big-Data.db')]
sstables
> INFO  [CompactionExecutor:4] 2016-03-29 13:01:17,089 CompactionManager.java:650 - Completed
anticompaction successfully
> INFO  [InternalResponseStage:12] 2016-03-29 13:01:17,098 RepairRunnable.java:312 - Repair
command #2 finished in 0 seconds
> {noformat}
> This is the log on node2 (wiped node):
> {noformat}
> DEBUG [AntiEntropyStage:1] 2016-03-29 13:01:17,018 RepairMessageVerbHandler.java:61 -
Preparing, PrepareMessage{cfIds='[f5d4c580-f5c6-11e5-b93d-759a488c3864]', ranges=[(3074457345618258602,-9223372036854775808],
(-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]],
parentRepairSession=78482840-f5c7-11e5-9f80-d30f63ad009f, isIncremental=true, timestamp=1459267277006,
isGlobal=true}
> DEBUG [AntiEntropyStage:1] 2016-03-29 13:01:17,047 RepairMessageVerbHandler.java:118
- Validating ValidationRequest{gcBefore=1458403277} org.apache.cassandra.repair.messages.ValidationRequest@56ed77cd
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,050 StorageService.java:3100 - Forcing
flush on keyspace keyspace1, CF standard1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,066 CompactionManager.java:1290 - Created
3 merkle trees with merkle trees size 3, 0 partitions, 277 bytes
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,067 Validator.java:123 - Prepared AEService
trees of size 3 for [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f on keyspace1/standard1,
[(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603],
(-3074457345618258603,3074457345618258602]]]
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,069 Validator.java:233 - Validated 0
partitions for 784bf8d0-f5c7-11e5-9f80-d30f63ad009f.  Partitions per leaf are:
> INFO  [AntiEntropyStage:1] 2016-03-29 13:01:17,069 Validator.java:274 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f]
Sending completed merkle tree to /127.0.0.1 for keyspace1.standard1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,071 EstimatedHistogram.java:304 -  
  [0..0]: 1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,071 EstimatedHistogram.java:304 -  
  [0..0]: 1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,071 EstimatedHistogram.java:304 -  
  [0..0]: 1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,071 Validator.java:235 - Validated 0
partitions for 784bf8d0-f5c7-11e5-9f80-d30f63ad009f.  Partition sizes are:
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,072 EstimatedHistogram.java:304 -  
  [0..0]: 1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,072 EstimatedHistogram.java:304 -  
  [0..0]: 1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,072 EstimatedHistogram.java:304 -  
  [0..0]: 1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,072 CompactionManager.java:1253 - Validation
finished in 6 msec, for [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f on keyspace1/standard1,
[(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603],
(-3074457345618258603,3074457345618258602]]]
> DEBUG [AntiEntropyStage:1] 2016-03-29 13:01:17,084 RepairMessageVerbHandler.java:146
- Got anticompaction request AnticompactionRequest{parentRepairSession=78482840-f5c7-11e5-9f80-d30f63ad009f}
org.apache.cassandra.repair.messages.AnticompactionRequest@3efcaada
> INFO  [CompactionExecutor:2] 2016-03-29 13:01:17,085 CompactionManager.java:583 - Starting
anticompaction for keyspace1.standard1 on 0/[] sstables
> INFO  [CompactionExecutor:2] 2016-03-29 13:01:17,087 CompactionManager.java:650 - Completed
anticompaction successfully
> {noformat}
> EDIT: Running repair with {{--full}} restored the data on the wiped node as intended.
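
The logs above show both nodes validating 0 partitions, and the full repair then
restoring the data. A rough Python sketch of why that can happen follows. It assumes
that incremental validation only reads unrepaired sstables, and it replaces the real
Merkle trees with a single SHA-256 digest per node. The validation_digest function and
the sample rows are hypothetical, not Cassandra APIs.

```python
import hashlib
from typing import Dict, List, Tuple

UNREPAIRED = 0  # toy convention: repaired_at == 0 means "not yet repaired"

# Each sstable is modelled as (repaired_at, {partition_key: value}).
SSTableModel = Tuple[int, Dict[str, str]]


def validation_digest(sstables: List[SSTableModel], incremental: bool) -> str:
    """Hash the rows that validation would read on one node.

    In incremental mode, sstables already marked repaired are skipped
    (the assumption stated above); in full mode everything is read.
    """
    rows: Dict[str, str] = {}
    for repaired_at, data in sstables:
        if incremental and repaired_at != UNREPAIRED:
            continue
        rows.update(data)
    digest = hashlib.sha256()
    for key in sorted(rows):
        digest.update(f"{key}={rows[key]};".encode())
    return digest.hexdigest()


# node1 still holds its data, all marked repaired by the first repair run.
node1 = [(1000, {"k1": "v1", "k2": "v2"})]
# node2 was wiped: no sstables at all.
node2: List[SSTableModel] = []

# Incremental: both nodes hash an empty input, so they look consistent.
incremental_match = (validation_digest(node1, incremental=True)
                     == validation_digest(node2, incremental=True))

# Full: node1's repaired data is included, so the mismatch is detected.
full_match = (validation_digest(node1, incremental=False)
              == validation_digest(node2, incremental=False))
```

Under these assumptions the incremental run reports the endpoints as consistent, while
the full run sees a difference and would stream the data to the wiped node.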



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
