Date: Wed, 30 Mar 2016 07:39:25 +0000 (UTC)
From: "Marcus Eriksson (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-11455) Re-executing incremental repair does not restore data on wiped node

[ https://issues.apache.org/jira/browse/CASSANDRA-11455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217595#comment-15217595 ]

Marcus Eriksson commented on CASSANDRA-11455:
---------------------------------------------

I think it might be hard to give any guarantees here. Say we record that the last repair was at timestamp X, and that we then have 10 sstables with repairedAt=X.
Now suppose we lose one of those repaired sstables: how would we detect that during repair? And if we track which sstables we expect to exist on the node in a system table, what happens if we lose that information?

> Re-executing incremental repair does not restore data on wiped node
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-11455
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11455
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Paulo Motta
>
> Reproduction steps:
> {noformat}
> ccm create test -n 3 -s
> ccm node1 stress "write n=100K cl=QUORUM -rate threads=300 -schema replication(factor=3) compaction(strategy=org.apache.cassandra.db.compaction.LeveledCompactionStrategy,sstable_size_in_mb=1)"
> ccm flush
> ccm node1 nodetool repair keyspace1 standard1
> ccm flush
> ccm node2 stop
> rm -rf ~/.ccm/test/node2/commitlogs/*
> rm -rf ~/.ccm/test/node2/data0/keyspace1/*
> ccm node2 start
> ccm node1 nodetool repair keyspace1 standard1
> ccm node1 stress "read n=100k cl=ONE -rate threads=3"
> {noformat}
> This is the log on node1 (repair coordinator):
> {noformat}
> INFO [Thread-8] 2016-03-29 13:01:16,990 RepairRunnable.java:125 - Starting repair command #2, repairing keyspace keyspace1 with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [standard1], dataCenters: [], hosts: [], # of ranges: 3)
> INFO [Thread-8] 2016-03-29 13:01:17,021 RepairSession.java:237 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f] new session: will sync /127.0.0.1, /127.0.0.2, /127.0.0.3 on range [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]] for keyspace1.[standard1]
> INFO [Repair#2:1] 2016-03-29 13:01:17,044 RepairJob.java:100 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f] requesting merkle trees for standard1 (to [/127.0.0.2, /127.0.0.3, /127.0.0.1])
> INFO [Repair#2:1] 2016-03-29 13:01:17,045 RepairJob.java:174 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f] Requesting merkle trees for standard1 (to [/127.0.0.2, /127.0.0.3, /127.0.0.1])
> DEBUG [AntiEntropyStage:1] 2016-03-29 13:01:17,054 RepairMessageVerbHandler.java:118 - Validating ValidationRequest{gcBefore=1458403277} org.apache.cassandra.repair.messages.ValidationRequest@56ed77cd
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,062 StorageService.java:3100 - Forcing flush on keyspace keyspace1, CF standard1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,066 CompactionManager.java:1290 - Created 3 merkle trees with merkle trees size 3, 0 partitions, 277 bytes
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,067 Validator.java:123 - Prepared AEService trees of size 3 for [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f on keyspace1/standard1, [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]]]
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,067 Validator.java:233 - Validated 0 partitions for 784bf8d0-f5c7-11e5-9f80-d30f63ad009f. Partitions per leaf are:
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,067 EstimatedHistogram.java:304 - [0..0]: 1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,067 EstimatedHistogram.java:304 - [0..0]: 1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,067 EstimatedHistogram.java:304 - [0..0]: 1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,067 Validator.java:235 - Validated 0 partitions for 784bf8d0-f5c7-11e5-9f80-d30f63ad009f. Partition sizes are:
> INFO [AntiEntropyStage:1] 2016-03-29 13:01:17,070 RepairSession.java:181 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f] Received merkle tree for standard1 from /127.0.0.1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,070 EstimatedHistogram.java:304 - [0..0]: 1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,071 EstimatedHistogram.java:304 - [0..0]: 1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,071 EstimatedHistogram.java:304 - [0..0]: 1
> DEBUG [ValidationExecutor:3] 2016-03-29 13:01:17,071 CompactionManager.java:1253 - Validation finished in 4 msec, for [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f on keyspace1/standard1, [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]]]
> INFO [AntiEntropyStage:1] 2016-03-29 13:01:17,077 RepairSession.java:181 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f] Received merkle tree for standard1 from /127.0.0.2
> INFO [AntiEntropyStage:1] 2016-03-29 13:01:17,077 RepairSession.java:181 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f] Received merkle tree for standard1 from /127.0.0.3
> INFO [RepairJobTask:1] 2016-03-29 13:01:17,078 SyncTask.java:66 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f] Endpoints /127.0.0.2 and /127.0.0.3 are consistent for standard1
> INFO [RepairJobTask:1] 2016-03-29 13:01:17,079 SyncTask.java:66 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f] Endpoints /127.0.0.3 and /127.0.0.1 are consistent for standard1
> INFO [RepairJobTask:3] 2016-03-29 13:01:17,079 SyncTask.java:66 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f] Endpoints /127.0.0.2 and /127.0.0.1 are consistent for standard1
> INFO [RepairJobTask:1] 2016-03-29 13:01:17,079 RepairJob.java:145 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f] standard1 is fully synced
> INFO [RepairJobTask:1] 2016-03-29 13:01:17,082 RepairSession.java:279 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f] Session completed successfully
> INFO [RepairJobTask:1] 2016-03-29 13:01:17,082 RepairRunnable.java:235 - Repair session 784bf8d0-f5c7-11e5-9f80-d30f63ad009f for range [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]] finished
> INFO [CompactionExecutor:4] 2016-03-29 13:01:17,087 CompactionManager.java:583 - Starting anticompaction for keyspace1.standard1 on 0/[BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-43-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-42-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-40-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-38-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-36-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-34-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-33-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-32-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-31-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-29-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-27-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-25-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-24-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-23-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-19-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-16-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-21-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-22-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-15-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-20-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-17-big-Data.db'), BigTableReader(path='/home/paulo/.ccm/test/node1/data0/keyspace1/standard1-f5d4c580f5c611e5b93d759a488c3864/ma-18-big-Data.db')] sstables
> INFO [CompactionExecutor:4] 2016-03-29 13:01:17,089 CompactionManager.java:650 - Completed anticompaction successfully
> INFO [InternalResponseStage:12] 2016-03-29 13:01:17,098 RepairRunnable.java:312 - Repair command #2 finished in 0 seconds
> {noformat}
> This is the log on node2 (wiped node):
> {noformat}
> DEBUG [AntiEntropyStage:1] 2016-03-29 13:01:17,018 RepairMessageVerbHandler.java:61 - Preparing, PrepareMessage{cfIds='[f5d4c580-f5c6-11e5-b93d-759a488c3864]', ranges=[(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]], parentRepairSession=78482840-f5c7-11e5-9f80-d30f63ad009f, isIncremental=true, timestamp=1459267277006, isGlobal=true}
> DEBUG [AntiEntropyStage:1] 2016-03-29 13:01:17,047 RepairMessageVerbHandler.java:118 - Validating ValidationRequest{gcBefore=1458403277} org.apache.cassandra.repair.messages.ValidationRequest@56ed77cd
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,050 StorageService.java:3100 - Forcing flush on keyspace keyspace1, CF standard1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,066 CompactionManager.java:1290 - Created 3 merkle trees with merkle trees size 3, 0 partitions, 277 bytes
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,067 Validator.java:123 - Prepared AEService trees of size 3 for [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f on keyspace1/standard1, [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]]]
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,069 Validator.java:233 - Validated 0 partitions for 784bf8d0-f5c7-11e5-9f80-d30f63ad009f. Partitions per leaf are:
> INFO [AntiEntropyStage:1] 2016-03-29 13:01:17,069 Validator.java:274 - [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f] Sending completed merkle tree to /127.0.0.1 for keyspace1.standard1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,071 EstimatedHistogram.java:304 - [0..0]: 1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,071 EstimatedHistogram.java:304 - [0..0]: 1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,071 EstimatedHistogram.java:304 - [0..0]: 1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,071 Validator.java:235 - Validated 0 partitions for 784bf8d0-f5c7-11e5-9f80-d30f63ad009f. Partition sizes are:
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,072 EstimatedHistogram.java:304 - [0..0]: 1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,072 EstimatedHistogram.java:304 - [0..0]: 1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,072 EstimatedHistogram.java:304 - [0..0]: 1
> DEBUG [ValidationExecutor:1] 2016-03-29 13:01:17,072 CompactionManager.java:1253 - Validation finished in 6 msec, for [repair #784bf8d0-f5c7-11e5-9f80-d30f63ad009f on keyspace1/standard1, [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]]]
> DEBUG [AntiEntropyStage:1] 2016-03-29 13:01:17,084 RepairMessageVerbHandler.java:146 - Got anticompaction request AnticompactionRequest{parentRepairSession=78482840-f5c7-11e5-9f80-d30f63ad009f} org.apache.cassandra.repair.messages.AnticompactionRequest@3efcaada
> INFO [CompactionExecutor:2] 2016-03-29 13:01:17,085 CompactionManager.java:583 - Starting anticompaction for keyspace1.standard1 on 0/[] sstables
> INFO [CompactionExecutor:2] 2016-03-29 13:01:17,087 CompactionManager.java:650 - Completed anticompaction successfully
> {noformat}
> EDIT: Running repair with {{--full}} restored the data on the wiped node as intended.