Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 4888C174D5 for ; Sun, 5 Apr 2015 23:40:14 +0000 (UTC) Received: (qmail 64911 invoked by uid 500); 5 Apr 2015 23:40:09 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 64859 invoked by uid 500); 5 Apr 2015 23:40:09 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 64845 invoked by uid 99); 5 Apr 2015 23:40:09 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 05 Apr 2015 23:40:09 +0000 Date: Sun, 5 Apr 2015 23:40:09 +0000 (UTC) From: "Yuki Morishita (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (CASSANDRA-8669) simple_repair test failing on 2.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-8669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuki Morishita updated CASSANDRA-8669: -------------------------------------- Attachment: debug-result.txt 8669-debug.txt Here is what I found so far. I added debug logging around snapshot (attached patch 8669-debug.txt) to the latest cassandra-2.1, ran {{repair_test.py}}'s {{simple_sequential_repair_test}} and found that when snapshot happens at certain point of compaction, only tmplink file shows up as snapshot candidate. And since [we are filtering open early files|https://github.com/apache/cassandra/blob/cassandra-2.1.4/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L2223], end up with empty snapshot directory. Relevant debug log is (debug-result.txt): {code} DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,469 CompactionManager.java:226 - Checking ks.cf DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,469 SizeTieredCompactionStrategy.java:88 - Compaction buckets are [[SSTableReader(path='/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-4-Data.db'), SSTableReader(path='/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-3-Data.db'), SSTableReader(path='/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-1-Data.db'), SSTableReader(path='/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-2-Data.db')]] INFO [CompactionExecutor:1] 2015-04-05 17:55:24,472 ColumnFamilyStore.java:882 - Enqueuing flush of compactions_in_progress: 1304 (0%) on-heap, 0 (0%) off-heap DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,472 ColumnFamilyStore.java:212 - scheduling flush in 3600000 ms INFO [CompactionExecutor:1] 2015-04-05 17:55:24,477 CompactionTask.java:140 - Compacting [SSTableReader(path='/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-1-Data.db'), SSTableReader(path='/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-3-Data.db'), SSTableReader(path='/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-2-Data.db'), SSTableReader(path='/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-4-Data.db')] DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,477 ColumnFamilyStore.java:1287 - Checking for sstables overlapping [SSTableReader(path='/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-1-Data.db'), SSTableReader(path='/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-3-Data.db'), SSTableReader(path='/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-2-Data.db'), SSTableReader(path='/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-4-Data.db')] DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,477 CompactionController.java:113 - Checking droppable sstables in CFS(Keyspace='ks', ColumnFamily='cf') DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,482 CompactionTask.java:154 - Expected bloom filter size : 2001 DEBUG [MemtablePostFlush:1] 2015-04-05 17:55:24,776 ColumnFamilyStore.java:921 - forceFlush requested but everything is clean in cf DEBUG [AntiEntropyStage:1] 2015-04-05 17:55:24,776 ColumnFamilyStore.java:2231 - Snapshotting 1 sstables DEBUG [AntiEntropyStage:1] 2015-04-05 17:55:24,776 ColumnFamilyStore.java:2236 - skipped snapshotting SSTableReader(path='/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-tmplink-ka-5-Data.db') (openReason EARLY) DEBUG [AntiEntropyStage:1] 2015-04-05 17:55:24,776 RepairMessageVerbHandler.java:94 - Enqueuing response to snapshot request d869f060-dbe6-11e4-9f47-51a3c6e97c77 to /127.0.0.1 DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,780 FileUtils.java:161 - Renaming /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-tmp-ka-5-Statistics.db to /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-5-Statistics.db DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,780 FileUtils.java:161 - Renaming /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-tmp-ka-5-TOC.txt to /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-5-TOC.txt DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,780 FileUtils.java:161 - Renaming /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-tmp-ka-5-Index.db to /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-5-Index.db DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,781 FileUtils.java:161 - Renaming /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-tmp-ka-5-Filter.db to /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-5-Filter.db DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,781 FileUtils.java:161 - Renaming /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-tmp-ka-5-CRC.db to /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-5-CRC.db DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,781 FileUtils.java:161 - Renaming /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-tmp-ka-5-Digest.sha1 to /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-5-Digest.sha1 DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,781 FileUtils.java:161 - Renaming /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-tmp-ka-5-Data.db to /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-5-Data.db INFO [CompactionExecutor:1] 2015-04-05 17:55:24,787 ColumnFamilyStore.java:882 - Enqueuing flush of compactions_in_progress: 164 (0%) on-heap, 0 (0%) off-heap DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,787 ColumnFamilyStore.java:212 - scheduling flush in 3600000 ms DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,794 DataTracker.java:460 - removing /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-1 from list of files tracked for ks.cf DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,794 DataTracker.java:460 - removing /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-3 from list of files tracked for ks.cf DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,794 DataTracker.java:460 - removing /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-2 from list of files tracked for ks.cf DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,795 DataTracker.java:460 - removing /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-4 from list of files tracked for ks.cf DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,795 SSTableReader.java:1695 - Marking /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-1-Data.db compacted DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,795 SSTableReader.java:1695 - Marking /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-3-Data.db compacted DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,795 SSTableReader.java:1695 - Marking /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-2-Data.db compacted DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,796 SSTableReader.java:1695 - Marking /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-4-Data.db compacted DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,797 DataTracker.java:445 - adding /tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-5 to list of files tracked for ks.cf INFO [CompactionExecutor:1] 2015-04-05 17:55:24,806 CompactionTask.java:270 - Compacted 4 sstables to [/tmp/dtest-9cJ7bi/test/node3/data/ks/cf-898085e0dbe611e4a66151a3c6e97c77/ks-cf-ka-5,]. 145,470 bytes to 144,963 (~99% of original) in 320ms = 0.432023MB/s. 2,008 total partitions merged to 2,001. Partition merge counts were {1:1997, 2:1, 3:3, } DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,806 CompactionTask.java:272 - CF Total Bytes Compacted: 161,938 DEBUG [CompactionExecutor:1] 2015-04-05 17:55:24,807 CompactionTask.java:273 - Actual #keys: 2001, Estimated #keys:2001, Err%: 0.0 {code} I tested against bisected commit (see above comment), and the result was similar, I ended up with empty snapshot direcory. Only difference was I had 1 EARLY sstable and 4 METADATA_CHANGE sstables (that are compacting). [~benedict], [~krummas], can you take a look? > simple_repair test failing on 2.1 > --------------------------------- > > Key: CASSANDRA-8669 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8669 > Project: Cassandra > Issue Type: Bug > Reporter: Philip Thompson > Assignee: Yuki Morishita > Fix For: 2.1.5 > > Attachments: 8669-debug.txt, debug-result.txt > > > The dtest simple_repair_test began failing on 12/22 on 2.1 and trunk. The test fails intermittently both locally and on cassci. > The test is here: https://github.com/riptano/cassandra-dtest/blob/master/repair_test.py#L32 > The output is here: http://cassci.datastax.com/job/cassandra-2.1_dtest/661/testReport/repair_test/TestRepair/simple_repair_test/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)