Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3D86D18009 for ; Sat, 8 Aug 2015 13:36:46 +0000 (UTC) Received: (qmail 69870 invoked by uid 500); 8 Aug 2015 13:36:46 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 69829 invoked by uid 500); 8 Aug 2015 13:36:46 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 69816 invoked by uid 99); 8 Aug 2015 13:36:45 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 08 Aug 2015 13:36:45 +0000 Date: Sat, 8 Aug 2015 13:36:45 +0000 (UTC) From: "Benedict (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662981#comment-14662981 ] Benedict commented on CASSANDRA-9908: ------------------------------------- Thinking about it, we need to replicate a little more code from the regular CF drop: interruption of compactions. We should be calling both {{CompactionManager.instance.interruptCompactionForCFs}} and {{waitForCessation}}, to ensure that most references to the files are also relinquished before we signal successful drop. Then, unfortunately, that still doesn't put us in the clear, but I'm beginning to think it would be better and more robust to just accelerate SecondaryIndexes having their own UUID, which I believe is already on the cards for the near future. As we can still have other long running operations - such as {{keySamples}} which have references to sstables that can be long lived, that aren't compactions and won't acquiesce. So, really, we need to sleep-spin until they are all released. This is pretty unpleasant, and it does run the risk of schema changes becoming blocked indefinitely on a leaked reference. I would also prefer we at least logged an error message if we encounter the race condition with the NoSuchFileException. Since it definitely means we still have problems. > Potential race caused by async cleanup of transaction log files > --------------------------------------------------------------- > > Key: CASSANDRA-9908 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9908 > Project: Cassandra > Issue Type: Bug > Reporter: Sam Tunnicliffe > Assignee: Stefania > Labels: benedict-to-commit > Fix For: 3.0 beta 1 > > Attachments: TEST-org.apache.cassandra.db.SecondaryIndexTest.log > > > There seems to be a potential race in the cleanup of transaction log files, introduced in CASSANDRA-7066 > It's pretty hard to trigger on trunk, but it's possible to hit it via {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} > That test creates an index, then removes it to check that the removal is correctly recorded, then adds the index again to assert that it gets rebuilt from the existing data. > The removal causes the SSTables of the index CFS to be dropped, which is a transactional operation and so writes a transaction log. When the drop is completed and the last reference to an SSTable is released, the cleanup of the transaction log is scheduled on the periodic tasks executor. The issue is that re-creating the index re-creates the index CFS. When this happens, it's possible for the cleanup of the txn log to have not yet happened. If so, the initialization of the CFS attempts to read the log to identify any orphaned temporary files. The cleanup can happen between the finding the log file and reading it's contents, which results in a {{NoSuchFileException}} > {noformat} > [junit] java.nio.file.NoSuchFileException: build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log > [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log > [junit] at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620) > [junit] at org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190) > [junit] at org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338) > [junit] at org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739) > [junit] at org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541) > [junit] at org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652) > [junit] at org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641) > [junit] at org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606) > [junit] at org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351) > [junit] at org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313) > [junit] at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511) > [junit] at org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115) > [junit] at org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265) > [junit] at org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467) > [junit] Caused by: java.nio.file.NoSuchFileException: build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log > [junit] at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) > [junit] at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > [junit] at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > [junit] at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) > [junit] at java.nio.file.Files.newByteChannel(Files.java:361) > [junit] at java.nio.file.Files.newByteChannel(Files.java:407) > [junit] at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) > [junit] at java.nio.file.Files.newInputStream(Files.java:152) > [junit] at java.nio.file.Files.newBufferedReader(Files.java:2784) > [junit] at java.nio.file.Files.readAllLines(Files.java:3202) > [junit] at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:616) > [junit] > [junit] > [junit] Test org.apache.cassandra.db.SecondaryIndexTest FAILED > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)