Return-Path: X-Original-To: apmail-cassandra-commits-archive@www.apache.org Delivered-To: apmail-cassandra-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E565710515 for ; Fri, 16 Oct 2015 07:30:05 +0000 (UTC) Received: (qmail 45673 invoked by uid 500); 16 Oct 2015 07:30:05 -0000 Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org Received: (qmail 45634 invoked by uid 500); 16 Oct 2015 07:30:05 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 45622 invoked by uid 99); 16 Oct 2015 07:30:05 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 16 Oct 2015 07:30:05 +0000 Date: Fri, 16 Oct 2015 07:30:05 +0000 (UTC) From: "Sharvanath Pathak (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CASSANDRA-10534) CompressionInfo not being fsynced on close MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/CASSANDRA-10534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960309#comment-14960309 ] Sharvanath Pathak commented on CASSANDRA-10534: ----------------------------------------------- @benedict @stefania thanks for taking quick action on it. > CompressionInfo not being fsynced on close > ------------------------------------------ > > Key: CASSANDRA-10534 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10534 > Project: Cassandra > Issue Type: Bug > Components: Core > Reporter: Sharvanath Pathak > Assignee: Stefania > Fix For: 2.1.x > > > I was seeing SSTable corruption due to a CompressionInfo.db file of size 0, this happened multiple times in our testing with hard node reboots. After some investigation it seems like these file is not being fsynced, and that can potentially lead to data corruption. I am working with version 2.1.9. > I checked for fsync calls using strace, and found them happening for all but the following components: CompressionInfo, TOC.txt and digest.sha1. All of these but the CompressionInfo seem tolerable. Also a quick look through the code did not reveal any fsync calls. Moreover, I suspect the commit 4e95953f29d89a441dfe06d3f0393ed7dd8586df (https://github.com/apache/cassandra/commit/4e95953f29d89a441dfe06d3f0393ed7dd8586df#diff-b7e48a1398e39a936c11d0397d5d1966R344) has caused the regression, which removed the line > {noformat} > getChannel().force(true); > {noformat} > from CompressionMetadata.Writer.close. > Following is the trace I saw in system.log: > {noformat} > INFO [SSTableBatchOpen:1] 2015-09-29 19:24:39,170 SSTableReader.java:478 - Opening /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-13368 (79 bytes) > ERROR [SSTableBatchOpen:1] 2015-09-29 19:24:39,177 FileUtils.java:447 - Exiting forcefully due to file system exception on startup, disk failure policy "stop" > org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException > at org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:131) ~[apache-cassandra-2.1.9.jar:2.1.9] > at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) ~[apache-cassandra-2.1.9.jar:2.1.9] > at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) ~[apache-cassandra-2.1.9.jar:2.1.9] > at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) ~[apache-cassandra-2.1.9.jar:2.1.9] > at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168) ~[apache-cassandra-2.1.9.jar:2.1.9] > at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) ~[apache-cassandra-2.1.9.jar:2.1.9] > at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) ~[apache-cassandra-2.1.9.jar:2.1.9] > at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) ~[apache-cassandra-2.1.9.jar:2.1.9] > at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) ~[apache-cassandra-2.1.9.jar:2.1.9] > at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) ~[apache-cassandra-2.1.9.jar:2.1.9] > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80] > at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80] > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80] > at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80] > Caused by: java.io.EOFException: null > at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.7.0_80] > at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_80] > at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_80] > at org.apache.cassandra.io.compress.CompressionMetadata.(CompressionMetadata.java:106) ~[apache-cassandra-2.1.9.jar:2.1.9] > ... 14 common frames omitted > {noformat} > Following is the result of ls on the data directory of a corrupted SSTable after the hard reboot: > {noformat} > $ ls -l /var/lib/cassandra/data/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/ > total 60 > -rw-r--r-- 1 cassandra cassandra 0 Oct 15 09:31 system-sstable_activity-ka-1-CompressionInfo.db > -rw-r--r-- 1 cassandra cassandra 9740 Oct 15 09:31 system-sstable_activity-ka-1-Data.db > -rw-r--r-- 1 cassandra cassandra 0 Oct 15 09:31 system-sstable_activity-ka-1-Digest.sha1 > -rw-r--r-- 1 cassandra cassandra 880 Oct 15 09:31 system-sstable_activity-ka-1-Filter.db > -rw-r--r-- 1 cassandra cassandra 34000 Oct 15 09:31 system-sstable_activity-ka-1-Index.db > -rw-r--r-- 1 cassandra cassandra 7338 Oct 15 09:31 system-sstable_activity-ka-1-Statistics.db > -rw-r--r-- 1 cassandra cassandra 0 Oct 15 09:31 system-sstable_activity-ka-1-TOC.txt > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)