cassandra-commits mailing list archives

From "Michael Shuler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10479) Handling partially written sstables on node crashes
Date Fri, 09 Oct 2015 03:14:26 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949821#comment-14949821 ]

Michael Shuler commented on CASSANDRA-10479:
--------------------------------------------

An example: Consider a long-ago compacted and fully valid sstable that resides on a physical
part of a disk platter that develops bad sectors from age. The data contained in that sstable
is not in a current commitlog. Deleting this sstable may not be a good idea, since I imagine
recovering that data would require a full repair of the keyspace. I think there may be some very
conservative steps that could be taken, configurable to various users' pain thresholds, but
I don't think there is a one-size-fits-all solution without a human intervening and making
a decision in this case. I would not be a happy sysadmin if my software randomly decided
to delete my data.

See [~benedict]'s comment on the related ticket [10112|https://issues.apache.org/jira/browse/CASSANDRA-10112?focusedCommentId=14700986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14700986]
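
To make the "configurable to pain thresholds" idea concrete, here is a minimal sketch of what such a policy decision could look like. Everything in it is hypothetical: the `Policy` and `Action` names and the `decide` method are invented for illustration and are not existing Cassandra settings or APIs. The only point it shows is that deletion is even considered solely when the data is known to be recoverable from a commitlog, and that more conservative settings never delete.

```java
public class CorruptSSTablePolicyDemo {
    // Hypothetical settings, ordered from most to least conservative.
    enum Policy { STOP, QUARANTINE, DELETE_IF_IN_COMMITLOG }

    // Hypothetical outcomes of encountering an unreadable sstable at startup.
    enum Action { HALT_STARTUP, MOVE_ASIDE, DELETE }

    // dataInCommitlog is an assumed input: how to determine it reliably is
    // exactly the hard part discussed in this thread.
    static Action decide(Policy policy, boolean dataInCommitlog) {
        switch (policy) {
            case STOP:
                return Action.HALT_STARTUP;   // leave the decision to a human
            case QUARANTINE:
                return Action.MOVE_ASIDE;     // keep the file, start without it
            case DELETE_IF_IN_COMMITLOG:
                // Never delete data that cannot be replayed locally.
                return dataInCommitlog ? Action.DELETE : Action.HALT_STARTUP;
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        // An old, compacted sstable hit by bad sectors: not in the commitlog.
        System.out.println(decide(Policy.DELETE_IF_IN_COMMITLOG, false)); // HALT_STARTUP
        // A partially written sstable whose data is still in the commitlog.
        System.out.println(decide(Policy.DELETE_IF_IN_COMMITLOG, true));  // DELETE
    }
}
```

Under these assumptions, the bad-sector case from the example above still stops and waits for an operator, even with the least conservative setting.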

> Handling partially written sstables on node crashes
> ---------------------------------------------------
>
>                 Key: CASSANDRA-10479
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10479
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sharvanath Pathak
>
> Currently a power loss can potentially require manual intervention to bring Cassandra back up. Essentially, these partially written SSTables are considered corrupt, and we see the following trace quite often on hard reboots:
> {noformat}
> INFO  [SSTableBatchOpen:1] 2015-09-29 19:24:39,170 SSTableReader.java:478 - Opening /var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-13368 (79 bytes)
> ERROR [SSTableBatchOpen:1] 2015-09-29 19:24:39,177 FileUtils.java:447 - Exiting forcefully due to file system exception on startup, disk failure policy "stop"
> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:168) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:752) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:703) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:491) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:387) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:534) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_80]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> Caused by: java.io.EOFException: null
>         at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.7.0_80]
>         at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_80]
>         at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_80]
>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106) ~[apache-cassandra-2.1.9.jar:2.1.9]
>         ... 14 common frames omitted
> {noformat}
> Deleting partially written SSTables might be a perfectly valid thing to do (given that the data is present in commitlogs).
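
For readers unfamiliar with the `Caused by` frames in the trace: `DataInputStream.readUTF` first reads a two-byte length prefix via `readUnsignedShort`, and throws `EOFException` if the stream ends before that prefix (or the string body) is complete. The standalone sketch below reproduces that behavior with in-memory bytes only. It does not use any Cassandra code, and the class name, helper names, and the encoded string are placeholders chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

public class TruncatedReadDemo {
    // Returns true if readUTF reaches end-of-stream before a full string is read.
    static boolean failsWithEof(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            in.readUTF();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    // Encodes a string the way writeUTF does: 2-byte length, then the bytes.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] full = encode("example-name");          // placeholder content
        byte[] halfPrefix = Arrays.copyOf(full, 1);    // only 1 of 2 length bytes
        byte[] cutBody = Arrays.copyOf(full, full.length - 3);

        System.out.println(failsWithEof(full));        // false
        System.out.println(failsWithEof(halfPrefix));  // true
        System.out.println(failsWithEof(cutBody));     // true
        System.out.println(failsWithEof(new byte[0])); // true
    }
}
```

The trace above fails in `readUnsignedShort`, which corresponds to the cases here where the length prefix itself is missing or incomplete.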



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
