cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yuki Morishita (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11750) Offline scrub should not abort when it hits corruption
Date Wed, 11 May 2016 13:11:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280084#comment-15280084
] 

Yuki Morishita commented on CASSANDRA-11750:
--------------------------------------------

I think on trunk this is not the case anymore after CASSANDRA-11578 which decoupled disk failure
policy handling and only applied it when online.
But still there can be the case where CorruptSSTableException is handled in offline scrub
that prevent it to continue.
Let me check.

> Offline scrub should not abort when it hits corruption
> ------------------------------------------------------
>
>                 Key: CASSANDRA-11750
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11750
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Adam Hattrell
>            Priority: Minor
>              Labels: Tools
>
> Hit a failure on startup due to corruption of some sstables in system keyspace.  Deleted
the listed file and restarted - came down again with another file.
> Figured that I may as well run scrub to clean up all the files.  Got following error:
> {noformat}
> sstablescrub system compaction_history 
> ERROR 17:21:34 Exiting forcefully due to file system exception on startup, disk failure
policy "stop" 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-1936-CompressionInfo.db

> at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:131)
~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:85)
~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:79)
~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:72)
~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:169)
~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:741) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]

> at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:692) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]

> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:480) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:376) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:523) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]

> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_79]

> at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_79] 
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]

> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]

> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79] 
> Caused by: java.io.EOFException: null 
> at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340) ~[na:1.7.0_79]

> at java.io.DataInputStream.readUTF(DataInputStream.java:589) ~[na:1.7.0_79] 
> at java.io.DataInputStream.readUTF(DataInputStream.java:564) ~[na:1.7.0_79] 
> at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:106)
~[cassandra-all-2.1.12.1046.jar:2.1.12.1046] 
> ... 14 common frames omitted 
> {noformat}
> I guess it might be by design - but I'd argue that I should at least have the option
to continue and let it do it's thing.  I'd prefer that sstablescrub ignored the disk failure
policy.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message