cassandra-commits mailing list archives

From "Joshua McKenzie (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7927) Kill daemon on any disk error
Date Mon, 13 Oct 2014 18:30:34 GMT


Joshua McKenzie commented on CASSANDRA-7927:

The previously linked branch had a couple of problems that I've resolved [here|].
 Namely, when I combined the checks for FSError / CorruptSSTableException in inspectThrowable,
I didn't check the commit log failure policy in the DatabaseDescriptor, and I wouldn't have
been able to do so without augmenting the information passed in to indicate that the error
originated in a CommitLog context.  I think you were on the right track w/having an independent
entry point for inspection of CommitLog errors - that way we can kill the JVM on *any* commit log
error without having to worry about the type of error thrown by the CommitLog operation.
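
As a rough illustration of the two entry points described above, here is a minimal sketch. It is not the code on the branch: the class, enum, and method names are hypothetical stand-ins, the exception types only mimic FSError / CorruptSSTableException, and the methods return the kill decision rather than actually exiting the JVM.

```java
// Hypothetical sketch; names are illustrative and do not mirror the actual Cassandra classes.
final class InspectorSketch {
    enum DiskFailurePolicy { IGNORE, STOP, DIE }
    enum CommitFailurePolicy { IGNORE, STOP, DIE }

    // Stand-ins for FSError and CorruptSSTableException (assumed shapes).
    static class FsErrorSketch extends RuntimeException {
        FsErrorSketch(Throwable cause) { super(cause); }
    }
    static class CorruptSstableSketch extends RuntimeException {
        CorruptSstableSketch(String message) { super(message); }
    }

    // General entry point: only disk-type errors qualify, and only the
    // disk failure policy is consulted.
    static boolean shouldKillOnThrowable(Throwable t, DiskFailurePolicy policy) {
        boolean diskError = t instanceof FsErrorSketch || t instanceof CorruptSstableSketch;
        return diskError && policy == DiskFailurePolicy.DIE;
    }

    // Commit log entry point: the caller supplies the context by choosing this
    // method, so the type of t is deliberately not examined.
    static boolean shouldKillOnCommitLogThrowable(Throwable t, CommitFailurePolicy policy) {
        return policy == CommitFailurePolicy.DIE;
    }
}
```

With this split, a plain IOException raised during a commit log write leads to a kill decision under the commit log "die" policy, while the same exception passed to the general entry point does not.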

I did a few other things on this branch as well:
# added an entry in CHANGES.txt
# added an assertion to CommitLogTest to confirm the _die actually worked
# added a workaround for the fact that File.setWritable(false) on a directory fails on Windows
# merged the KillerForTests into the JVMStabilityInspector to help keep the code-base clean
# moved the inspection of the Throwable in FileUtils and in CommitLog to the top of handleFSError/handleCorruptSSTable/handleCommitError so the inspector will kill immediately if appropriate
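
To make the test-related items above more concrete, here is a minimal sketch of a replaceable "killer" that a test can swap in to assert that a kill was requested without exiting the JVM. It is an assumption about the general shape only: KillerSketch, RecordingKiller, replaceKiller, the dieOnError flag, and the exit code are all hypothetical and are not taken from the branch.

```java
// Hypothetical sketch of a swappable kill hook; not the actual JVMStabilityInspector code.
final class KillerSketch {
    // Production behaviour: terminate the process (the exit code here is arbitrary).
    static class Killer {
        void kill(Throwable cause) {
            System.exit(100);
        }
    }

    // Test behaviour: record that a kill was requested instead of exiting.
    static class RecordingKiller extends Killer {
        private boolean killed;

        @Override
        void kill(Throwable cause) {
            killed = true;
        }

        boolean wasKilled() {
            return killed;
        }
    }

    private static volatile Killer killer = new Killer();

    // Swaps the active killer and returns the previous one so a test can restore it.
    static Killer replaceKiller(Killer newKiller) {
        Killer previous = killer;
        killer = newKiller;
        return previous;
    }

    // The inspection runs first in the handler, so the kill happens before any other handling.
    static void handleCommitError(Throwable cause, boolean dieOnError) {
        if (dieOnError) {
            killer.kill(cause);
        }
    }
}
```

A test would install a RecordingKiller, trigger the failure, assert on wasKilled(), and then restore the previous killer so later tests see the default behaviour.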

> Kill daemon on any disk error
> -----------------------------
>                 Key: CASSANDRA-7927
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>         Environment: aws, stock cassandra or dse
>            Reporter: John Sumsion
>            Assignee: John Sumsion
>              Labels: bootcamp, lhf
>             Fix For: 2.1.1
>         Attachments: 7927-v1-die.patch
> We got a disk read error on 1.2.13 that didn't trigger the disk failure policy, and I'm
> trying to hunt down why, but in doing so, I saw that there is no disk_failure_policy option
> for just killing the daemon.
> If we ever get a corrupt sstable, we want to replace the node anyway, because some aws
> instance store disks just go bad.
> I want to use the JVMStabilityInspector from CASSANDRA-7507 to kill so that remains standard,
> so I will base my patch on CASSANDRA-7507.

This message was sent by Atlassian JIRA
