cassandra-commits mailing list archives

From "Joshua McKenzie (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7927) Kill daemon on any disk error
Date Thu, 09 Oct 2014 19:26:33 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165600#comment-14165600 ]

Joshua McKenzie commented on CASSANDRA-7927:
--------------------------------------------

Sorry for the delay on this. I have a version rebased to 2.1 head [available here|https://github.com/josh-mckenzie/cassandra/compare/7927?expand=1], with the following changes:
* Added support for a "die" policy to CommitLog exception handling
* Removed the 'killMeNow' method in JVMStabilityInspector
* Migrated the FileUtils killing logic into the JVMStabilityInspector
* Slight refactor on JVMStabilityInspector to keep it single-point-of-entry (hand Throwable
to it, let it deal with it)
* Updated the unit tests to work w/the new structure
* Removed erroneously added entries from Config.CommitFailurePolicy
* Reverted ordering on enums in Config to just append the new entry on the end

Regarding migrating the logic into the JVMStabilityInspector: I expect we're going to have
very few exception conditions that will cause us to mark the JVM as unstable and kill it,
so I'd prefer to keep that class as simple as possible and nest that logic inside it rather
than distributing it throughout the code by opening the interface to a 'killMeNow' type method.
Hand a Throwable to it and it'll kill things if they need to be killed.
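
To make the single-point-of-entry idea concrete, here is a rough sketch of the shape I mean. It is not the code on the branch: the class name, the Killer interface, and the policy enum are all hypothetical, and java.io.IOError / IOException are only stand-ins for our own disk-error exception types. Treating OutOfMemoryError as always fatal is likewise an assumption made for the illustration.

```java
import java.io.IOError;
import java.io.IOException;

// Hypothetical sketch, not the Cassandra source: every name here is illustrative.
final class StabilityInspectorSketch {
    /** Illustrative policies; DIE is appended last so existing ordinals stay unchanged. */
    enum FailurePolicy { STOP, BEST_EFFORT, IGNORE, DIE }

    /** Stand-in for the real process kill, so the sketch can be exercised safely. */
    interface Killer { void kill(Throwable cause); }

    private final FailurePolicy policy;
    private final Killer killer;

    StabilityInspectorSketch(FailurePolicy policy, Killer killer) {
        this.policy = policy;
        this.killer = killer;
    }

    /**
     * Single point of entry: callers hand over any Throwable and the
     * inspector alone decides whether the process has to go down.
     * Returns true if the kill hook was invoked.
     */
    boolean inspect(Throwable t) {
        // Assumption: OOM is always fatal; disk errors are fatal only under DIE.
        boolean unstable = t instanceof OutOfMemoryError
                || (policy == FailurePolicy.DIE && isDiskError(t));
        if (unstable)
            killer.kill(t);
        return unstable;
    }

    /** Walks the cause chain looking for the stand-in disk-error types. */
    private static boolean isDiskError(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause())
            if (c instanceof IOError || c instanceof IOException)
                return true;
        return false;
    }
}
```

A caller would do nothing more than `inspector.inspect(e)` in its catch block; there is no public 'killMeNow' to call, so the decision of what counts as fatal stays in one place.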

[~jdsumsion]: could you review the revised branch posted above?  Thanks!

> Kill daemon on any disk error
> -----------------------------
>
>                 Key: CASSANDRA-7927
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7927
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>         Environment: aws, stock cassandra or dse
>            Reporter: John Sumsion
>            Assignee: John Sumsion
>              Labels: bootcamp, lhf
>             Fix For: 2.1.1
>
>         Attachments: 7927-v1-die.patch
>
>
> We got a disk read error on 1.2.13 that didn't trigger the disk failure policy, and I'm
> trying to hunt down why, but in doing so, I saw that there is no disk_failure_policy option
> for just killing the daemon.
> If we ever get a corrupt sstable, we want to replace the node anyway, because some aws
> instance store disks just go bad.
> I want to use the JVMStabilityInspector from CASSANDRA-7507 to kill so that remains standard,
> so I will base my patch on CASSANDRA-7507.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
