cassandra-commits mailing list archives

From "Joshua McKenzie (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7927) Kill daemon on any disk error
Date Thu, 16 Oct 2014 19:20:34 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174120#comment-14174120 ]

Joshua McKenzie commented on CASSANDRA-7927:
--------------------------------------------

Another update pushed to the branch [here|https://github.com/josh-mckenzie/cassandra/compare/7927].

While working on CASSANDRA-7579 I noticed that the _die unit test was failing on Linux (for
entirely different reasons than the Windows failure). Digging into it showed that the unit
test it was based on, testCommitFailurePolicy_stop(), didn't actually test what it was
intended to test. SchemaLoader doesn't initialize StorageService, so the assertions in the
_stop test always passed. Also, changing a directory to write-only doesn't change its existing
contents to write-only, so flushes would keep working even if StorageService had been started.
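The permission point can be checked with a small, self-contained Java sketch. It assumes the test revoked write permission on the directory (for example via File.setWritable(false)); the class, method, and file names are hypothetical and not taken from Cassandra's test suite. On POSIX systems a directory's write bit only governs creating, deleting, and renaming entries, so appending to a file that already exists inside it still succeeds.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirPermissionSketch {
    // Returns true if a file that already exists in a directory can still be
    // appended to after write permission on that directory has been revoked.
    static boolean canAppendAfterRevokingDirWrite() {
        try {
            Path dir = Files.createTempDirectory("commitlog-sketch");
            Path segment = dir.resolve("segment.log"); // hypothetical file name
            Files.write(segment, "header\n".getBytes(StandardCharsets.UTF_8));

            // Revoke write permission on the directory only; the file's own
            // permissions are left untouched.
            dir.toFile().setWritable(false);
            boolean appended;
            try {
                Files.write(segment, "more data\n".getBytes(StandardCharsets.UTF_8),
                            StandardOpenOption.APPEND);
                appended = true;
            } catch (IOException e) {
                appended = false;
            } finally {
                // Restore write permission so the temporary files can be deleted.
                dir.toFile().setWritable(true);
            }
            Files.deleteIfExists(segment);
            Files.deleteIfExists(dir);
            return appended;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("append still works: " + canAppendAfterRevokingDirWrite());
    }
}
```

A test that wants writes to fail therefore has to change the files themselves (or inject the failure), not just the directory.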

I've made CommitLog.handleCommitError public, marked it @VisibleForTesting, and updated those
two unit tests to exercise directly how the CommitLog handles throwables under the stop and
die policy settings. Tests now pass on both Windows and Linux.
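The testing approach (calling the error handler directly with a throwable instead of relying on a started StorageService or a real disk failure) might look roughly like the sketch below. Only the name handleCommitError comes from the comment; CommitLogSketch, CommitFailurePolicy, and the stop/kill callbacks are hypothetical simplifications, not Cassandra's actual classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitErrorHandlingSketch {
    // Hypothetical subset of the commit failure policies discussed above.
    enum CommitFailurePolicy { STOP, DIE }

    // Hypothetical stand-in for Cassandra's CommitLog; the real class differs.
    static final class CommitLogSketch {
        private final CommitFailurePolicy policy;
        private final Runnable stopTransports; // effect of the "stop" policy
        private final Runnable killJvm;        // effect of the "die" policy; injected so tests never exit

        CommitLogSketch(CommitFailurePolicy policy, Runnable stopTransports, Runnable killJvm) {
            this.policy = policy;
            this.stopTransports = stopTransports;
            this.killJvm = killJvm;
        }

        // Public so a unit test can call it directly with any Throwable.
        // (The comment says the real method is also marked @VisibleForTesting.)
        // message and t are unused here; a real handler would typically log them.
        public void handleCommitError(String message, Throwable t) {
            switch (policy) {
                case STOP:
                    stopTransports.run();
                    break;
                case DIE:
                    killJvm.run();
                    break;
            }
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        Throwable failure = new java.io.IOException("simulated commit log write failure");

        new CommitLogSketch(CommitFailurePolicy.STOP, () -> events.add("stopped"), () -> events.add("killed"))
            .handleCommitError("commit failed", failure);
        new CommitLogSketch(CommitFailurePolicy.DIE, () -> events.add("stopped"), () -> events.add("killed"))
            .handleCommitError("commit failed", failure);

        System.out.println(events); // prints [stopped, killed]
    }
}
```

Injecting the side effects keeps the assertions meaningful: each test observes exactly which action its policy triggered, without taking down the test JVM.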

> Kill daemon on any disk error
> -----------------------------
>
>                 Key: CASSANDRA-7927
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7927
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>         Environment: aws, stock cassandra or dse
>            Reporter: John Sumsion
>            Assignee: John Sumsion
>              Labels: bootcamp, lhf
>             Fix For: 2.1.1
>
>         Attachments: 7927-v1-die.patch
>
>
> We got a disk read error on 1.2.13 that didn't trigger the disk failure policy, and I'm
> trying to hunt down why, but in doing so, I saw that there is no disk_failure_policy option
> for just killing the daemon.
> If we ever get a corrupt sstable, we want to replace the node anyway, because some aws
> instance store disks just go bad.
> I want to use the JVMStabilityInspector from CASSANDRA-7507 to kill so that remains standard,
> so I will base my patch on CASSANDRA-7507.
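The "die" option requested in the quoted description can be sketched in Java, assuming a single process-wide kill helper in the role the description gives to JVMStabilityInspector from CASSANDRA-7507. DiskFailurePolicy, StabilityInspectorSketch, DiskErrorHandlerSketch, replaceKiller, and the exit status are hypothetical illustrations, not Cassandra's real classes or signatures.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class DiePolicySketch {
    // Hypothetical subset of disk failure policies, including the requested "die".
    enum DiskFailurePolicy { DIE, STOP, IGNORE }

    // Hypothetical stand-in for a central kill helper: every "die" path goes
    // through here, so the way the daemon is killed stays the same everywhere.
    static final class StabilityInspectorSketch {
        // Default action halts the JVM immediately (exit status 100 is arbitrary).
        private static volatile Consumer<Throwable> killer = t -> Runtime.getRuntime().halt(100);

        // Lets a unit test swap in a recording action instead of exiting.
        static void replaceKiller(Consumer<Throwable> newKiller) {
            killer = newKiller;
        }

        static void killCurrentJvm(Throwable cause) {
            killer.accept(cause);
        }
    }

    // Hypothetical handler called when a disk error (e.g. a corrupt sstable read) is detected.
    static final class DiskErrorHandlerSketch {
        private final DiskFailurePolicy policy;
        private final Runnable stopNode; // what "stop" does, e.g. shut down transports

        DiskErrorHandlerSketch(DiskFailurePolicy policy, Runnable stopNode) {
            this.policy = policy;
            this.stopNode = stopNode;
        }

        void onDiskError(Throwable t) {
            switch (policy) {
                case DIE:
                    StabilityInspectorSketch.killCurrentJvm(t);
                    break;
                case STOP:
                    stopNode.run();
                    break;
                case IGNORE:
                    break; // a real implementation would at least log t
            }
        }
    }

    public static void main(String[] args) {
        List<Throwable> killed = new ArrayList<>();
        StabilityInspectorSketch.replaceKiller(killed::add);

        IOException readError = new IOException("simulated disk read error");
        new DiskErrorHandlerSketch(DiskFailurePolicy.DIE, () -> {}).onDiskError(readError);
        new DiskErrorHandlerSketch(DiskFailurePolicy.IGNORE, () -> {}).onDiskError(readError);

        System.out.println("kills requested: " + killed.size()); // prints "kills requested: 1"
    }
}
```

Routing every kill through one helper is what keeps the behaviour standard: a test replaces the kill action once instead of patching each failure path separately.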



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
