cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2118) Provide failure modes if issues with the underlying filesystem of a node
Date Mon, 28 May 2012 18:05:24 GMT


Jonathan Ellis commented on CASSANDRA-2118:

I don't think we need more than two options.  It's common for disks to become readable-not-writable,
but I've never heard of them being writable-not-readable.  Assuming that we address CASSANDRA-2116
at the right level of granularity (the disk), there are two sane options:

1. Continue as best we can in the face of errors: if we can't write to a disk, log an error,
mark it bad-for-writes, and continue writing to other disks.  If we can't read from a disk,
log an error, mark it bad-for-reads-and-writes, and continue serving reads from other disks.
2. Halt on error: since option one implies that we can blithely serve up stale data when the
most recent version was on the disk that is no longer accessible, I can see the utility of an
option to halt instead (which would allow an operator to choose to decommission + rebootstrap
to minimize the inconsistencies observed at CL.ONE).
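
The two options above can be sketched as a small per-disk state machine. This is only an illustration in Python, not Cassandra code: the names Node, DiskState, Policy, NodeHalted and on_disk_error are hypothetical, and the sketch assumes errors are reported per disk, as the comment expects CASSANDRA-2116 to allow.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class DiskState(Enum):
    HEALTHY = "healthy"
    BAD_FOR_WRITES = "bad-for-writes"
    BAD_FOR_READS_AND_WRITES = "bad-for-reads-and-writes"


class Policy(Enum):
    # Hypothetical names for the two options discussed in the comment.
    BEST_EFFORT = "best_effort"  # option 1: continue as best we can
    HALT = "halt"                # option 2: halt on error


class NodeHalted(Exception):
    """Raised when the halt policy stops the node after a disk error."""


@dataclass
class Node:
    policy: Policy
    disks: Dict[str, DiskState]
    log: List[str] = field(default_factory=list)

    def on_disk_error(self, disk: str, op: str) -> None:
        """Handle a failed 'read' or 'write' on the given disk."""
        if op not in ("read", "write"):
            raise ValueError(f"unknown operation: {op}")
        # Both options log the error first.
        self.log.append(f"ERROR: {op} failed on {disk}")
        if self.policy is Policy.HALT:
            raise NodeHalted(f"halting after {op} error on {disk}")
        if op == "read":
            # An unreadable disk is treated as unusable for writes too.
            self.disks[disk] = DiskState.BAD_FOR_READS_AND_WRITES
        elif self.disks[disk] is DiskState.HEALTHY:
            # A write failure never upgrades a disk that is already unreadable.
            self.disks[disk] = DiskState.BAD_FOR_WRITES

    def writable_disks(self) -> List[str]:
        return [d for d, s in self.disks.items() if s is DiskState.HEALTHY]

    def readable_disks(self) -> List[str]:
        return [d for d, s in self.disks.items()
                if s is not DiskState.BAD_FOR_READS_AND_WRITES]
```

Under the best-effort policy a disk marked bad-for-writes still serves reads, which matches the readable-not-writable case called common above. Under the halt policy the first error stops the node and leaves the next step to the operator.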
> Provide failure modes if issues with the underlying filesystem of a node
> ------------------------------------------------------------------------
>                 Key: CASSANDRA-2118
>                 URL:
>             Project: Cassandra
>          Issue Type: Sub-task
>    Affects Versions: 0.8 beta 1
>            Reporter: Chris Goffinet
>            Assignee: Chris Goffinet
>         Attachments: 0001-Provide-failure-modes-if-issues-with-the-underlying-.patch,
>                      0001-Provide-failure-modes-if-issues-with-the-underlying-v2.patch,
>                      0001-Provide-failure-modes-if-issues-with-the-underlying-v3.patch
> CASSANDRA-2116 introduces the ability to detect FS errors. Let's provide a mode in cassandra.yaml
> so operators can decide what to do in the event of failure:
> 1) standard - continue on all errors (default)
> 2) read - stop the gossip/rpc server only if reads from the drive fail; writes can fail
> without killing the gossip/rpc server
> 3) readwrite - stop the gossip/rpc server on any read or write error
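
The three modes proposed in the issue description amount to a small decision table. A minimal Python sketch follows, assuming the mode arrives as one of the strings "standard", "read" or "readwrite". The names FailureMode and should_stop_services are hypothetical and do not come from the attached patches, and the issue does not name the cassandra.yaml key, so none is assumed here.

```python
from enum import Enum


class FailureMode(Enum):
    STANDARD = "standard"    # continue on all errors (default)
    READ = "read"            # stop gossip/rpc only when a read fails
    READWRITE = "readwrite"  # stop gossip/rpc on any read or write error


def should_stop_services(mode: FailureMode, failed_op: str) -> bool:
    """Return True if the gossip/rpc server should stop after a failed op."""
    if failed_op not in ("read", "write"):
        raise ValueError(f"unknown operation: {failed_op}")
    if mode is FailureMode.STANDARD:
        return False
    if mode is FailureMode.READ:
        return failed_op == "read"
    return True  # READWRITE: any filesystem error stops the services
```

A string read from configuration can be turned into a mode with FailureMode("read"), which raises ValueError for anything outside the three listed values.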
