cassandra-user mailing list archives

From James Briggs <james.bri...@yahoo.com.INVALID>
Subject Re: JBOD disk failure - just say no
Date Tue, 21 Aug 2018 00:48:53 GMT
Cassandra JBOD has a bunch of issues, so I don't recommend it for production:
1) Disks fill up with load (data) unevenly, meaning you can run out of space on one disk while
others are half-full.
2) One bad disk can take out the whole node.
3) Instead of a small failure probability on an LVM/RAID volume, with JBOD you end up near a
100% chance of failure after 3 years or so.
4) Generally you will not have enough warning of a looming failure with JBOD compared to
LVM/RAID. (Some companies take a week or two to replace a failed disk.)

JBOD is easy to set up, but hard to manage.

Thanks, James.
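
To put rough numbers on point 3, here is a minimal sketch. It assumes disks fail independently at a constant annual failure rate, which is a simplification. The function name, the 5% rate, and the disk counts are illustrative assumptions, not measurements from this thread.

```python
def p_any_disk_fails(n_disks: int, annual_failure_rate: float, years: float) -> float:
    """Probability that at least one of n_disks fails within `years`.

    Assumes independent failures and a constant annual failure rate
    (both are simplifying assumptions for illustration only).
    """
    # Chance that a single disk survives the whole period.
    p_one_survives = (1.0 - annual_failure_rate) ** years
    # Every disk must survive for the node to see no disk failure at all.
    return 1.0 - p_one_survives ** n_disks


if __name__ == "__main__":
    # Hypothetical 5% annual failure rate over 3 years, for several disk counts.
    for n in (1, 4, 12, 24):
        p = p_any_disk_fails(n, annual_failure_rate=0.05, years=3)
        print(f"{n:2d} disks: {p:.0%} chance of at least one disk failure")
```

Under these assumptions the chance of at least one failure climbs quickly with the number of disks per node. With JBOD each such failure can affect the node, whereas a redundant RAID volume can tolerate a single failed disk.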


From: kurt greaves <kurt@instaclustr.com>
To: User <user@cassandra.apache.org>
Sent: Friday, August 17, 2018 5:42 AM
Subject: Re: JBOD disk failure
   
As far as I'm aware, yes. I recall hearing someone mention tying system tables to a particular
disk but at the moment that doesn't exist.
On Fri., 17 Aug. 2018, 01:04 Eric Evans, <john.eric.evans@gmail.com> wrote:

On Wed, Aug 15, 2018 at 3:23 AM kurt greaves <kurt@instaclustr.com> wrote:
> Yep. It might require a full node replace depending on what data is lost from the system
> tables. In some cases you might be able to recover from partially lost system info, but it's
> not a sure thing.

Ugh, does it really just boil down to what part of `system` happens to
be on the disk in question?  In my mind, that makes the only sane
operational procedure for a failed disk to be: "replace the entire
node".  IOW, I don't think we can realistically claim you can survive
a failed JBOD device if it relies on happenstance.

> On Wed., 15 Aug. 2018, 17:55 Christian Lorenz, <Christian.Lorenz@webtrekk.com> wrote:
>>
>> Thank you for the answers. We are using the current version, 3.11.3, so this one includes
>> CASSANDRA-6696.
>>
>> So if I get this right, losing system tables will need a full node rebuild. Otherwise
>> repair will get the node consistent again.
>
> [ ... ]

-- 
Eric Evans
john.eric.evans@gmail.com





   