From: James Briggs
To: "user@cassandra.apache.org"
Date: Tue, 21 Aug 2018 00:48:53 +0000 (UTC)
Subject: Re: JBOD disk failure - just say no

Cassandra JBOD has a bunch of issues, so I don't recommend it for production:

1) disks fill up with data unevenly, so you can run out of space on one disk while others are still half-full
2) one bad disk can take out the whole node
3) instead of a small failure probability on an LVM/RAID volume, with JBOD you end up with a near 100% chance of failure after 3 years or so.
4) generally you will not have enough warning of a looming failure with JBOD compared to LVM/RAID. (Some companies take a week or two to replace a failed disk.)
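Points 2 and 3 can be made concrete with a back-of-the-envelope model. This is a minimal sketch that assumes independent disk failures at a constant annual failure rate (AFR); the function names and the 5% AFR, 3-year and 14-day inputs are illustrative assumptions, not figures from this thread. It compares the chance that any disk in a JBOD node fails (which, per point 2, can take out the node) with a rough first-order estimate for a RAID-1 mirrored pair, which only loses data if the partner disk also dies during the replacement window mentioned in point 4.

```python
"""Rough reliability comparison for one node's disks (illustrative only)."""


def p_any_disk_fails(n_disks: int, afr: float, years: float) -> float:
    """Chance that at least one of `n_disks` fails within `years`.

    Assumes every disk fails independently with the same constant
    annual failure rate `afr` (0.05 means 5% per year). With JBOD,
    this is roughly the chance the node is hit by a disk failure.
    """
    p_one_survives = (1.0 - afr) ** years
    return 1.0 - p_one_survives ** n_disks


def p_mirror_pair_lost(afr: float, years: float, replace_days: float) -> float:
    """First-order chance that a RAID-1 pair loses data within `years`.

    Data is lost only if one disk fails and its partner also fails
    before the failed disk is replaced (`replace_days`). Multiple
    failure events and post-swap rebuild time are ignored.
    """
    p_first_failure = p_any_disk_fails(2, afr, years)
    p_partner_fails_in_window = p_any_disk_fails(1, afr, replace_days / 365.0)
    return p_first_failure * p_partner_fails_in_window


if __name__ == "__main__":
    # Assumed inputs: 5% AFR, 3 years, two weeks to replace a disk.
    for n in (4, 12, 24):
        print(f"JBOD, {n} disks: {p_any_disk_fails(n, 0.05, 3):.3f}")
    print(f"RAID-1 pair: {p_mirror_pair_lost(0.05, 3, 14):.5f}")
```

Under these assumptions the JBOD figure climbs toward 1 as disks and years are added, while the mirrored pair stays small. Real disk failures are neither independent nor constant-rate, so treat the output as an illustration of the trend rather than a prediction.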

JBOD is easy to set up, but hard to manage.
 
Thanks, James.



From: kurt greaves <kurt@instaclustr.com>
To: User <user@cassandra.apache.org>
Sent: Friday, August 17, 2018 5:42 AM
Subject: Re: JBOD disk failure

As far as I'm aware, yes. I recall hearing someone mention tying system tables to a particular disk, but at the moment that doesn't exist.

On Fri., 17 Aug. 2018, 01:04 Eric Evans, <john.eric.evans@gmail.com> wrote:
On Wed, Aug 15, 2018 at 3:23 AM kurt greaves <kurt@instaclustr.com> wrote:
> Yep. It might require a full node replace depending on what data is lost from the system tables. In some cases you might be able to recover from partially lost system info, but it's not a sure thing.

Ugh, does it really just boil down to what part of `system` happens to
be on the disk in question?  In my mind, that makes the only sane
operational procedure for a failed disk to be: "replace the entire
node". IOW, I don't think we can realistically claim you can survive
a failed JBOD device if it relies on happenstance.

> On Wed., 15 Aug. 2018, 17:55 Christian Lorenz, <Christian.Lorenz@webtrekk.com> wrote:
>>
>> Thank you for the answers. We are using the current version, 3.11.3, so this one includes CASSANDRA-6696.
>>
>> So if I get this right, losing system tables will need a full node rebuild. Otherwise repair will get the node consistent again.
>
> [ ... ]

--
Eric Evans
john.eric.evans@gmail.com
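The recovery rule debated in this exchange can be written as a tiny decision function: Christian's reading (lost system tables mean a full node rebuild, otherwise repair) plus Eric's conservative option (always replace the entire node, since which part of `system` sat on the failed disk is happenstance). This is a hypothetical sketch: `failed_disk_procedure` and its inputs are illustrative names, not a Cassandra API, and it ignores kurt's point that partially lost system info can sometimes be recovered.

```python
from typing import Iterable

# Keyspace whose loss the thread treats as requiring a node replacement.
SYSTEM_KEYSPACE = "system"


def failed_disk_procedure(lost_keyspaces: Iterable[str], conservative: bool = False) -> str:
    """Suggest a recovery step after one JBOD disk fails (hypothetical helper).

    lost_keyspaces: keyspaces that had SSTables on the failed disk
        (how an operator determines this is left open here).
    conservative: apply the "always replace the entire node" rule
        instead of inspecting what was lost.
    """
    if conservative or SYSTEM_KEYSPACE in set(lost_keyspaces):
        # System data is gone (or we refuse to rely on luck): replace the node.
        return "replace node"
    # Only non-system data was lost: repair can restore it from replicas.
    return "repair"


if __name__ == "__main__":
    print(failed_disk_procedure({"system", "app_ks"}))
    print(failed_disk_procedure({"app_ks"}))
    print(failed_disk_procedure({"app_ks"}, conservative=True))
```

The `conservative` flag captures the operational argument: if the outcome depends on which files happened to live on the dead disk, a runbook that always replaces the node is simpler to follow than one that inspects the damage first.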



