From user-return-62000-archive-asf-public=cust-asf.ponee.io@cassandra.apache.org  Tue Aug 21 02:49:09 2018
Return-Path: <user-return-62000-archive-asf-public=cust-asf.ponee.io@cassandra.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 3FDE3180663
	for <archive-asf-public@cust-asf.ponee.io>; Tue, 21 Aug 2018 02:49:08 +0200 (CEST)
Received: (qmail 42774 invoked by uid 500); 21 Aug 2018 00:49:06 -0000
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:user-help@cassandra.apache.org>
List-Unsubscribe: <mailto:user-unsubscribe@cassandra.apache.org>
List-Post: <mailto:user@cassandra.apache.org>
List-Id: <user.cassandra.apache.org>
Reply-To: user@cassandra.apache.org
Delivered-To: mailing list user@cassandra.apache.org
Received: (qmail 42764 invoked by uid 99); 21 Aug 2018 00:49:06 -0000
Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 21 Aug 2018 00:49:06 +0000
Received: from localhost (localhost [127.0.0.1])
	by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 13D6A1A0A0A
	for <user@cassandra.apache.org>; Tue, 21 Aug 2018 00:49:06 +0000 (UTC)
X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org
X-Spam-Flag: NO
X-Spam-Score: 1.889
X-Spam-Level: *
X-Spam-Status: No, score=1.889 tagged_above=-999 required=6.31
	tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1,
	HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001,
	T_DKIMWL_WL_MED=-0.01] autolearn=disabled
Authentication-Results: spamd2-us-west.apache.org (amavisd-new);
	dkim=pass (2048-bit key) header.d=yahoo.com
Received: from mx1-lw-us.apache.org ([10.40.0.8])
	by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024)
	with ESMTP id 60SGQBiHUn7d for <user@cassandra.apache.org>;
	Tue, 21 Aug 2018 00:49:02 +0000 (UTC)
Received: from sonic311-23.consmr.mail.ne1.yahoo.com (sonic311-23.consmr.mail.ne1.yahoo.com [66.163.188.204])
	by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 5BEAD5F2AC
	for <user@cassandra.apache.org>; Tue, 21 Aug 2018 00:49:02 +0000 (UTC)
X-YMail-OSG: nSMng4YVM1lkk5oBaWVpHEi4apTq8Hv6lYzX2IElj8LoVXxmhT7P9Iz_TcAEnvL
 QXJQQPVMPELihOTSV6rNSo6Ptaz2o79em8NFnolFG7iB.q6D7zQUpj32x90BWC1.CXCv6pl7Wc2t
 E2NIw5xKTUUkUcStq4_a1Ko7ey5OTQhTRq.VWS2TqdNPegxw50JPF_q.ZFQycCQlNlKXczU0mpXo
 cA479mK3giUXAcAEVNoh.RMGzb6UA6FPRQ6gK0OwoT7EvjCwioLBpu4Iry9SFmnEgm3YC.Hn2G5g
 e3Fu_lO_zJU5BwbykMhNv5nk4L9WmPUXyX_FmKvifFNim8PTrAupg6SegNOa5s_HzKGNETt3jH1t
 gzH8N5_w02.V7CDvPA9GZUu1g7pph.M.FvcE6y4NAyjGH3myhW9fzrmMtH6_QekZckVN5l1f2Nrv
 hd_LK3V.8HFLSGqEr3S3_iy6b3iPmRujhqexGdu0CxI4FiWTsaztc.njGld5LNQkTjnBKiH_LDIw
 vILTHKMfzlBuKGdHFxNzTCixWY_Sdw6gAh4XmMmRDlqWIigsE5Rzxs5EkDpUkDAGqpywjr95KY_v
 EXSnpNJkF907R83c8qSI..8WCYYpe3g_StYb4rZc9gujvCruy8eTj90zWUKVfd.J6SkSNBf4z7j3
 wxozyjhpmV5BhCRFXG9H2KyosA9lzX16rfpOOD_qk1BqOx.qSxTeV38wFvdUSGeoitTS_yoZ9dzn
 Bt0P.5NkE6TmAUtc2.AmGUkDpN4dctoez.RGVCIqIJthjKAygj4Kw06ngiybm23LqW4v.m63nnl6
 vhhhdlED.iBbtJhFhG505zJBEpNfpPO.wsMcfYoR2WNJxbLvtlW1Nw4vFgCrnqyAbbBa6YQkXMdr
 aR_AslfZm8N2.BkR3McbCtDRS69En1I8tgka5pr8uOgr3oa25vzdS1s3UdMPtXQ4VxAa_NvYCdB4
 Er1MDP7Qo2JSfmQPCDHwcjPg1OGk13KVEeeQr
Received: from sonic.gate.mail.ne1.yahoo.com by sonic311.consmr.mail.ne1.yahoo.com with HTTP; Tue, 21 Aug 2018 00:48:56 +0000
Date: Tue, 21 Aug 2018 00:48:53 +0000 (UTC)
From: James Briggs <james.briggs@yahoo.com.INVALID>
Reply-To: James Briggs <james.briggs@yahoo.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Message-ID: <1944445006.1258786.1534812533109@mail.yahoo.com>
In-Reply-To: <CAPbVhuPBcc4rbDn1=kOA-Kq_61UEd-RXMGbMiGCOK5dKt6pNWQ@mail.gmail.com>
References: <49AA90A6-2D4D-4B0B-B13D-DE7FA4191B73@contoso.com> <B99B7E48-FB0E-4FF5-B97E-F1DDD0885989@gmail.com> <CAPbVhuPaqC4BUeaYT52PCQHioxDddsTJ076DwjhoGL=KyGCvhA@mail.gmail.com> <AC741E72-58A9-458E-BA17-CB280143B90F@webtrekk.com> <CAPbVhuOj4O_X8bAgcZWnV+_UgA9v3n-aSBQXxq0Ltxe6VyKQyA@mail.gmail.com> <CAEHpzv8CtquB3=u+ONoytWSDVoyrtA5ymQ8_oVb9CUgy1STXAg@mail.gmail.com> <CAPbVhuPBcc4rbDn1=kOA-Kq_61UEd-RXMGbMiGCOK5dKt6pNWQ@mail.gmail.com>
Subject: Re: JBOD disk failure - just say no
MIME-Version: 1.0
Content-Type: multipart/alternative; 
	boundary="----=_Part_1258785_406442162.1534812533106"
X-Mailer: WebService/1.1.12262 YahooMailNeo Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36

------=_Part_1258785_406442162.1534812533106
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Cassandra JBOD has a bunch of issues, so I don't recommend it for
production:

1) disks fill up with load (data) unevenly, meaning you can run out on a
   disk while some are half-full
2) one bad disk can take out the whole node
3) instead of a small failure probability on an LVM/RAID volume, with
   JBOD you end up near 100% chance of failure after 3 years or so.
4) generally you will not have enough warning of a looming failure with
   JBOD compared to LVM/RAID. (Some companies take a week or two to
   replace a failed disk.)
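
A rough sketch of the arithmetic behind point 3, not a measured result.
It assumes disks fail independently and uses a hypothetical 5% annual
failure rate; both are illustrative assumptions, and the function name
`p_any_disk_fails` is made up for this example. Under those assumptions
the chance that at least one disk in a node fails within 3 years climbs
toward 1 as disks are added:

```python
def p_any_disk_fails(disks, annual_failure_rate, years):
    """Chance that at least one of `disks` independent disks fails."""
    # Each disk survives one year with probability (1 - rate); every
    # disk has to survive every year for the node to see no failure.
    return 1 - (1 - annual_failure_rate) ** (disks * years)


# Hypothetical node sizes; 0.05 is an assumed rate, not vendor data.
for disks in (1, 4, 12, 24):
    print(disks, round(p_any_disk_fails(disks, 0.05, 3), 3))
```

With a redundant RAID level, one failed disk does not by itself take the
volume down, whereas per point 2 one bad JBOD disk can take out the node.
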

JBOD is easy to set up, but hard to manage.

Thanks, James.


      From: kurt greaves <kurt@instaclustr.com>
 To: User <user@cassandra.apache.org>=20
 Sent: Friday, August 17, 2018 5:42 AM
 Subject: Re: JBOD disk failure
  =20
As far as I'm aware, yes. I recall hearing someone mention tying system tab=
les to a particular disk but at the moment that doesn't exist.
On Fri., 17 Aug. 2018, 01:04 Eric Evans, <john.eric.evans@gmail.com> wrote:

On Wed, Aug 15, 2018 at 3:23 AM kurt greaves <kurt@instaclustr.com> wrote:
> Yep. It might require a full node replace depending on what data is lost =
from the system tables. In some cases you might be able to recover from par=
tially lost system info, but it's not a sure thing.

Ugh, does it really just boil down to what part of `system` happens to
be on the disk in question?=C2=A0 In my mind, that makes the only sane
operational procedure for a failed disk to be: "replace the entire
node".=C2=A0 IOW, I don't think we can realistically claim you can survive
a failed JBOD device if it relies on happenstance.

> On Wed., 15 Aug. 2018, 17:55 Christian Lorenz, <Christian.Lorenz@webtrekk=
.com > wrote:
>>
>> Thank you for the answers. We are using the current version 3.11.3 So th=
is one includes CASSANDRA-6696.
>>
>> So if I get this right, losing system tables will need a full node rebui=
ld. Otherwise repair will get the node consistent again.
>
> [ ... ]

--=20
Eric Evans
john.eric.evans@gmail.com

------------------------------ ------------------------------ ---------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org


  =20
------=_Part_1258785_406442162.1534812533106
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<html><head></head><body><div style=3D"color:#000; background-color:#fff; f=
ont-family:times new roman, new york, times, serif;font-size:13px"><div id=
=3D"yui_3_16_0_ym19_1_1534437152975_444903"><span id=3D"yui_3_16_0_ym19_1_1=
534437152975_445500">Cassandra JBOD has a bunch of issues, so I don't recom=
mend it for production:</span></div><div id=3D"yui_3_16_0_ym19_1_1534437152=
975_444903"><span><br></span></div><div id=3D"yui_3_16_0_ym19_1_15344371529=
75_444903" dir=3D"ltr"><span id=3D"yui_3_16_0_ym19_1_1534437152975_445268">=
1) disks fill up with load (data) unevenly, meaning you can run out on a di=
sk while some are half-full</span></div><div id=3D"yui_3_16_0_ym19_1_153443=
7152975_444903" dir=3D"ltr"><span id=3D"yui_3_16_0_ym19_1_1534437152975_445=
168">2) one bad disk can take out the whole node</span></div><div id=3D"yui=
_3_16_0_ym19_1_1534437152975_444903" dir=3D"ltr"><span id=3D"yui_3_16_0_ym1=
9_1_1534437152975_445096">3) instead of a small failure probability on an L=
VM/RAID volume, with JBOD you end up near 100% chance of failure after 3 ye=
ars or so.</span></div><div id=3D"yui_3_16_0_ym19_1_1534437152975_444903" d=
ir=3D"ltr"><span id=3D"yui_3_16_0_ym19_1_1534437152975_445271">4) generally=
 you will not have enough warning of a looming failure with JBOD compared t=
o LVM/RAID. (Some</span></div><div id=3D"yui_3_16_0_ym19_1_1534437152975_44=
4903" dir=3D"ltr"><span id=3D"yui_3_16_0_ym19_1_1534437152975_445399">compa=
nies take a week or two to replace a failed disk.)</span></div><div id=3D"y=
ui_3_16_0_ym19_1_1534437152975_444903" dir=3D"ltr"><span><br></span></div><=
div id=3D"yui_3_16_0_ym19_1_1534437152975_444903" dir=3D"ltr"><span id=3D"y=
ui_3_16_0_ym19_1_1534437152975_445498">JBOD is easy to set up, but hard =
to manage.</span></div><div></div><div id=3D"yui_3_16_0_ym19_1_=
1534437152975_44=
4904">&nbsp;</div><div class=3D"signature" id=3D"yui_3_16_0_ym19_1_15344371=
52975_444905">Thanks, James.</div><div class=3D"signature" id=3D"yui_3_16_0=
_ym19_1_1534437152975_444905"><br></div><div class=3D"qtdSeparateBR"><br><b=
r></div><div class=3D"yahoo_quoted" id=3D"yui_3_16_0_ym19_1_1534437152975_4=
44910" style=3D"display: block;">  <div style=3D"font-family: times new rom=
an, new york, times, serif; font-size: 13px;" id=3D"yui_3_16_0_ym19_1_15344=
37152975_444909"> <div style=3D"font-family: HelveticaNeue, Helvetica Neue,=
 Helvetica, Arial, Lucida Grande, sans-serif; font-size: 16px;" id=3D"yui_3=
_16_0_ym19_1_1534437152975_444908"> <div dir=3D"ltr" id=3D"yui_3_16_0_ym19_=
1_1534437152975_444907"> <font size=3D"2" face=3D"Arial" id=3D"yui_3_16_0_y=
m19_1_1534437152975_444913"> <hr size=3D"1" id=3D"yui_3_16_0_ym19_1_1534437=
152975_445003"> <b><span style=3D"font-weight:bold;">From:</span></b> kurt =
greaves &lt;kurt@instaclustr.com&gt;<br> <b><span style=3D"font-weight: bol=
d;">To:</span></b> User &lt;user@cassandra.apache.org&gt; <br> <b id=3D"yui=
_3_16_0_ym19_1_1534437152975_445275"><span style=3D"font-weight: bold;" id=
=3D"yui_3_16_0_ym19_1_1534437152975_445274">Sent:</span></b> Friday, August=
 17, 2018 5:42 AM<br> <b id=3D"yui_3_16_0_ym19_1_1534437152975_445278"><spa=
n style=3D"font-weight: bold;" id=3D"yui_3_16_0_ym19_1_1534437152975_445277=
">Subject:</span></b> Re: JBOD disk failure<br> </font> </div> <div class=
=3D"y_msg_container" id=3D"yui_3_16_0_ym19_1_1534437152975_445280"><br><div=
 id=3D"yiv2169704455"><div id=3D"yui_3_16_0_ym19_1_1534437152975_445339"><d=
iv dir=3D"ltr" id=3D"yui_3_16_0_ym19_1_1534437152975_445338"><div id=3D"yui=
_3_16_0_ym19_1_1534437152975_445337">As far as I'm aware, yes. I recall hea=
ring someone mention tying system tables to a particular disk but at the mo=
ment that doesn't exist.</div><br clear=3D"none"><div class=3D"yiv216970445=
5yqt1431425413" id=3D"yiv2169704455yqt43580"><div class=3D"yiv2169704455gma=
il_quote" id=3D"yui_3_16_0_ym19_1_1534437152975_445458"><div dir=3D"ltr" id=
=3D"yui_3_16_0_ym19_1_1534437152975_445496">On Fri., 17 Aug. 2018, 01:04 Er=
ic Evans, &lt;<a rel=3D"nofollow" shape=3D"rect" ymailto=3D"mailto:john.eri=
c.evans@gmail.com" target=3D"_blank" href=3D"mailto:john.eric.evans@gmail.c=
om">john.eric.evans@gmail.com</a>&gt; wrote:<br clear=3D"none"></div><block=
quote class=3D"yiv2169704455gmail_quote" style=3D"margin:0 0 0 .8ex;border-=
left:1px #ccc solid;padding-left:1ex;" id=3D"yui_3_16_0_ym19_1_153443715297=
5_445457">On Wed, Aug 15, 2018 at 3:23 AM kurt greaves &lt;<a rel=3D"nofoll=
ow" shape=3D"rect" ymailto=3D"mailto:kurt@instaclustr.com" target=3D"_blank=
" href=3D"mailto:kurt@instaclustr.com">kurt@instaclustr.com</a>&gt; wrote:<=
br clear=3D"none">
&gt; Yep. It might require a full node replace depending on what data is lo=
st from the system tables. In some cases you might be able to recover from =
partially lost system info, but it's not a sure thing.<br clear=3D"none">
<br clear=3D"none">
Ugh, does it really just boil down to what part of `system` happens to<br c=
lear=3D"none">
be on the disk in question?&nbsp; In my mind, that makes the only sane<br c=
lear=3D"none">
operational procedure for a failed disk to be: "replace the entire<br clear=
=3D"none">
node".&nbsp; IOW, I don't think we can realistically claim you can survive<=
br clear=3D"none">
a failed JBOD device if it relies on happenstance.<br clear=3D"none">
<br clear=3D"none">
&gt; On Wed., 15 Aug. 2018, 17:55 Christian Lorenz, &lt;<a rel=3D"nofollow"=
 shape=3D"rect" ymailto=3D"mailto:Christian.Lorenz@webtrekk.com" target=3D"=
_blank" href=3D"mailto:Christian.Lorenz@webtrekk.com">Christian.Lorenz@webt=
rekk.com</a> &gt; wrote:<br clear=3D"none">
&gt;&gt;<br clear=3D"none">
&gt;&gt; Thank you for the answers. We are using the current version 3.11.3=
 So this one includes CASSANDRA-6696.<br clear=3D"none">
&gt;&gt;<br clear=3D"none">
&gt;&gt; So if I get this right, losing system tables will need a full node=
 rebuild. Otherwise repair will get the node consistent again.<br clear=3D"=
none">
&gt;<br clear=3D"none">
&gt; [ ... ]<br clear=3D"none">
<br clear=3D"none">
-- <br clear=3D"none">
Eric Evans<br clear=3D"none">
<a rel=3D"nofollow" shape=3D"rect" ymailto=3D"mailto:john.eric.evans@gmail.=
com" target=3D"_blank" href=3D"mailto:john.eric.evans@gmail.com">john.eric.=
evans@gmail.com</a><br clear=3D"none">
<br clear=3D"none">
------------------------------ ------------------------------ ---------<br =
clear=3D"none">
To unsubscribe, e-mail: <a rel=3D"nofollow" shape=3D"rect" ymailto=3D"mailt=
o:user-unsubscribe@cassandra.apache.org" target=3D"_blank" href=3D"mailto:u=
ser-unsubscribe@cassandra.apache.org">user-unsubscribe@cassandra.apache.or=
g</a><br clear=3D"none">
For additional commands, e-mail: <a rel=3D"nofollow" shape=3D"rect" ymailto=
=3D"mailto:user-help@cassandra.apache.org" target=3D"_blank" href=3D"mailto=
:user-help@cassandra.apache.org">user-help@cassandra.apache.org</a><br clea=
r=3D"none">
<br clear=3D"none">
</blockquote></div></div>
</div></div></div><br><br></div> </div> </div>  </div></div></body></html>
------=_Part_1258785_406442162.1534812533106--