From: Alexis Lé-Quôc
Date: Sun, 31 Mar 2013 12:58:34 -0400
Subject: Re: weird behavior with RAID 0 on EC2
To: user@cassandra.apache.org

Alain,

Can you post your mdadm --detail /dev/md0 output here, as well as your iostat -x -d output from when that happens? A bad ephemeral drive on EC2 is not unheard of.

Alexis | @alq | http://datadog.com

P.S. Also, disk utilization is not a reliable metric; iostat's await and svctm are more useful, IMHO.
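Something along these lines is what I have in mind (a rough sketch; the 5-second interval and sample count are just examples, and I'm assuming the array really is /dev/md0):

    # Array geometry and member state
    sudo mdadm --detail /dev/md0
    cat /proc/mdstat

    # Extended per-device stats, 5-second samples, 12 reports;
    # the await and svctm columns are the interesting ones
    iostat -x -d 5 12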
On Sun, Mar 31, 2013 at 6:03 AM, aaron morton <aaron@thelastpickle.com> wrote:
> Ok, if you're going to look into it, please keep me/us posted.
>
> It's not on my radar.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 28/03/2013, at 2:43 PM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
>
> Ok, if you're going to look into it, please keep me/us posted.
>
> It happened to me twice the same day, within a few hours, on the same node,
> and only on 1 node out of 12, making that node almost unreachable.
>
> 2013/3/28 aaron morton <aaron@thelastpickle.com>
>
>> I noticed this on an m1.xlarge (Cassandra 1.1.10) instance today as well:
>> 1 or 2 disks in a RAID 0 running at 85 to 100% while the others were at 35 to 50%.
>>
>> Have not looked into it.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 26/03/2013, at 11:57 PM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
>>
>> We use C* on m1.xlarge AWS EC2 servers, with 4 disks (xvdb, xvdc, xvdd, xvde)
>> that are part of a logical RAID 0 (md0).
>>
>> I usually see their usage increase in the same way. This morning there was
>> a normal minor compaction followed by dropped messages on one node (out of 12).
>>
>> Looking closely at this node I saw the following:
>>
>> http://img69.imageshack.us/img69/9425/opscenterweirddisk.png
>>
>> On this node, one of the four disks (xvdd) started working much harder while
>> the others worked less intensively.
>>
>> This is quite weird since I have always seen these 4 disks being used in exactly
>> the same way at every moment (as you can see on 5 other nodes, or when node
>> ".239" comes back to normal).
>>
>> Any idea what happened and how it can be avoided?
>>
>> Alain
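Regarding the four-member md0 Alain describes above: a rough way to see whether one member (xvdd in this case) is lagging behind the others, and to sanity-check it for a bad ephemeral drive, might look like this (a sketch only; the device names are taken from the thread and the dd read size is arbitrary):

    # Extended stats for just the four RAID 0 members; on a healthy array
    # r/s, w/s, await and %util should stay roughly even across them
    iostat -x xvdb xvdc xvdd xvde 5 12

    # Non-destructive read test of the suspect member, then a look for kernel I/O errors
    sudo dd if=/dev/xvdd of=/dev/null bs=1M count=4096 iflag=direct
    dmesg | grep -i xvdd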