incubator-cassandra-user mailing list archives

From Rudolf van der Leeden <rudolf.vanderlee...@scoreloop.com>
Subject Re: weird behavior with RAID 0 on EC2
Date Sun, 31 Mar 2013 20:49:28 GMT
I've seen the same behaviour (SLOW ephemeral disk) a few times.
You can't do anything with a single slow disk except stop using it.
Our solution was always: replace the m1.xlarge instance ASAP and everything is good.
-Rudolf.
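
(A rough sketch of the replacement approach described above, for anyone reading this in the archives; it assumes a package-style install with an init script, and the exact bootstrap/token handling depends on the Cassandra version and topology.)

    # On the affected node: stream its data to the remaining replicas
    # and remove it from the ring, then stop the process.
    nodetool decommission
    sudo service cassandra stop

    # Launch a fresh m1.xlarge, rebuild the RAID 0 array on its ephemeral
    # disks, let the new node join/bootstrap, then terminate the old instance.
    # From any live node, confirm the ring is healthy again:
    nodetool ring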

On 31.03.2013, at 18:58, Alexis Lê-Quôc wrote:

> Alain,
> 
> Can you post your mdadm --detail /dev/md0 output here, as well as your iostat -x -d output when that happens? A bad ephemeral drive on EC2 is not unheard of.
> 
> Alexis | @alq | http://datadog.com
> 
> P.S. Also, disk utilization is not a reliable metric; iostat's await and svctm are more useful imho.
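
(For reference, the diagnostics requested above can be gathered with something like the following; the 5-second sampling interval is an arbitrary choice.)

    # Layout and state of the RAID 0 device and its members
    sudo mdadm --detail /dev/md0

    # Extended per-device statistics every 5 seconds: compare 'await'
    # (average time a request waits plus is serviced, in ms) and 'svctm'
    # (estimated service time) across xvdb..xvde; one member with a much
    # higher await than its siblings is the suspect disk.
    iostat -x -d 5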
> 
> 
> On Sun, Mar 31, 2013 at 6:03 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> Ok, if you're going to look into it, please keep me/us posted.
> 
> It's not on my radar.
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 28/03/2013, at 2:43 PM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
> 
>> Ok, if you're going to look into it, please keep me/us posted.
>> 
>> It happened to me twice the same day, within a few hours, on the same node. It only happened to 1 node out of 12, and made that node almost unreachable.
>> 
>> 
>> 2013/3/28 aaron morton <aaron@thelastpickle.com>
>> I noticed this on an m1.xlarge (Cassandra 1.1.10) instance today as well: 1 or 2 disks in a RAID 0 running at 85 to 100% while the others sat around 35 to 50%.
>> 
>> Have not looked into it. 
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 26/03/2013, at 11:57 PM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
>> 
>>> We use C* on m1.xlarge AWS EC2 servers, with 4 disks (xvdb, xvdc, xvdd, xvde) as part of a logical RAID 0 array (md0).
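
(For context, such an array is typically assembled along these lines; this is a hedged sketch, since the actual filesystem and mount options used on these nodes are not shown in the thread.)

    # Stripe the four ephemeral disks into a single RAID 0 device
    sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 \
        /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde

    # Create a filesystem and mount it where Cassandra keeps its data
    sudo mkfs.ext4 /dev/md0
    sudo mount /dev/md0 /var/lib/cassandra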
>>> 
>>> I usually see their utilization increase in the same way on all four. This morning there was a normal minor compaction followed by dropped messages on one node (out of 12).
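
(A quick way to confirm and quantify those dropped messages, assuming nodetool is on the PATH; the log path is the common Debian/Ubuntu default and may differ on other installs.)

    # Thread pool stats, including dropped message counters since the node started
    nodetool tpstats

    # Dropped-message warnings around the time of the compaction
    grep -i dropped /var/log/cassandra/system.log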
>>> 
>>> Looking closely at this node I saw the following:
>>> 
>>> http://img69.imageshack.us/img69/9425/opscenterweirddisk.png
>>> 
>>> On this node, one of the four disks (xvdd) started working much harder while the others worked less intensively.
>>> 
>>> This is quite weird, since I have always seen these 4 disks being used in exactly the same way at every moment (as you can see on 5 other nodes, or when the node ".239" comes back to normal).
>>> 
>>> Any idea what happened, and how it can be avoided?
>>> 
>>> Alain
>> 
>> 
> 
> 

