incubator-cassandra-user mailing list archives

From Philippe Dupont <>
Subject Re: Raid Issue on EC2 Datastax ami, 1.2.11
Date Thu, 12 Dec 2013 15:21:56 GMT
Hi Aaron,

As you can see in the picture, there is not much steal in iostat. It's the
same in top.
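For reference, the check itself is nothing exotic; a minimal version with
the standard sysstat/procps tools (no assumptions beyond those being
installed):

    # extended stats every 5 seconds; %steal in the avg-cpu header is CPU
    # time stolen by the hypervisor for other tenants
    iostat -x 5
    # in top, steal shows up as the "st" value in the Cpu(s) line
    top -bn1 | grep -i "cpu(s)"

%steal stayed near zero in both for us, so the contention really does seem
to be on the IO side only.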


2013/12/10 Aaron Morton <>

> Thanks for the update Philippe, other people have reported high await on a
> single volume previously but I don’t think it’s been blamed on noisy
> neighbours. It’s interesting that you can have noisy neighbours for IO only.
> Out of interest, was there much steal reported in top or iostat?
> Cheers
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> On 6/12/2013, at 4:42 am, Philippe Dupont <> wrote:
> Hi again,
> I have more information on this case:
> We did further investigation on the affected nodes and found an await
> problem on one of the 4 disks in the RAID.
> Here is the iostat of the node:
> You can see that the read and write throughput are exactly the same on the
> 4 disks of the instance, so the raid0 striping looks fine. Yet the global
> await, r_await and w_await are 3 to 5 times higher on the xvde disk than on
> the other disks.
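> For reference, those columns come straight from extended iostat on the
> raid members; something like this reproduces them (the xvdb-xvde device
> names match our layout, so adjust for yours):
>
>     # per-device extended stats at 5-second intervals; compare the
>     # r_await/w_await columns across the four raid members
>     iostat -x -d xvdb xvdc xvdd xvde 5
>
> rkB/s and wkB/s stayed essentially identical across the four devices
> while await diverged on xvde alone.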
> We reported this to Amazon support, and here is their answer:
> " Hello,
> I deeply apologize for any inconvenience this has been causing you and
> thank you for the additional information and screenshots. Using the
> instance you based your "iostat" on ("i-xxxxxxxx"), I have looked into the
> underlying hardware it is currently using and I can see it appears to have
> a noisy neighbor leading to the higher "await" time on that particular
> device. Since most AWS services are multi-tenant, situations can arise
> where one customer's resource has the potential to impact the performance
> of a different customer's resource that resides on the same underlying
> hardware (a "noisy neighbor"). While these occurrences are rare, they are
> nonetheless inconvenient and I am very sorry for any impact it has created.
> I have also looked into the initial instance referred to when the case was
> created ("i-xxxxxxx") and cannot see any existing issues (neighboring or
> otherwise) as to any I/O performance impacts; however, at the time the case
> was created, evidence on our end suggests there was a noisy neighbor then
> as well. Can you verify if you are still experiencing above average "await"
> times on this instance? If you would like to mitigate the impact of
> encountering "noisy neighbors", you can look into our Dedicated Instance
> option; Dedicated Instances launch on hardware dedicated to only a single
> customer (though this can feasibly lead to a situation where a customer is
> their own noisy neighbor). However, this is an option available only to
> instances that are being launched into a VPC and may require modification
> of the architecture of your use-case. I understand the instances belonging
> to your cluster in question have been launched into EC2-Classic, I just
> wanted to bring this to your attention as a possible solution. You can read
> more about Dedicated Instances here:
> Again, I am very sorry for the
> performance impact you have been experiencing due to having noisy
> neighbors. We understand the frustration and are always actively working to
> increase capacity so the effects of noisy neighbors are lessened. I hope
> this information has been useful and if you have any additional questions
> whatsoever, please do not hesitate to ask! "
> To conclude, the only solution, apart from moving to a VPC with Dedicated
> Instances, is to replace this instance with a new one and hope not to get
> another "noisy neighbor"...
> I hope that will help someone.
> Philippe
> 2013/11/28 Philippe DUPONT <>
>> Hi,
>> We have a Cassandra cluster of 28 nodes. Each one is an EC2 m1.xlarge
>> based on the DataStax AMI, with 4 ephemeral volumes in raid0.
>> Here is the ticket we opened with amazon support :
>> "This raid is created using the datastax public AMI : ami-b2212dc6.
>> Sources are also available here :
>> As you can see in the attached screenshot, randomly but frequently one of
>> the volumes gets fully utilized (100%) while the 3 others stay at low
>> utilization.
>> Because of this, the node becomes slow and the whole Cassandra cluster is
>> impacted. We are losing data due to write failures, and availability
>> suffers for our customers.
>> It was in this state for one hour, so we decided to restart it.
>> We already removed 3 other instances because of this same issue."
>> (see other screenshots)
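>> For reference, the per-volume numbers in the screenshots come from
>> standard tools; a minimal way to watch for the symptom (the /dev/md0
>> name is our assumption about how the AMI assembles the raid, adjust if
>> yours differs):
>>
>>     # raid level and member devices
>>     mdadm --detail /dev/md0
>>     # extended per-device stats; the symptom is %util pinned at 100%
>>     # on one member while the other three stay low
>>     iostat -x 5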
>> Amazon support took a close look at the instance as well as its
>> underlying hardware for any potential health issues, and both seem to be
>> healthy.
>> Has anyone already experienced something like this?
>> Or would it be better to contact the AMI author?
>> Thanks a lot,
>> Philippe.
