incubator-cassandra-user mailing list archives

From Aaron Morton <>
Subject Re: Raid Issue on EC2 Datastax ami, 1.2.11
Date Tue, 10 Dec 2013 07:45:30 GMT
Thanks for the update, Philippe. Other people have reported high await on a single volume before,
but I don't think it's been blamed on noisy neighbours. It's interesting that you can
have noisy neighbours for IO only.

Out of interest, was there much steal reported in top or iostat?
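
For reference, steal shows up as %st in top's CPU summary line and as %steal in iostat's CPU section; a quick check could look something like this (the 5-second interval is just an example):

    # CPU summary plus extended per-device stats, sampled every 5 seconds
    iostat -x 5
    # or grab the Cpu(s) line, which includes %st, from one batch-mode run of top
    top -b -n 1 | head -5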


Aaron Morton
New Zealand

Co-Founder & Principal Consultant
Apache Cassandra Consulting

On 6/12/2013, at 4:42 am, Philippe Dupont <> wrote:

> Hi again,
> I have much more information on this case:
> We did further investigation on the affected nodes and found an await problem on one of the 4 disks in the RAID:
> Here is the iostat output of the node (see attached screenshot):
> You can see that the read and write throughput are exactly the same on the 4 disks of
the instance, so the RAID 0 striping looks fine. Yet the overall await, r_await and w_await are
3 to 5 times higher on the xvde disk than on the other disks.
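> For reference, these per-device latencies come from iostat's extended output; something like the following (device names and interval are examples matching our layout):

    # extended stats for the four RAID member disks, every 5 seconds;
    # await/r_await/w_await are in milliseconds
    iostat -x -d xvdb xvdc xvdd xvde 5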
> We reported this to Amazon support, and here is their answer:
> " Hello,
> I deeply apologize for any inconvenience this has been causing you and thank you for
the additional information and screenshots.
> Using the instance you based your "iostat" on ("i-xxxxxxxx"), I have looked into the
underlying hardware it is currently using and I can see it appears to have a noisy neighbor
leading to the higher "await" time on that particular device.  Since most AWS services are
multi-tenant, situations can arise where one customer's resource has the potential to impact
the performance of a different customer's resource that resides on the same underlying hardware
(a "noisy neighbor").  While these occurrences are rare, they are nonetheless inconvenient
and I am very sorry for any impact it has created.
> I have also looked into the initial instance referred to when the case was created ("i-xxxxxxx")
and cannot see any existing issues (neighboring or otherwise) causing any I/O performance impact;
however, at the time the case was created, evidence on our end suggests there was a noisy
neighbor then as well.  Can you verify if you are still experiencing above average "await"
times on this instance?
> If you would like to mitigate the impact of encountering "noisy neighbors", you can look
into our Dedicated Instance option; Dedicated Instances launch on hardware dedicated to only
a single customer (though this can feasibly lead to a situation where a customer is their
own noisy neighbor).  However, this is an option available only to instances that are being
launched into a VPC and may require modification of the architecture of your use-case.  I
understand the instances belonging to your cluster in question have been launched into EC2-Classic,
I just wanted to bring this to your attention as a possible solution.  You can read more about
Dedicated Instances here:
> Again, I am very sorry for the performance impact you have been experiencing due to having
noisy neighbors.  We understand the frustration and are always actively working to increase
capacity so the effects of noisy neighbors are lessened.  I hope this information has been
useful and if you have any additional questions whatsoever, please do not hesitate to ask!
> To conclude, the only other solution, if we want to avoid VPC and Dedicated Instances, is to replace
this instance with a new one, hoping not to get another "noisy neighbor"...
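> For reference, a replacement node can take over the old node's address using the replace_address option available in Cassandra 1.2; a minimal sketch, where the IP and config path are examples only:

    # On the new instance, before the first start of Cassandra,
    # point replace_address at the IP of the node being replaced:
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"' | sudo tee -a /etc/cassandra/cassandra-env.sh
    sudo service cassandra start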
> I hope that will help someone.
> Philippe
> 2013/11/28 Philippe DUPONT <>
> Hi,
> We have a Cassandra cluster of 28 nodes. Each one is an EC2 m1.xlarge based on the DataStax
AMI, with 4 ephemeral volumes in RAID 0.
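> For context, the array can be double-checked with mdadm; a minimal sketch (assuming /dev/md0, the device the AMI assembles from the ephemeral volumes):

    # list the member volumes and state of the RAID 0 array
    cat /proc/mdstat
    sudo mdadm --detail /dev/md0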
> Here is the ticket we opened with Amazon support:
> "This raid is created using the datastax public AMI : ami-b2212dc6. Sources are also
available here :
> As you can see in the attached screenshot, randomly but frequently one of the volumes
gets fully used (100%) while the 3 others stay at low utilization.
> Because of this, the node becomes slow and the whole Cassandra cluster is impacted. We
are losing data due to write failures, and losing availability for our customers.
> It was in this state for one hour, so we decided to restart it.
> We already removed 3 other instances because of this same issue."
> (see other screenshots)
> Amazon support took a close look at the instance as well as its underlying hardware
for any potential health issues, and both seem to be healthy.
> Has anyone already experienced something like this?
> Or should I contact the AMI author instead?
> Thanks a lot,
> Philippe.
