incubator-cassandra-user mailing list archives

From Ben Bromhead <...@instaclustr.com>
Subject Re: EBS SSD <-> Cassandra ?
Date Fri, 20 Jun 2014 05:49:51 GMT
Irrespective of performance and latency numbers there are fundamental flaws with using EBS/NAS
and Cassandra, particularly around bandwidth contention and what happens when the shared storage
medium breaks. Also obligatory reference to http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html.

Regarding ENI

AWS are pretty explicit about its impact on bandwidth:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html
Attaching another network interface to an instance is not a method to increase or double the
network bandwidth to or from the dual-homed instance.

So Nate, you are right: any improvement is a function of logical separation rather than extra bandwidth.

 

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 20 Jun 2014, at 8:17 am, Nate McCall <nate@thelastpickle.com> wrote:

> Sorry - should have been clear I was speaking in terms of route optimizing, not bandwidth.
No idea as to the implementation (probably instance specific) and I doubt it actually doubles
bandwidth. 
> 
> Specifically: having an ENI dedicated to API traffic did smooth out some recent load
tests we did for a client. It could be that the overall throughput increases were more a function
of cleaner traffic segmentation/smoother routing. We weren't being terribly scientific - it was
more an artifact of testing network segmentation. 
> 
> I'm just going to say that "using an ENI will make things better" (since traffic segmentation
is always good practice anyway :)  YMMV. 
> 
> 
> 
> On Thu, Jun 19, 2014 at 3:39 PM, Russell Bradberry <rbradberry@gmail.com> wrote:
> does an elastic network interface really use a different physical network interface?
or does it just provide the ability to use multiple IP addresses?
> 
> 
> 
> On June 19, 2014 at 3:56:34 PM, Nate McCall (nate@thelastpickle.com) wrote:
> 
>> If someone really wanted to try this, I'd recommend adding an Elastic Network Interface
or two for gossip and client/API traffic. This lets EBS and management traffic have the pre-configured
network. 
>> 
>> 
>> On Thu, Jun 19, 2014 at 6:54 AM, Benedict Elliott Smith <belliottsmith@datastax.com>
wrote:
>> I would say this is worth benchmarking before jumping to conclusions. The network
being a bottleneck (or latency causing) for EBS is, to my knowledge, supposition, and instances
can be started with direct connections to EBS if this is a concern. The blog post below shows
that even without SSDs the EBS-optimised provisioned-IOPS instances show pretty consistent
latency numbers, although those latencies are higher than you would typically expect from
locally attached storage.
>> 
>> http://blog.parse.com/2012/09/17/parse-databases-upgraded-to-amazon-provisioned-iops/
>> 
>> Note, I'm not endorsing the use of EBS. Cassandra is designed to scale up with number
of nodes, not with depth of nodes (as Ben mentions, saturating a single node's data capacity
is pretty easy these days. CPUs rapidly become the bottleneck as you try to go deep). However
the argument that EBS cannot provide consistent performance seems overly pessimistic, and
should probably be empirically determined for your use case.
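[The "determine it empirically" point can be sketched as a crude fsync-latency probe run against the candidate EBS mount; helper names here are hypothetical, and a real benchmark would use a dedicated tool rather than this sketch:]

```python
import os
import tempfile
import time


def fsync_latency_samples(path, n=200, block=b"x" * 4096):
    """Append a 4 KiB block and fsync n times, recording each latency in ms.

    Run with `path` on the volume under test to see whether latencies
    are consistent or spiky, rather than assuming either outcome.
    """
    samples = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(n):
            t0 = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)
            samples.append((time.perf_counter() - t0) * 1000.0)
    finally:
        os.close(fd)
    return samples


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        s = sorted(fsync_latency_samples(os.path.join(d, "probe.dat"), n=100))
        print("p50=%.2fms p99=%.2fms" % (s[len(s) // 2], s[int(len(s) * 0.99)]))
```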
>> 
>> 
>> On Thu, Jun 19, 2014 at 9:50 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
>> Ok, looks fair enough.
>> 
>> Thanks guys. It would be great to be able to add disks when the amount of data grows
and add nodes when throughput increases... :)
>> 
>> 
>> 2014-06-19 5:27 GMT+02:00 Ben Bromhead <ben@instaclustr.com>:
>> 
>> http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningEC2_c.html
>> 
>> From the link:
>> 
>> EBS volumes are not recommended for Cassandra data volumes for the following reasons:
>> 
>> • EBS volumes contend directly for network throughput with standard packets. This
means that EBS throughput is likely to fail if you saturate a network link.
>> • EBS volumes have unreliable performance. I/O performance can be exceptionally
slow, causing the system to back load reads and writes until the entire cluster becomes unresponsive.
>> • Adding capacity by increasing the number of EBS volumes per host does not scale.
You can easily surpass the ability of the system to keep effective buffer caches and concurrently
serve requests for all of the data it is responsible for managing.
>> 
>> Still applies, especially the network contention and latency issues. 
>> 
>> Ben Bromhead
>> Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
>> 
>> On 18 Jun 2014, at 7:18 pm, Daniel Chia <danchia@coursera.org> wrote:
>> 
>>> While they guarantee IOPS, they don't really make any guarantees about latency.
Since EBS goes over the network, there's so many things in the path of getting at your data,
I would be concerned with random latency spikes, unless proven otherwise.
>>> 
>>> Thanks,
>>> Daniel
>>> 
>>> 
>>> On Wed, Jun 18, 2014 at 1:58 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
>>> In this document it is said :
>>> 
>>> Provisioned IOPS (SSD) - Volumes of this type are ideal for the most demanding
I/O intensive, transactional workloads and large relational or NoSQL databases. This volume
type provides the most consistent performance and allows you to provision the exact level
of performance you need with the most predictable and consistent performance. With this type
of volume you provision exactly what you need, and pay for what you provision. Once again,
you can achieve up to 48,000 IOPS by connecting multiple volumes together using RAID.
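[The "48,000 IOPS by connecting multiple volumes" figure is just striping arithmetic: with RAID 0, aggregate IOPS is roughly the per-volume provisioned rate times the number of volumes, ignoring instance-level EBS throughput caps. A trivial sketch, assuming the then-current 4,000 IOPS per-volume cap:]

```python
import math


def volumes_needed(target_iops, per_volume_iops):
    """Volumes to stripe (RAID 0) to reach target_iops, each provisioned
    at per_volume_iops. Ignores instance-level EBS bandwidth limits,
    which in practice may cap the aggregate well below this."""
    return math.ceil(target_iops / per_volume_iops)


print(volumes_needed(48_000, 4_000))  # 12 volumes at the 4,000 IOPS per-volume cap
```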
>>> 
>>> 
>>> 2014-06-18 10:57 GMT+02:00 Alain RODRIGUEZ <arodrime@gmail.com>:
>>> 
>>> Hi,
>>> 
>>> I just saw this : http://aws.amazon.com/fr/blogs/aws/new-ssd-backed-elastic-block-storage/
>>> 
>>> Since the problem with EBS was the network, there is no chance that this new hardware
might be useful with Cassandra, right?
>>> 
>>> Alain
>>> 
>>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> -----------------
>> Nate McCall
>> Austin, TX
>> @zznate
>> 
>> Co-Founder & Sr. Technical Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
> 
> 
> 
> -- 
> -----------------
> Nate McCall
> Austin, TX
> @zznate
> 
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com

