On Friday, February 22, 2013, Jared Biel wrote:
> As a counter argument though, anyone running a C* cluster on the Amazon cloud is going to be using SAN storage (or some kind of proprietary storage array) at the lowest  layers...Amazon isn't going to have a bunch of JBOD running their cloud infrastructure.  However, they've invested in the infrastructure to do it right.

This is certainly true when using EBS; however, it's generally not recommended to use EBS when running Cassandra. EBS has proven to be unreliable in the past and it's a bit of a SPOF. Instead, it's recommended to use the "instance store" disks that come with most instances (handy chart here: http://www.ec2instances.info/). These are the rough equivalent of local disks (probably host-level RAID 10 storage, if I had to guess).

-Jared

On 22 February 2013 00:40, Michael Morris <michael.m.morris@gmail.com> wrote:
I'm running a 27-node Cassandra cluster on SAN without issue.  I will be perfectly clear though: the hosts are multi-homed to different switches/fabrics in the SAN, we have an _expensive_ EMC array, and other than a datacenter-wide power outage, there's no SPOF for the SAN.  We use it because it's there, and it's already a sunk cost.

I certainly would not go out of my way to purchase SAN infrastructure for a C* cluster; it just doesn't make sense (for all the reasons others have mentioned).  These days you can load up a single 2U server with multiple TB of disk, so the aggregate storage capacity of your C* cluster could potentially be as much as that of a SAN you would purchase (and a lot less hassle too).
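
To put rough numbers on the aggregate-capacity point, here is a back-of-the-envelope sketch. Only the 27-node figure comes from this thread; the disks per node, disk size, and replication factor are hypothetical values picked for illustration, and the function name is made up. It also ignores the free space that compaction needs, so treat the result as an upper bound.

```python
def usable_capacity_tb(nodes, disks_per_node, tb_per_disk, replication_factor):
    """Return (raw_tb, usable_tb) for a cluster of identical nodes."""
    raw_tb = nodes * disks_per_node * tb_per_disk
    # Every row is stored replication_factor times, so the amount of
    # unique data the cluster can hold is the raw capacity divided by RF.
    usable_tb = raw_tb / replication_factor
    return raw_tb, usable_tb


# 27 nodes matches the cluster described above; the other values are assumptions.
raw, usable = usable_capacity_tb(
    nodes=27, disks_per_node=12, tb_per_disk=2, replication_factor=3
)
print(f"raw: {raw} TB, usable at RF=3: {usable:.0f} TB")
```

Swapping in your own hardware numbers gives a quick comparison against the capacity of whatever array is being quoted.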

As a counter argument though, anyone running a C* cluster on the Amazon cloud is going to be using SAN storage (or some kind of proprietary storage array) at the lowest layers...Amazon isn't going to have a bunch of JBOD running their cloud infrastructure.  However, they've invested in the infrastructure to do it right.

- Mike


On Thu, Feb 21, 2013 at 6:08 PM, P. Taylor Goetz <ptgoetz@gmail.com> wrote:
I shouldn't have used the word "spinning"... SSDs are a great option as well.

I also agree with all the "expensive SPOF" points others have made.

Sent from my iPhone

On Feb 21, 2013, at 6:56 PM, "P. Taylor Goetz" <ptgoetz@gmail.com> wrote:

Cassandra is designed to write and read data in a way that is optimized for physical spinning disks.

Running C* on a SAN introduces a layer of abstraction that at best negates those optimizations and at worst introduces additional overhead.
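
For anyone unfamiliar with the write pattern being referred to, the toy sketch below shows the general idea: writes are appended to a log and buffered in memory, then flushed to disk as a sorted file in one sequential pass. This is not Cassandra's code; the class name, file names, and flush threshold are all invented for illustration.

```python
import os
import tempfile


class ToyStore:
    """Toy append-only store: a commit log plus sorted, immutable flush files."""

    def __init__(self, directory, flush_threshold=3):
        self.directory = directory
        self.flush_threshold = flush_threshold  # hypothetical, tiny on purpose
        self.memtable = {}
        self.flushed_files = []
        self.commit_log = open(os.path.join(directory, "commit.log"), "a")

    def write(self, key, value):
        # Appending to the end of the log is a sequential write: no seeking.
        self.commit_log.write(f"{key}\t{value}\n")
        self.commit_log.flush()
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffered rows in key order, in a single sequential pass.
        # The file is never modified after this point.
        path = os.path.join(self.directory, f"table-{len(self.flushed_files)}.txt")
        with open(path, "w") as f:
            for key in sorted(self.memtable):
                f.write(f"{key}\t{self.memtable[key]}\n")
        self.flushed_files.append(path)
        self.memtable.clear()

    def close(self):
        self.commit_log.close()


with tempfile.TemporaryDirectory() as d:
    store = ToyStore(d)
    for k, v in [("c", "3"), ("a", "1"), ("b", "2")]:
        store.write(k, v)
    store.close()
    with open(store.flushed_files[0]) as f:
        print(f.read())
```

Because nothing is updated in place, a local disk spends its time streaming rather than seeking, which is the optimization a shared array sitting behind a network fabric may not preserve.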

Sent from my iPhone

On Feb 21, 2013, at 6:42 PM, Kanwar Sangha <kanwar@mavenir.com> wrote:

Ok. What would be the drawbacks? :)

 

From: Michael Kjellman [mailto:mkjellman@barracuda.com]
Sent: 21 February 2013 17:12
To: user@cassandra.apache.org
Subject: Re: Cassandra with SAN

 

No, this is a really, really bad idea. C* was not designed for this; in fact, it was designed so you don't need to have a large, expensive SAN.

 

Don't be tempted by the shiny expensive SAN. :)

 

If money is no object, throw SSDs in your nodes instead and run 10G between racks.

 

From: Kanwar Sangha <kanwar@mavenir.com>
Reply-To: "user@cassandra.apache.org" <