cassandra-user mailing list archives

From Anthony John <>
Subject Re: Cassandra on iSCSI?
Date Fri, 21 Jan 2011 19:50:35 GMT
Sort of - I do not fully agree!

This is the shared-nothing vs. shared-disk debate. There are many mainstream
RDBMS products that pretend to do horizontal scalability with shared disks.
They have the kinds of problems that Cassandra is specifically architected
to avoid!

The original question here has 2 aspects to it:-
1. Is an iSCSI SAN good enough? My take is that it is still the poor man's
SAN compared to FC-based SANs. Having said that, iSCSI SANs have found
increasing adoption and the performance penalty is really marginal. Couple
that with the fact that Cassandra is architected to reduce the need for
high-performance storage systems, for example by reducing random writes.
So net net - a reasonable iSCSI SAN should work.
2. Does it make sense to use a SPOF SAN? Again, this militates against the
architectural underpinnings of Cassandra, which relies on the shared-nothing
idea to ensure that problems - say a bad disk - are easily isolated to a
particular node. On a SAN, depending on RAID configs, how LUNs are carved
out and so on, a few disk outages could affect multiple nodes. A
performance problem with the SAN could now affect your entire Cassandra
cluster, and so on. Cassandra is not meant to be set up this way!
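
To make the shared-spindle risk concrete, here is a minimal Python sketch.
The layout, the node and LUN names, and the nodes_affected helper are all
made up for illustration; a real SAN would need its own tooling to report
which LUNs sit on which disks.

```python
def nodes_affected(lun_spindles, node_luns, failed_spindles):
    """Return the sorted Cassandra nodes whose LUNs touch any failed spindle."""
    failed = set(failed_spindles)
    affected = set()
    for node, luns in node_luns.items():
        for lun in luns:
            # A LUN is hit if any of its backing disks is in the failed set.
            if failed & set(lun_spindles[lun]):
                affected.add(node)
    return sorted(affected)


# Hypothetical layout: lun0 and lun1 were carved from overlapping disks.
lun_spindles = {
    "lun0": ["d1", "d2"],
    "lun1": ["d2", "d3"],
    "lun2": ["d4", "d5"],
}
node_luns = {"cass1": ["lun0"], "cass2": ["lun1"], "cass3": ["lun2"]}

# Losing the single disk d2 touches two nodes; losing d4 touches only one.
print(nodes_affected(lun_spindles, node_luns, ["d2"]))
print(nodes_affected(lun_spindles, node_luns, ["d4"]))
```

In this made-up layout one bad disk reaches two nodes at once, which is
exactly what local disks in a shared-nothing setup would prevent.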

But in the real world today, large storage volumes are available only with
SANs. Rackable machines typically do not leave a lot of space for a bunch
of HDDs. On top of that, SANs provide all kinds of admin capabilities that
supposedly help with uptime, performance guarantees and so on. So a colo DC
might not have any other option but shared storage!

So if one is forced to use a SAN, how to set up Cassandra is the
interesting question - to me! Here are some thoughts:-
1. Ensure that each node gets dedicated - not shared - LUNs
2. Ensure that these LUNs do not share spindles, or nodes will cease to be
isolatable (this will be tough to get, given how SAN administrators think
about this)
3. Most SANs deliver performance by striping (RAID 0) - sacrifice striping
for isolation if push comes to shove
4. Do not share data directories from multiple nodes onto a single location
via NFS or CFS, for example. They are cool in shared-resource environments,
but break the premise behind Cassandra. All data storage should be private
to the Cassandra node, even when on shared storage
5. Do not change any assumption around Replication Factor (RF) or
Consistency Level (CL) due to the shared storage - in fact if anything,
increase your replication factor because you now have a potential SPOF
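
As a rough illustration of point 5, the sketch below works out how many
replicas of a row can be down before a request at a given CL fails. The
function names are made up for this example; the only Cassandra fact
assumed is that QUORUM needs floor(RF / 2) + 1 replicas.

```python
def required_replicas(rf, cl):
    """Replicas that must answer for a request at consistency level cl."""
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError("unsupported consistency level: " + cl)


def can_serve(rf, cl, replicas_down):
    """True if enough replicas of a row are still up to satisfy cl."""
    return rf - replicas_down >= required_replicas(rf, cl)


# RF=1 cannot survive the loss of the one node holding a row.
print(can_serve(rf=1, cl="ONE", replicas_down=1))     # False
# RF=2 keeps CL ONE reads going through a rolling restart.
print(can_serve(rf=2, cl="ONE", replicas_down=1))     # True
# RF=3 tolerates one replica down at QUORUM, but not two.
print(can_serve(rf=3, cl="QUORUM", replicas_down=1))  # True
print(can_serve(rf=3, cl="QUORUM", replicas_down=2))  # False
```

If shared spindles can take out two replicas together, only a higher RF
(or better isolation, per points 1 and 2) keeps these checks passing.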

My two - or maybe more - cents on the issue,


On Fri, Jan 21, 2011 at 1:15 PM, Edward Capriolo <> wrote:

> On Fri, Jan 21, 2011 at 12:07 PM, Jonathan Ellis <>
> wrote:
> > On Fri, Jan 21, 2011 at 2:19 AM, Mick Semb Wever <> wrote:
> >>
> >>> Of course with a SAN you'd want RF=1 since it's replicating
> >>> internally.
> >>
> >> Isn't this the same case for raid-5 as well?
> >
> > No, because the replication is (mainly) to protect you from machine
> > failures; if the SAN is a SPOF then putting more replicas on it
> > doesn't help.
> >
> >> And we want RF=2 if we need to keep reading while doing rolling
> >> restarts?
> >
> > Yes.
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of Riptano, the source for professional Cassandra support
> >
> >
> If you are using cassandra with a SAN RF=1 makes sense because we are
> making the assumption the san is already replicating your data. RF2
> makes good sense to be not effected by outages. Another alternative is
> something like linux-HA and manage each cassandra instance as a
> resource. This way if a head goes down another node linux ha would
> detect the failure and bring up that instance on another physical
> piece of hardware.
> Using LinuxHA+SAN+Cassandra would actually bring Cassandra closer to
> the hbase model which you have a distributed file system but the front
> end Cassandra acts like a region server.
