incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Cluster per Application vs. Multi-Application Clusters
Date Wed, 22 Aug 2012 19:30:21 GMT
If you are starting out small, one logical/physical cluster is probably
the best and only approach.

Long term this depends very much on the case, but I generally believe
Cluster per Application is the best approach, although I think of it as
"Cluster per QOS".

For our use cases I find that two applications can have very different
data sizes and quality of service requirements. For example, one
application may have a small dataset and a highly repetitive read
pattern with a high cache hit rate, while another may have a large,
sparse dataset and a "random read" pattern. Also, one application may
demand fast (< 3 ms) reads while the other may find 10 or 20 ms reads
acceptable.

When those two applications are placed on the same set of hardware you
end up scaling them both even though at a given time only one or the
other needs to be scaled. In extreme cases application 1 and 2 cause
contention and make each other unhappy.

What is best to do is architect your systems in such a way that moving
an individual column family to a new set of hardware is not difficult.
This might involve something like a MapReduce program that can bulk load
existing data between two clusters, while your front end application
sends the writes/updates/deletes to both the old and the new cluster.
Also make sure your application does not have too many hard-coded
touch points that assume a single cluster.
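
To make the idea concrete, here is a minimal Python sketch of the
migration pattern described above. Everything in it is an assumption
for illustration: InMemoryCluster is a stand-in for a real client (it
is not a Cassandra driver API), ClusterRouter is a hypothetical single
touch point for the application, and bulk_copy plays the role of the
bulk load job. In a real system the bulk load would also rely on write
timestamps to reconcile with the live dual writes.

```python
class InMemoryCluster:
    """Stand-in for a cluster client (hypothetical, not a real driver)."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # (column_family, row_key) -> dict of columns

    def write(self, column_family, row_key, columns):
        self.data.setdefault((column_family, row_key), {}).update(columns)

    def delete(self, column_family, row_key):
        self.data.pop((column_family, row_key), None)

    def read(self, column_family, row_key):
        return self.data.get((column_family, row_key))


class ClusterRouter:
    """Single touch point that decides which cluster(s) serve a column family."""

    def __init__(self, default_cluster):
        self.default_cluster = default_cluster
        self.routes = {}  # column_family -> (read_cluster, [write_clusters])

    def _route(self, column_family):
        default = (self.default_cluster, [self.default_cluster])
        return self.routes.get(column_family, default)

    def start_migration(self, column_family, new_cluster):
        # Keep reading from the old cluster, but mirror mutations to both.
        old_cluster, _ = self._route(column_family)
        self.routes[column_family] = (old_cluster, [old_cluster, new_cluster])

    def finish_migration(self, column_family):
        # Cut reads and writes over to the new cluster only.
        _, write_clusters = self._route(column_family)
        new_cluster = write_clusters[-1]
        self.routes[column_family] = (new_cluster, [new_cluster])

    def write(self, column_family, row_key, columns):
        for cluster in self._route(column_family)[1]:
            cluster.write(column_family, row_key, columns)

    def delete(self, column_family, row_key):
        for cluster in self._route(column_family)[1]:
            cluster.delete(column_family, row_key)

    def read(self, column_family, row_key):
        return self._route(column_family)[0].read(column_family, row_key)


def bulk_copy(column_family, source, target):
    """Toy replacement for the bulk load job: copy existing rows across."""
    for (cf, row_key), columns in list(source.data.items()):
        if cf == column_family:
            target.write(cf, row_key, dict(columns))


if __name__ == "__main__":
    old, new = InMemoryCluster("old"), InMemoryCluster("new")
    router = ClusterRouter(old)
    router.write("users", "k1", {"name": "a"})  # lands on old only
    router.start_migration("users", new)        # dual writes begin
    router.write("users", "k2", {"name": "b"})  # lands on both
    bulk_copy("users", old, new)                # backfill existing rows
    router.finish_migration("users")            # new cluster takes over
    print(router.read("users", "k1"))
```

The order matters in this sketch: dual writes start before the bulk
copy, so nothing written during the copy is missed, and only then are
reads moved over. Because the application only talks to the router,
other column families keep using the original cluster untouched.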

As you mentioned, one thing gained from keeping everything in the same
keyspace is connection pooling. However, unlike in the RDBMS world,
where coordinated transactions have to happen in order and so on, C*
has no such requirement, so getting all data into the same physical
"system" is not as important.



On Wed, Aug 22, 2012 at 8:25 AM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
> Just an opinion here, as we are having to do this ourselves, loading tons of
> researchers' datasets into one cluster.  We are going the path of one keyspace,
> as it makes it easier if you ever want to mine the data, so you don't have to
> keep building different clients for another keyspace.  We ended up adding our
> own security layer as well, so researchers can expose their datasets to other
> researchers and, once exposed, other researchers can join that data with their
> existing data.
>
> This of course is just one use case, but if 10 applications use cassandra, you
> still may find a benefit in having an 11th data mining app look at the data
> from all 10 apps.
>
> Later,
> Dean
>
> playOrm Developer
>
> From: Ersin Er <ersin.er@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, August 22, 2012 12:44 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Cluster per Application vs. Multi-Application Clusters
>
> Hi all,
>
> What are the advantages of allocating a cluster for a single application vs.
> running multiple applications on the same cassandra cluster? Is either of the
> models recommended over the other?
>
> Thanks.
>
> --
> Ersin Er
