incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Partitioner per keyspace
Date Wed, 28 Sep 2011 14:27:41 GMT
On Wed, Sep 28, 2011 at 4:36 AM, aaron morton <aaron@thelastpickle.com> wrote:

> Thats the one I was thinking of.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 28/09/2011, at 9:12 PM, Sylvain Lebresne wrote:
>
> > https://issues.apache.org/jira/browse/CASSANDRA-295
> >
> > --
> > Sylvain
> >
> > On Wed, Sep 28, 2011 at 10:06 AM, aaron morton <aaron@thelastpickle.com>
> wrote:
> >> The first thing I can think of is that the initial_token for the node must be
> >> a valid token according to the configured partitioner, as the tokens created
> >> by the partitioner are the things stored in the distributed hash tree. If you
> >> had a partitioner per KS you would need to configure the initial_token per KS.
> >> Also, it's not possible to *ever* change the partitioner, so it would have to
> >> be excluded from the KS update.
> >> These are not show stoppers, just the first things that come to mind.
> >> IIRC a lot of the other access happens in the context of a KS; there may be
> >> other issues, but I've not checked the code.
> >> Anyone else?
> >>
> >> -----------------
> >> Aaron Morton
> >> Freelance Cassandra Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >> On 28/09/2011, at 8:28 PM, Philippe wrote:
> >>
> >> Hi is there any reason why configuring a partitioner per keyspace
> wouldn't
> >> be possible technically ?
> >>
> >> Thanks.
> >>
>
>
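[The initial_token point quoted above can be made concrete. The following is an illustrative sketch, not project code: with RandomPartitioner the token space is the integers [0, 2**127), and a common way to balance a ring is to space initial_token values evenly across it. A ByteOrderedPartitioner token is a byte string instead, so a token valid for one partitioner is meaningless to the other, which is why per-keyspace partitioners would force per-keyspace tokens.]

```python
# Sketch: evenly spaced initial_token values for RandomPartitioner,
# whose token space is the integer range [0, 2**127).
def initial_tokens(node_count):
    step = 2 ** 127 // node_count
    return [i * step for i in range(node_count)]

# Each node's initial_token only makes sense for this one partitioner;
# a ByteOrderedPartitioner ring would need byte-string tokens instead.
print(initial_tokens(4))
```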
The last time I asked about this I heard it was "really baked in", which led
me to expect it won't happen any time soon. If you really need two
partitioners, my advice is to run two clusters. In some cases multi-tenancy,
depending on how you use the word, is possible, but in other cases it is a
pipe dream.

The reason I say this is that as you add more CFs and KSs to a cluster, you
lower your ability to optimize for a specific keyspace. You inevitably get
different workloads, and they internally start contending for resources. You
may also run into a situation where you need to scale only one CF, but
because of the constraints of another you end up having to buy
resources/hardware you do not need.

**Depending on your workload, not a hard and fast rule:**
For example, say you have two column families on a 10 node cluster.
ColumnFamily A: 10GB data/node, 500 reads/sec
ColumnFamily B: 500GB data/node, 100 reads/sec

Imagine column family A needs to double its read traffic but column family B
does not. With one cluster you end up buying 10 more nodes with 600GB of disk
each. With two clusters you could have extended the capacity of one cluster
without touching the other.
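[A back-of-the-envelope sketch of that sizing argument, using the example numbers above. The headroom figure is my own reading of why 510GB of data turns into 600GB boxes; the rest follows directly from the example.]

```python
# Sizing sketch: 10 nodes; CF A at 10GB/node, CF B at 500GB/node.
nodes = 10
a_gb, b_gb = 10, 500  # per-node data for each column family

# Shared cluster: doubling CF A's read capacity means doubling the node
# count, but every new node must also hold its share of CF B's data
# (510GB minimum, hence the ~600GB boxes once you add headroom).
extra_shared_nodes = nodes
disk_per_shared_node = a_gb + b_gb

# Dedicated clusters: a CF-A-only cluster doubles with small disks,
# and CF B's cluster is left alone.
extra_dedicated_nodes = nodes
disk_per_dedicated_node = a_gb

print(disk_per_shared_node, disk_per_dedicated_node)
```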

You can get this vibe by listening to some of the talks at CassandraSF
http://twitter.com/#!/slideshare/status/78906858169057280

In particular, Twitter had precomputed a matrix of data size / number of
servers / ops per sec. Rather than have one large cluster that holds all your
data but is tuned for none of it, have smaller distinct clusters exactly tuned
for your workload.

I am a bit off topic, but in general if you are considering two partitioners
you almost certainly want two distinct clusters. Really, NONE of the
operations (batch_mutate, multiget, etc.) work across keyspaces anyway, so a
design that spans keyspaces would be unorthodox.
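[To illustrate the last point: in the Thrift API a connection is scoped to a single keyspace via set_keyspace, and batch_mutate addresses only that keyspace. The toy class below is hypothetical, not the real Thrift bindings; it just mimics that scoping to show why a batch cannot span keyspaces in one call.]

```python
# Toy model of the keyspace-scoped Thrift connection (illustrative only).
class ToyClient:
    def __init__(self):
        self.keyspace = None
        self.writes = []

    def set_keyspace(self, ks):
        # The real connection is bound to exactly one keyspace at a time.
        self.keyspace = ks

    def batch_mutate(self, mutation_map):
        # mutation_map: {row_key: {column_family: [(name, value), ...]}}
        # Every mutation lands in the current keyspace; there is no way
        # to name a second keyspace inside the same call.
        if self.keyspace is None:
            raise RuntimeError("set_keyspace must be called first")
        for row_key, cf_map in mutation_map.items():
            for cf, cols in cf_map.items():
                self.writes.append((self.keyspace, cf, row_key, cols))

client = ToyClient()
client.set_keyspace("ks1")
client.batch_mutate({"row1": {"A": [("col", "val")]}})
# Writing to a second keyspace means re-scoping the connection first:
client.set_keyspace("ks2")
client.batch_mutate({"row1": {"B": [("col", "val")]}})
print(client.writes)
```

A design that needs both keyspaces in one logical write therefore ends up doing two round trips with no atomicity between them, which is part of why spanning keyspaces is awkward.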
