cassandra-user mailing list archives

From David Boxenhorn <da...@lookin2.com>
Subject Re: Distribution Factor: part of the solution to many-CF problem?
Date Mon, 21 Feb 2011 22:14:24 GMT
No, that's not what I mean at all.

That message is about the ability to use different partitioners for
different CFs, say, RandomPartitioner for one, OPP for another.

I'm talking about defining how many nodes a CF should be distributed over,
which would be useful if you have a lot of nodes and a lot of small CFs
(small relative to the total amount of data).


On Mon, Feb 21, 2011 at 9:58 PM, Aaron Morton <aaron@thelastpickle.com> wrote:

> Sounds a bit like this idea
> http://www.mail-archive.com/dev@cassandra.apache.org/msg01799.html
>
> Aaron
>
> On 22/02/2011, at 1:28 AM, David Boxenhorn <david@lookin2.com> wrote:
>
> > Cassandra is both distributed and replicated. We have Replication Factor
> > but no Distribution Factor!
> >
> > Distribution Factor would define over how many nodes a CF should be
> > distributed.
> >
> > Say you want to support millions of multi-tenant users in clusters with
> > thousands of nodes, where you don't know the user's schema in advance, so
> > you can't have users share CFs.
> >
> > In this case you wouldn't want to spread out each user's Column Families
> > over thousands of nodes! You would want something like: RF=3, DF=10 i.e.
> > distribute each CF over 10 nodes, within those nodes replicate 3 times.
> >
> > One implementation of DF would be to hash the CF name, and use the same
> > strategies defined for RF to choose the N nodes in DF=N.
> >
>
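
The implementation idea quoted above (hash the CF name, then pick N nodes the
way replicas are picked for RF) could be sketched roughly as follows. This is
only an illustration, not Cassandra code: the names build_ring, nodes_for_cf
and replicas_for_row are hypothetical, MD5 is just one possible hash, the ring
assumes one token per node, and "walk clockwise around the ring" stands in for
whatever replica placement strategy would actually be used.

```python
import hashlib
from bisect import bisect_left


def token(value):
    """Map a string to an integer ring position (MD5 is an illustrative choice)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


def build_ring(node_names):
    """Hypothetical ring: one token per node, sorted by token."""
    return sorted((token(name), name) for name in node_names)


def nodes_for_cf(cf_name, ring, df):
    """Pick the DF nodes for a CF: hash the CF name, then walk the ring clockwise."""
    if not 1 <= df <= len(ring):
        raise ValueError("DF must be between 1 and the number of nodes")
    tokens = [t for t, _ in ring]
    # First node at or after the CF's token, wrapping past the end of the ring.
    start = bisect_left(tokens, token(cf_name)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(df)]


def replicas_for_row(row_key, cf_nodes, rf):
    """Within the CF's DF nodes, pick RF replicas for one row key."""
    if not 1 <= rf <= len(cf_nodes):
        raise ValueError("RF must be between 1 and DF")
    start = token(row_key) % len(cf_nodes)
    return [cf_nodes[(start + i) % len(cf_nodes)] for i in range(rf)]
```

With a ring of 1000 nodes, nodes_for_cf("tenant42_users", ring, 10) would
return the 10 nodes that hold that CF, and replicas_for_row("some-row",
cf_nodes, 3) would return 3 of those 10, matching the RF=3, DF=10 example.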
