Date: Tue, 22 Feb 2011 00:14:24 +0200
Subject: Re: Distribution Factor: part of the solution to many-CF problem?
From: David Boxenhorn <david@lookin2.com>
To: user@cassandra.apache.org
Cc: Aaron Morton <aaron@thelastpickle.com>
Reply-To: user@cassandra.apache.org

No, that's not what I mean at all.

That message is about the ability to use different partitioners for different CFs, say, RandomPartitioner for one, OPP for another.

I'm talking about defining how many nodes a CF should be distributed over, which would be useful if you have a lot of nodes and a lot of small CFs (small relative to the total amount of data).

On Mon, Feb 21, 2011 at 9:58 PM, Aaron Morton <aaron@thelastpickle.com> wrote:

> Sounds a bit like this idea
> http://www.mail-archive.com/dev@cassandra.apache.org/msg01799.html
>
> Aaron
>
> On 22/02/2011, at 1:28 AM, David Boxenhorn <david@lookin2.com> wrote:
>
> > Cassandra is both distributed and replicated. We have Replication Factor but no Distribution Factor!
> >
> > Distribution Factor would define over how many nodes a CF should be distributed.
> >
> > Say you want to support millions of multi-tenant users in clusters with thousands of nodes, where you don't know the user's schema in advance, so you can't have users share CFs.
> >
> > In this case you wouldn't want to spread out each user's Column Families over thousands of nodes! You would want something like: RF=3, DF=10 i.e. distribute each CF over 10 nodes, within those nodes replicate 3 times.
> >
> > One implementation of DF would be to hash the CF name, and use the same strategies defined for RF to choose the N nodes in DF=N.
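
To make the "hash the CF name" idea concrete, here is a rough sketch of how DF and RF could compose. This is only an illustration, not Cassandra code: the function names (`nodes_for_cf`, `replicas_for_key`, `_token`), the use of MD5, and the simplification of treating the ring as an ordered list of node names (rather than real token ranges) are all assumptions made for the example.

```python
import hashlib
from typing import List, Sequence


def _token(value: str) -> int:
    """Map a string to a large integer using MD5 (an illustrative choice)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


def nodes_for_cf(ring: Sequence[str], cf_name: str, df: int) -> List[str]:
    """Choose the DF nodes that would hold a column family.

    Hash the CF name to a starting position, then take the next DF nodes
    walking around the ring, much as a simple replication strategy picks
    successive nodes for replicas.
    """
    if not 1 <= df <= len(ring):
        raise ValueError("df must be between 1 and the number of nodes")
    start = _token(cf_name) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(df)]


def replicas_for_key(
    ring: Sequence[str], cf_name: str, row_key: str, df: int, rf: int
) -> List[str]:
    """Choose RF replicas for one row, restricted to the CF's DF nodes."""
    if not 1 <= rf <= df:
        raise ValueError("rf must be between 1 and df")
    subset = nodes_for_cf(ring, cf_name, df)
    start = _token(row_key) % df
    return [subset[(start + i) % df] for i in range(rf)]


if __name__ == "__main__":
    # Hypothetical 1000-node cluster and tenant CF name, with RF=3, DF=10.
    ring = [f"node{i}" for i in range(1000)]
    print(nodes_for_cf(ring, "tenant42_users", df=10))
    print(replicas_for_key(ring, "tenant42_users", "row-1", df=10, rf=3))
```

With RF=3 and DF=10, every row of a given CF lands on 3 of the same 10 nodes, so a small CF touches 10 nodes instead of the whole cluster. A real implementation would have to work with token ranges and handle nodes joining or leaving, which this sketch ignores.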