Subject: Re: Distribution Factor: part of the solution to many-CF problem?
From: Aaron Morton
Date: Wed, 23 Feb 2011 08:49:55 +1300
To: user@cassandra.apache.org

> The single partitioner is "baked in"

That was my point. You could perhaps write a partitioner that considers the CF when deciding what nodes to put data on. Off the top of my head, the partitioner is not told about the CF the key is being stored in.
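Purely as a sketch of the idea (the cfName parameter below is invented; the real IPartitioner only ever sees the row key), a CF-aware partitioner could fold the CF name into the token:

    import java.math.BigInteger;
    import java.nio.ByteBuffer;
    import java.security.MessageDigest;

    // Hypothetical CF-aware partitioner: mixes the CF name into the MD5
    // digest so each CF gets its own placement on the ring. No such
    // interface exists in Cassandra today.
    public class CFAwarePartitioner
    {
        public BigInteger getToken(String cfName, ByteBuffer key) throws Exception
        {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            md5.update(cfName.getBytes("UTF-8"));
            md5.update(key.duplicate()); // duplicate() leaves the caller's buffer position alone
            return new BigInteger(1, md5.digest());
        }
    }

Something would still have to thread the CF name down to that call, which is the part that is baked in.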
Aaron

On 23/02/2011, at 6:01 AM, Edward Capriolo wrote:

> On Mon, Feb 21, 2011 at 5:14 PM, David Boxenhorn wrote:
>> No, that's not what I mean at all.
>>
>> That message is about the ability to use different partitioners for
>> different CFs, say, RandomPartitioner for one, OPP for another.
>>
>> I'm talking about defining how many nodes a CF should be distributed over,
>> which would be useful if you have a lot of nodes and a lot of small CFs
>> (small relative to the total amount of data).
>>
>> On Mon, Feb 21, 2011 at 9:58 PM, Aaron Morton wrote:
>>>
>>> Sounds a bit like this idea
>>> http://www.mail-archive.com/dev@cassandra.apache.org/msg01799.html
>>>
>>> Aaron
>>>
>>> On 22/02/2011, at 1:28 AM, David Boxenhorn wrote:
>>>
>>>> Cassandra is both distributed and replicated. We have a Replication
>>>> Factor but no Distribution Factor!
>>>>
>>>> Distribution Factor would define over how many nodes a CF should be
>>>> distributed.
>>>>
>>>> Say you want to support millions of multi-tenant users in clusters with
>>>> thousands of nodes, where you don't know the users' schemas in advance,
>>>> so you can't have users share CFs.
>>>>
>>>> In this case you wouldn't want to spread each user's Column Families
>>>> over thousands of nodes! You would want something like RF=3, DF=10, i.e.
>>>> distribute each CF over 10 nodes, and within those nodes replicate 3 times.
>>>>
>>>> One implementation of DF would be to hash the CF name, and use the same
>>>> strategies defined for RF to choose the N nodes in DF=N.
>>>>
>
> The single partitioner is "baked in"
>
> Here is a possible solution: use OPP, but md5 hash your keys client side.
>
> This solves that, but when you have keyspaces using OPP with different
> key distributions it falls apart.
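To make Edward's workaround concrete, a sketch (hashedKey is a made-up helper, not a real client API): every client hashes the application key before it reaches Cassandra, so the order-preserving partitioner only ever sees evenly distributed hex strings:

    import java.math.BigInteger;
    import java.security.MessageDigest;

    // Sketch of client-side hashing for use with OPP: the cluster stores
    // the MD5 hex of each key, so keys spread evenly around the ring even
    // though the partitioner itself preserves order.
    public final class ClientSideHashing
    {
        public static String hashedKey(String rawKey) throws Exception
        {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(rawKey.getBytes("UTF-8"));
            // 32 hex chars, zero padded, so lexical order matches numeric order
            return String.format("%032x", new BigInteger(1, digest));
        }
    }

The catch, as Edward says, is that node tokens can only be balanced for one key distribution; a second keyspace on the same OPP cluster with unhashed keys will end up unevenly spread.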