From: Eric tamme
To: user@cassandra.apache.org
Date: Tue, 14 Jun 2011 06:46:43 -0400
Subject: Re: Is this the proper use of OPP?

I would point you to this article, it does a good job describing OPP
and pretty much answers the specific questions you asked.

http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

-Eric

On Mon, Jun 13, 2011 at 5:06 PM, AJ wrote:
> I'm just becoming aware of the restrictions of using an OPP as compared to
> Random. Please let me know if I understand this correctly.
>
> First off, if using the OPP only for an increased performance of range
> queries, then it will probably be very hard to predict if you will end up
> with hotspots or not and thus where and even how the data may be clustered
> together in a particular node. This is because all the various keys of the
> various CFs may or may not have any correlation with one another. So, in
> effect, you just have a big mess of keys of various ranges and formats, but
> they all are partitioned according to one global set of tokens that apply to
> ALL CFs of ALL keyspaces.
>
> [main reason for post below...]
> OTOH, if you want to use OPP to purposely cluster certain data together on
> specific nodes, such as for geographic partitioning, then you have to choose
> a prefix for all of the keys of ALL CFs and ALL keyspaces! This is because
> they will all be partitioned based on the tokens assigned to the nodes.
> IOW, if I had two datacenters, one in the US and another in Europe, then
> for all rows in all KSs and in all CFs, I would need to prepend a prefix to
> the keys, such as "US:" and "EU:". The problem is I may not want ALL of my
> CFs to be partitioned this way; only specific ones. Also, it may be very
> difficult if not impossible for all keys of all keyspaces and CFs to use
> keys of this form. I'm not sure if Cass is designed for this.
>
> However, if using the random partitioner, then there is no problem. You can
> use any key of any type you want (UTF8, Long, etc.) since they are all
> hashed before deciding which node gets the key/row.
>
> Do I understand things correctly or am I missing something? Is Cass
> designed to use OPP this way or am I hacking it? If so, is there an
> acceptable way to do geographic partitioning?
>
> Also, what is OPP really good for?
>
> Thanks!
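
To make the contrast in the quoted message concrete, here is a minimal Python sketch of the two placement schemes. It is only an illustration, not Cassandra code: the function names, node names, and token values are all made up, and ring ownership is simplified to "the first node whose token is >= the key's token, wrapping around". The "EU;" and "US;" tokens are an assumption chosen because ";" sorts just after ":" in ASCII, so every "EU:..." key sorts below "EU;".

```python
import bisect
import hashlib


def owner(ring, token):
    """Return the node owning `token`.

    `ring` is a sorted list of (token, node) pairs (hypothetical layout).
    The owner is the first node whose token is >= the given token,
    wrapping around to the first node past the end of the ring.
    """
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


def opp_token(key):
    # Order-preserving: the token is the key itself, so key order is
    # ring order and a key prefix decides which node gets the row.
    return key


def random_token(key):
    # Hash-based: the token is derived from an MD5 digest of the key,
    # so the key's format and ordering have no effect on placement.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


# Hypothetical two-node rings, one per scheme.
opp_ring = [("EU;", "eu-node"), ("US;", "us-node")]
random_ring = [(2**127, "node-a"), (2**128 - 1, "node-b")]

print(owner(opp_ring, opp_token("EU:alice")))  # eu-node
print(owner(opp_ring, opp_token("US:bob")))    # us-node
print(owner(opp_ring, opp_token("zebra")))     # eu-node (wraps around)
print(owner(random_ring, random_token("EU:alice")))
```

The "zebra" case shows the concern raised in the quoted message: under the order-preserving scheme, a key from any other CF that does not carry a prefix still lands on whichever node its raw byte order dictates.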