From: AJ
To: user@cassandra.apache.org
Date: Mon, 13 Jun 2011 15:06:16 -0600
Subject: Is this the proper use of OPP?

I'm just becoming aware of the restrictions of using an order-preserving partitioner (OPP) compared to the RandomPartitioner. Please let me know if I understand this correctly.

First off, if you use OPP only to speed up range queries, it will probably be very hard to predict whether you will end up with hotspots, and therefore where, and even how, the data will be clustered on a particular node. This is because the keys of the various CFs may or may not have any correlation with one another. In effect, you just have a big mess of keys of various ranges and formats, all partitioned according to one global set of tokens that applies to ALL CFs of ALL keyspaces.

[main reason for post below...]

OTOH, if you want to use OPP to deliberately cluster certain data on specific nodes, for example for geographic partitioning, then you have to choose a prefix for the keys of ALL CFs in ALL keyspaces, because they will all be partitioned based on the tokens assigned to the nodes. IOW, if I had two datacenters, one in the US and another in Europe, I would need to prepend a prefix such as "US:" or "EU:" to the key of every row in every CF of every KS. The problem is that I may not want ALL of my CFs partitioned this way, only specific ones. Also, it may be very difficult, if not impossible, for all keys of all keyspaces and CFs to take this form. I'm not sure Cass is designed for this.

However, with the RandomPartitioner there is no such problem. You can use keys of any type you want (UTF8, Long, etc.), since they are all hashed before deciding which node gets the key/row.
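To make the two placement rules concrete, here is a minimal Python sketch, not Cassandra's actual code. It assumes a hypothetical four-node ring in which each node owns the keys in (previous token, its token], wrapping around after the last token. Under OPP the raw key is compared directly against string tokens; under the RandomPartitioner the key is hashed first, assumed here to be the absolute value of its MD5 digest read as a signed 128-bit integer. The node names, tokens, and helper functions (`owner`, `opp_node`, `rp_token`, `rp_node`) are all made up for illustration.

```python
import bisect
import hashlib

# Hypothetical 4-node ring. Each node owns the keys in (previous token, its token],
# and anything past the last token wraps around to the first node.
NODES = ["eu-1", "eu-2", "us-1", "us-2"]

# OPP: tokens are strings and keys are compared to them lexically
# (plain ASCII here, so character order equals byte order).
OPP_TOKENS = ["EU:m", "EU:~", "US:m", "US:~"]

# RandomPartitioner: tokens live in [0, 2**127]; evenly spaced here (assumption).
RP_TOKENS = [i * 2**127 // len(NODES) for i in range(len(NODES))]


def owner(tokens, token):
    """Return the node whose range contains `token` (ring lookup with wrap-around)."""
    i = bisect.bisect_left(tokens, token)
    return NODES[i % len(NODES)]


def opp_node(key: str) -> str:
    """OPP: the raw key itself is the token, so key order decides placement."""
    return owner(OPP_TOKENS, key)


def rp_token(key: bytes) -> int:
    """Assumed RandomPartitioner-style token: |MD5(key)| as a signed 128-bit int."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def rp_node(key: str) -> str:
    """RandomPartitioner: the key's hash, not the key, decides placement."""
    return owner(RP_TOKENS, rp_token(key.encode("utf-8")))


if __name__ == "__main__":
    for key in ["EU:alice", "EU:zoe", "US:bob", "US:yuki", "user:42", "1234"]:
        print(f"{key:9} OPP -> {opp_node(key):5} RP -> {rp_node(key)}")
```

With these made-up tokens, the prefixed keys land where intended under OPP ("EU:" keys on eu-1/eu-2, "US:" keys on us-1/us-2), but unprefixed keys from other CFs, such as "user:42" and "1234", both end up on eu-1: one sorts after every token and wraps around, the other sorts before the first token. That is the hotspot and misplacement problem described above. Under the RandomPartitioner the same keys are spread by their hashes, so placement no longer depends on the key format.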
Do I understand things correctly, or am I missing something? Is Cass designed to use OPP this way, or am I hacking it? If I'm hacking it, is there an acceptable way to do geographic partitioning? Also, what is OPP really good for?

Thanks!