Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: <4DF67BC8.8020800@dude.podzone.net>
References: <4DF67BC8.8020800@dude.podzone.net>
Date: Tue, 14 Jun 2011 06:46:43 -0400
Message-ID: <BANLkTik7GRDbYmdQkjYRjJ1O3X8UOn+_8Q@mail.gmail.com>
Subject: Re: Is this the proper use of OPP?
From: Eric tamme <etamme@gmail.com>
To: user@cassandra.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

I would point you to this article; it does a good job of describing OPP
and pretty much answers the specific questions you asked.

http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

-Eric


On Mon, Jun 13, 2011 at 5:06 PM, AJ <aj@dude.podzone.net> wrote:
> I'm just becoming aware of the restrictions of using an OPP as compared to
> Random. Please let me know if I understand this correctly.
>
> First off, if using the OPP only for an increased performance of range
> queries, then it will probably be very hard to predict if you will end up
> with hotspots or not and thus where and even how the data may be clustered
> together in a particular node. This is because all the various keys of the
> various CFs may or may not have any correlation with one another. So, in
> effect, you just have a big mess of keys of various ranges and formats, but
> they all are partitioned according to one global set of tokens that apply to
> ALL CFs of ALL keyspaces.
>
> [main reason for post below...]
> OTOH, if you want to use OPP to purposely cluster certain data together on
> specific nodes, such as for geographic partitioning, then you have to choose
> a prefix for all of the keys of ALL CFs and ALL keyspaces! This is because
> they will all be partitioned based on the tokens assigned to the nodes.
> IOW, if I had two datacenters, one in the US and another in Europe, then
> for all rows in all KSs and in all CFs, I would need to prepend a prefix to
> the keys, such as "US:" and "EU:". The problem is I may not want ALL of my
> CFs to be partitioned this way; only specific ones. Also, it may be very
> difficult if not impossible for all keys of all keyspaces and CFs to use
> keys of this form. I'm not sure if Cass is designed for this.
>
> However, if using the random partitioner, then there is no problem. You can
> use any key of any type you want (UTF8, Long, etc.) since they are all
> hashed before deciding which node gets the key/row.
>
> Do I understand things correctly or am I missing something? Is Cass
> designed to use OPP this way or am I hacking it? If so, is there an
> acceptable way to do geographic partitioning?
>
> Also, what is OPP really good for?
>
> Thanks!
>
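
For illustration only, here is a rough sketch of the placement behaviour the
quoted message describes. It is not Cassandra code: the two-node rings, the
node names, the sample keys, and the helpers owner() and md5_token() are all
made up for this example, and the MD5-based token is only a simplified
stand-in for what a hashing partitioner does. It shows that with an
order-preserving ring a key prefix decides the node, that the same ring also
places keys which carry no prefix, and that a hashed ring ignores the key's
format altogether.

```python
import bisect
import hashlib


def owner(token, ring):
    """Return the node owning `token`.

    `ring` is a list of (node_token, node_name) sorted by node_token. A node
    owns every token up to and including its own; tokens past the last node
    wrap around to the first one.
    """
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


def md5_token(key):
    # Simplified stand-in for a hashing partitioner (an assumption, not the
    # exact Cassandra algorithm): map the key into the range 0 .. 2**127 - 1.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** 127)


# Order-preserving ring: node tokens are compared directly with the keys.
opp_ring = [("EU:\uffff", "eu-node"), ("US:\uffff", "us-node")]

# Hashed ring: node tokens are numbers spread over the hash range.
hash_ring = [(0, "node-a"), (2 ** 126, "node-b")]

keys = ["EU:alice", "EU:bob", "US:carol", "US:dave"]

# With order preservation the prefix determines the node.
opp_placement = {k: owner(k, opp_ring) for k in keys}

# A key from some other column family that has no prefix is still placed by
# the same ring; here it sorts after "US:..." and wraps to the first node.
unprefixed_owner = owner("order:123", opp_ring)

# With hashing, placement depends only on the hash, not on the key's format.
hash_placement = {k: owner(md5_token(k), hash_ring) for k in keys}

print(opp_placement)
print(unprefixed_owner)
print(hash_placement)
```

In this sketch every "EU:" key lands on eu-node and every "US:" key on
us-node, while "order:123" ends up on eu-node purely because of where it
sorts, which is the "one global set of tokens" concern raised above.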