incubator-cassandra-user mailing list archives

From AJ ...@dude.podzone.net>
Subject Re: Is this the proper use of OPP?
Date Tue, 14 Jun 2011 12:05:15 GMT
Thanks.  I found that article later.  I was definitely off-base with 
respect to OPP.  Random partitioning is pretty much the way to go and 
DataStax has a good article on geographic distribution: 
http://www.datastax.com/docs/0.8/operations/datacenter

Sorry for the long pointless post previously.  But, FWIW, I don't see 
much use for OPP other than the corner case of a cluster consisting of 1 
keyspace and 1 CF, such as an index.  I will have to read Dominic's post on 
having multiple Cassandra clusters running on the same nodes.

On 6/14/2011 4:46 AM, Eric tamme wrote:
> I would point you to this article; it does a good job describing OPP
> and pretty much answers the specific questions you asked.
>
> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
>
> -Eric
>
>
> On Mon, Jun 13, 2011 at 5:06 PM, AJ<aj@dude.podzone.net>  wrote:
>> I'm just becoming aware of the restrictions of using an OPP as compared to
>> Random.  Please let me know if I understand this correctly.
>>
>> First off, if OPP is used only to speed up range queries, then it will
>> probably be very hard to predict whether you will end up with hotspots, and
>> thus where and even how the data will be clustered together on a particular
>> node.  This is because the keys of the various CFs may or may not have any
>> correlation with one another.  So, in effect, you just have a big mess of
>> keys of various ranges and formats, but they are all partitioned according
>> to one global set of tokens that applies to ALL CFs of ALL keyspaces.
>>
>> [main reason for post below...]
>> OTOH, if you want to use OPP to purposely cluster certain data together on
>> specific nodes, such as for geographic partitioning, then you have to choose
>> a prefix for all of the keys of ALL CFs and ALL keyspaces!  This is because
>> they will all be partitioned based on the tokens assigned to the nodes.
>> IOW, if I had two datacenters, one in the US and another in Europe, then
>> for all rows in all KSs and in all CFs, I would need to prepend a prefix to
>> the keys, such as "US:" and "EU:".  The problem is I may not want ALL of my
>> CFs to be partitioned this way; only specific ones.  Also, it may be very
>> difficult if not impossible for all keys of all keyspaces and CFs to use
>> keys of this form.  I'm not sure if Cass is designed for this.
>>
>> However, if using the random partitioner, then there is no problem.  You can
>> use any key of any type you want (UTF8, Long, etc.) since they are all
>> hashed before deciding which node gets the key/row.
>>
>> Do I understand things correctly or am I missing something?  Is Cass
>> designed to use OPP this way or am I hacking it?  If so, is there an
>> acceptable way to do geographic partitioning?
>>
>> Also, what is OPP really good for?
>>
>> Thanks!
>>
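For what it's worth, the difference discussed in this thread can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Cassandra's actual code: the node names, the tokens, and the helpers `owner` and `random_token` are made up for the example, and the hash-to-token step only approximates what RandomPartitioner does with MD5.

```python
import bisect
import hashlib


def owner(ring, token):
    """Return the node that owns `token` on a sorted ring of (token, node) pairs.

    A node owns every token up to and including its own; tokens past the
    last node wrap around to the first one.
    """
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token)
    return ring[idx % len(ring)][1]


def random_token(key: bytes) -> int:
    """Hypothetical stand-in for random partitioning: MD5 the key into [0, 2**127)."""
    return int.from_bytes(hashlib.md5(key).digest(), "big") % (2 ** 127)


# Order-preserving ring: the key itself is the token, so the prefix decides placement.
# The tokens "EU:~" and "US:~" are assumptions chosen to sort after typical ASCII keys.
opp_ring = sorted([("EU:~", "eu-node"), ("US:~", "us-node")])

# Random ring: two evenly spaced integer tokens (again, made up for the sketch).
rp_ring = sorted([(0, "node-a"), (2 ** 126, "node-b")])

print(owner(opp_ring, "EU:alice"))  # lands on eu-node because of the "EU:" prefix
print(owner(opp_ring, "US:bob"))    # lands on us-node because of the "US:" prefix
print(owner(opp_ring, "zebra"))     # an unprefixed key from any CF still goes through the same ring

# With hashing, the key's format is irrelevant; placement follows the hash, not the prefix.
for key in (b"EU:alice", b"US:bob", b"zebra"):
    print(key, owner(rp_ring, random_token(key)))
```

The point of the OPP half is the one raised above: there is a single ring for the whole cluster, so a key from any keyspace or CF is placed by its raw ordering whether or not it carries the prefix.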

