incubator-cassandra-user mailing list archives

From David Boxenhorn <da...@lookin2.com>
Subject Re: OPP + Hash on client side
Date Wed, 07 Jul 2010 10:33:50 GMT
Aaron, thank you for the link.

What is discussed there is not exactly what I am thinking of. They propose
distributing the keys with <MD5(ROWKEY)>.<ROWKEY> - which will distribute
the values in a way that cannot easily be reversed. What I am proposing is
to distribute the keys evenly among N buckets, where N is much larger than
your number of nodes, and then construct my range queries as the union of N
range queries that I actually perform on Cassandra.
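
As a rough sketch of the bucketing idea above (not a tested client), the following Python simulates it without touching Cassandra. The names `bucket_of`, `cass_key`, `cass_ranges` and `range_query` are hypothetical, CRC32 is just an assumed stand-in for any stable hash, and N = 97 is taken from the earlier example. Sorting the stored keys stands in for the order OPP would keep them in.

```python
import zlib

NUM_BUCKETS = 97  # N: assumed value, should be much larger than the node count


def bucket_of(key):
    """Stable bucket number for a key (CRC32 is an assumption; any stable hash works)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def cass_key(key):
    """Row key as it would be stored under OPP: bucket, '@', then the original key."""
    return f"{bucket_of(key)}@{key}"


def cass_ranges(start, end):
    """The N (start, end) pairs whose union covers the logical range [start, end]."""
    return [(f"{b}@{start}", f"{b}@{end}") for b in range(NUM_BUCKETS)]


def range_query(stored_keys, start, end):
    """Simulate the union of N range scans over keys in OPP (sorted) order."""
    ordered = sorted(stored_keys)
    hits = []
    for lo, hi in cass_ranges(start, end):
        hits.extend(k for k in ordered if lo <= k <= hi)
    # Strip the bucket prefix and restore the logical key order.
    return sorted(k.split("@", 1)[1] for k in hits)


if __name__ == "__main__":
    stored = [cass_key(k) for k in ["apple", "banana", "cherry", "date", "fig"]]
    print(range_query(stored, "b", "d"))
```

The cost is N range scans per logical query plus a client-side merge, in exchange for keys that spread evenly across the ring while each bucket stays ordered.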

"You can do range queries with the Random Partitioner in 0.6.*"

I went through this before; it's not true. What you can do is loop over your
entire set of keys in random order. There is no way to get an actual range
other than the whole range.
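
To illustrate why, here is a small simulation, assuming only that the Random Partitioner places rows by an MD5-derived token. The `md5_token` helper is hypothetical and simplifies how Cassandra actually computes tokens. Walking the keys in token order visits every key exactly once, but in an order unrelated to the keys themselves, so a contiguous token range does not correspond to a contiguous key range.

```python
import hashlib


def md5_token(key):
    """Simplified stand-in for a Random Partitioner token: MD5 of the key as an integer."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


# Twenty keys that are already in lexical order.
keys = [f"user{i:03d}" for i in range(20)]

# The order a full scan would walk them in if rows are placed by token.
token_order = sorted(keys, key=md5_token)

if __name__ == "__main__":
    for k in token_order:
        print(k)
```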


On Wed, Jul 7, 2010 at 1:15 PM, Aaron Morton <aaron@thelastpickle.com> wrote:

> That pattern is discussed here
> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
>
> It's also used in http://github.com/tjake/Lucandra
>
> You can do range queries with the Random Partitioner in 0.6.*, the order of
> the return is undefined and it's a bit slower.
>
> I think it's normally used when you want ordered range queries in some CF's
> and random distribution in others.
>
> Aaron
>
>
> On 07 Jul, 2010, at 09:47 PM, David Boxenhorn <david@lookin2.com> wrote:
>
> Is there any strategy for using OPP with a hash algorithm on the client
> side to get both uniform distribution of data in the cluster *and* the
> ability to do range queries?
>
> I'm thinking of something like this:
>
> cassKey = (key % 97) + "@" + key;
>
> cassRange = 0 + "@" + range; 1 + "@" + range; ... 96 + "@" + range;
>
> Would something like that work?
>
>
