Aaron, thank you for the link.

What is discussed there is not exactly what I am thinking of. They propose storing keys as <MD5(ROWKEY)>.<ROWKEY>, which distributes the rows in a way that cannot easily be reversed. What I am proposing is to distribute the keys evenly among N buckets, where N is much larger than the number of nodes, and then construct each range query as the union of N range queries (one per bucket) that I actually perform on Cassandra.
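
To make the idea concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the names (N_BUCKETS, bucket_of, cass_key, bucket_ranges, range_query) are made up, CRC32 stands in for whatever hash you prefer, the bucket prefix is zero-padded so that each bucket sorts as one contiguous block, and a sorted in-memory list stands in for a column family under OPP. It does not call any Cassandra client API.

```python
import zlib
from bisect import bisect_left, bisect_right

N_BUCKETS = 97  # assumed value; should be much larger than the number of nodes


def bucket_of(key):
    # Stable hash of the key, reduced to a bucket number (CRC32 is an arbitrary choice).
    return zlib.crc32(key.encode("utf-8")) % N_BUCKETS


def cass_key(key):
    # Key as it would be stored under OPP: "<bucket>@<original key>".
    return f"{bucket_of(key):02d}@{key}"


def bucket_ranges(start, end):
    # One (start, end) pair per bucket; their union covers the logical range.
    return [(f"{b:02d}@{start}", f"{b:02d}@{end}") for b in range(N_BUCKETS)]


def range_query(sorted_store, start, end):
    # sorted_store is a sorted list of stored keys, simulating an
    # order-preserving key space. Both ends of the range are inclusive.
    results = []
    for lo, hi in bucket_ranges(start, end):
        i = bisect_left(sorted_store, lo)
        j = bisect_right(sorted_store, hi)
        # Strip the "<bucket>@" prefix to recover the original key.
        results.extend(k.split("@", 1)[1] for k in sorted_store[i:j])
    # Each bucket returns its slice in order; merge them into one ordered result.
    return sorted(results)
```

Each logical range query costs N small range queries plus a client-side merge, so N is a trade-off between evenness of distribution and query overhead.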

"You can do range queries with the Random Partitioner in 0.6.*"

I went through this before; it's not true. What you can do is loop over your entire set of keys in random order. There is no way to get an actual range other than the whole range.


On Wed, Jul 7, 2010 at 1:15 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
That pattern is discussed here http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

It's also used in http://github.com/tjake/Lucandra

You can do range queries with the Random Partitioner in 0.6.*; the order of the results is undefined and it's a bit slower.

I think it's normally used when you want ordered range queries in some CFs and random distribution in others.

Aaron


On 07 Jul, 2010, at 09:47 PM, David Boxenhorn <david@lookin2.com> wrote:

Is there any strategy for using OPP with a hash algorithm on the client side to get both uniform distribution of data in the cluster *and* the ability to do range queries?

I'm thinking of something like this:

cassKey = (key % 97) + "@" + key;

cassRange = 0 + "@" + range; 1 + "@" + range; ... 96 + "@" + range;

Would something like that work?