cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sandeep Tata <>
Subject Re: OPHF vs. Random
Date Mon, 16 Mar 2009 21:30:19 GMT
I like the idea of supporting more general/sophisticated strategies.

Let me see if I understand the issues at play here:

OPHFs are tricky to design and leaning on it for load-balancing and
data locality will require incredibly good OPHFs that might not exist.
(I learned this with a bunch of the experiments we ran on our
relatively small test cluster)

RANDOM of course is going to be great for load-balancing, but we're
completely giving up locality, so range queries are shot.

If we want to support clever placement strategies, we'll need to make
some changes. Take for instance a key like "userid.messageid" . I
might want:
a) all the keys with the same userid on the same node, and
b) all the messageids stored in order so I can do simple range queries
like "get message 1 to 100"

OPHF might break a) and RANDOM will break b)

The claim is that the simplest (best :-) ) way to guarantee a) and b)
is to map the key to an end-point instead of merely an integer.

What if I changed the hash-function to do RANDOM on just the "userid"
part. And each node still stores the keys in "<" order on the entire
key ("userid.messageid") Would this solve the problem? What is this
approach missing?

Do we just need to decouple the hash used for routing from the key
used in the end-point for storage? Is this essentially what the series
of patches does?

On Mon, Mar 16, 2009 at 1:36 PM, Jonathan Ellis <> wrote:
> The order-preserving partitioner code (not hash-based anymore) is up
> now at
> -Jonathan
> On Wed, Mar 11, 2009 at 6:48 PM, Jonathan Ellis <> wrote:
>> Use Random for now.  The OPHF is the same as the old one, i.e., not
>> actually OP. :)
>> I'm pretty convinced at this point that it's impossible to have an
>> order-preserving hash that doesn't either (a) impose a relatively
>> short key length past which no partitioning is done (i.e., all keys w/
>> the same prefix go to the same node) or is (b) very sensitive to key
>> length such that the keys with a given length N will not be evenly
>> distributed across all nodes. Or both.
>> So I am working on migrating from pluggable hash functions key ->
>> BigInteger, to pluggable partitioning algorithms key -> EndPoint.
>> Without the requirement to transform to a numeric value first I think
>> I can create an order-preserving distribution that performs well.  (I
>> need this for range queries.)
>> So far I have just laid the foundation, here:
>> I hope to finish the rest tomorrow.
>> -Jonathan
>> On Wed, Mar 11, 2009 at 5:28 PM, Jiansheng Huang <> wrote:
>>> Which one is better to use? The default is Random.
>>> In Avinash's annoucement mail, we have
>>> (1) Ability to switch between a random hash and a OPHF. We still have the
>>> old (wrong) OPHF in there. I will update it to the corrected one tomorrow.
>>> Is correct OPHF in? Thanks.

View raw message