cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Ellis <>
Subject Re: OPHF
Date Wed, 01 Apr 2009 21:31:58 GMT
Avinash already commited his new order-preserving hash function and I
missed it.  It's in OrderPreservingHashPartitioner.  It takes the
approach that Todd and I discussed back in January: turn the string
into a base-Char.MAX_VALUE number.
 I chatted with Avinash a little on IM but we didn't finish, so I'm
picking it up here.

There are two problems with this approach.  One is that the hashes
will only be order-preserving for a subset of unicode (all of UCS-2,
but not all of UTF-16; see
 The other is that this only gives you a naive ordering by code point
value, which for unicode is not what you want and even for ascii
sometimes you want another collation.  (see and

Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
you are going to do range queries on them.  The correct
collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
char value gives ['--a', '--b', 'a', 'b'].

Switching to a more flexibile system like the one I wrote for
CASSANDRA-3 will let use use Token<BigInteger> for random distribution
or Token<String> for order-preserving, with user-defined collation.  I
don't see a way to get this kind of flexibility from an approach that
insists on turning everything into BigInteger.


On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <> wrote:
> Avinash,
> You mentioned that you have a new order-preserving hash function that
> you think will be more generally useful.  Can you post it?
> thanks,
> -Jonathan

View raw message