cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan Ellis <>
Subject Re: OPHF
Date Wed, 01 Apr 2009 21:43:38 GMT
For that aspect no difference between a String ring based on compareTo
and a BigInteger one.  The only difference (and it is an important one
for the reasons I gave!) is how the compare works.  But for the p2p
aspect it does not matter.

On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
<> wrote:
> Doing what you are suggesting scares the hell out of me for a couple of
> reasons -  All work in P2P be it random/OPHF does the token handling the way
> it is setup. I cannot try something that has not been well explored in
> academia. I insist this must be doable. I am going to think about this more.
> Avinash
> On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <> wrote:
>> Avinash already commited his new order-preserving hash function and I
>> missed it.  It's in OrderPreservingHashPartitioner.  It takes the
>> approach that Todd and I discussed back in January: turn the string
>> into a base-Char.MAX_VALUE number.
>> (
>> ).
>>  I chatted with Avinash a little on IM but we didn't finish, so I'm
>> picking it up here.
>> There are two problems with this approach.  One is that the hashes
>> will only be order-preserving for a subset of unicode (all of UCS-2,
>> but not all of UTF-16; see
>>  The other is that this only gives you a naive ordering by code point
>> value, which for unicode is not what you want and even for ascii
>> sometimes you want another collation.  (see
>> and
>> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
>> you are going to do range queries on them.  The correct
>> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
>> char value gives ['--a', '--b', 'a', 'b'].
>> Switching to a more flexibile system like the one I wrote for
>> CASSANDRA-3 will let use use Token<BigInteger> for random distribution
>> or Token<String> for order-preserving, with user-defined collation.  I
>> don't see a way to get this kind of flexibility from an approach that
>> insists on turning everything into BigInteger.
>> -Jonathan
>> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <> wrote:
>> > Avinash,
>> >
>> > You mentioned that you have a new order-preserving hash function that
>> > you think will be more generally useful.  Can you post it?
>> >
>> > thanks,
>> >
>> > -Jonathan
>> >

View raw message