incubator-cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Avinash Lakshman <avinash.laksh...@gmail.com>
Subject Re: OPHF
Date Wed, 01 Apr 2009 21:47:47 GMT
In fact what would you want is hash to respect oorder based on how the
strings are sorted. For the example you have it does. So am I missing
something?

Avinash

On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> For that aspect no difference between a String ring based on compareTo
> and a BigInteger one.  The only difference (and it is an important one
> for the reasons I gave!) is how the compare works.  But for the p2p
> aspect it does not matter.
>
> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
> <avinash.lakshman@gmail.com> wrote:
> > Doing what you are suggesting scares the hell out of me for a couple of
> > reasons -  All work in P2P be it random/OPHF does the token handling the
> way
> > it is setup. I cannot try something that has not been well explored in
> > academia. I insist this must be doable. I am going to think about this
> more.
> >
> > Avinash
> >
> > On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <jbellis@gmail.com>
> wrote:
> >
> >> Avinash already commited his new order-preserving hash function and I
> >> missed it.  It's in OrderPreservingHashPartitioner.  It takes the
> >> approach that Todd and I discussed back in January: turn the string
> >> into a base-Char.MAX_VALUE number.
> >> (
> >>
> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
> >> ).
> >>  I chatted with Avinash a little on IM but we didn't finish, so I'm
> >> picking it up here.
> >>
> >> There are two problems with this approach.  One is that the hashes
> >> will only be order-preserving for a subset of unicode (all of UCS-2,
> >> but not all of UTF-16; see http://en.wikipedia.org/wiki/UTF-16/UCS-2).
> >>  The other is that this only gives you a naive ordering by code point
> >> value, which for unicode is not what you want and even for ascii
> >> sometimes you want another collation.  (see
> >> http://www.unicode.org/reports/tr10/ and
> >> http://java.sun.com/javase/6/docs/api/java/text/Collator.html).
> >>
> >> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
> >> you are going to do range queries on them.  The correct
> >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> >> char value gives ['--a', '--b', 'a', 'b'].
> >>
> >> Switching to a more flexibile system like the one I wrote for
> >> CASSANDRA-3 will let use use Token<BigInteger> for random distribution
> >> or Token<String> for order-preserving, with user-defined collation.  I
> >> don't see a way to get this kind of flexibility from an approach that
> >> insists on turning everything into BigInteger.
> >>
> >> -Jonathan
> >>
> >> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <jbellis@gmail.com>
> wrote:
> >> > Avinash,
> >> >
> >> > You mentioned that you have a new order-preserving hash function that
> >> > you think will be more generally useful.  Can you post it?
> >> >
> >> > thanks,
> >> >
> >> > -Jonathan
> >> >
> >>
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message