So with a String ring you just sort the strings (conceptually) and you
wrap around from strings[last] to strings[0]. The range is entirely
defined by what tokens you have.
It's nice to be able to test against a real application. Mine won't
run without range queries... (So it is still running against my
github code where I first wrote range queries, but that uses the OPHF
approach you have and so that is where I found the problems.)
Jonathan
On Wed, Apr 1, 2009 at 4:27 PM, Avinash Lakshman
<avinash.lakshman@gmail.com> wrote:
> I will look into this. Another point about P2P rings is something about the
> ability to reason about the system. In any hash fn we hve a notion of range
> and how the ring wraps around. For eg. in random hash such as MD5 we say
> 02^128 1 and then the wrap around. Similarly for the hash fn used here one
> could define the same. Today we perform usual sort of strings and the hash
> should respect that sort order. I guess you are saying the usual sort of
> string does not respect collation. I will look into this and I insist this
> should be very doable. Changes to the implementation of the core classes
> albeit how simple they are very very scary unless they are completely tested
> too in a distributed setting. Over here we do not have detailed test code,
> but we test by directing a % of the site traffic to a test cluster before we
> sign off on anything.
>
> Avinash
>
> On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>
>> Say for instance you have inserted keys ['a', 'b', 'a', 'b'] and
>> you are going to do range queries on them. The correct
>> collationaware sort is ['a', 'a', 'b', 'b']. But ordering by
>> char value gives ['a', 'b', 'a', 'b'].
>>
>> Attached is a test program illustrating this. The assert will fail
>> because the hash ordering is not the same as the collator's.
>>
>> Jonathan
>>
>> On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
>> <avinash.lakshman@gmail.com> wrote:
>> > In fact what would you want is hash to respect oorder based on how the
>> > strings are sorted. For the example you have it does. So am I missing
>> > something?
>> >
>> > Avinash
>> >
>> > On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <jbellis@gmail.com>
>> wrote:
>> >
>> >> For that aspect no difference between a String ring based on compareTo
>> >> and a BigInteger one. The only difference (and it is an important one
>> >> for the reasons I gave!) is how the compare works. But for the p2p
>> >> aspect it does not matter.
>> >>
>> >> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
>> >> <avinash.lakshman@gmail.com> wrote:
>> >> > Doing what you are suggesting scares the hell out of me for a couple
>> of
>> >> > reasons  All work in P2P be it random/OPHF does the token handling
>> the
>> >> way
>> >> > it is setup. I cannot try something that has not been well explored
in
>> >> > academia. I insist this must be doable. I am going to think about this
>> >> more.
>> >> >
>> >> > Avinash
>> >> >
>> >> > On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <jbellis@gmail.com>
>> >> wrote:
>> >> >
>> >> >> Avinash already commited his new orderpreserving hash function
and I
>> >> >> missed it. It's in OrderPreservingHashPartitioner. It takes
the
>> >> >> approach that Todd and I discussed back in January: turn the string
>> >> >> into a baseChar.MAX_VALUE number.
>> >> >> (
>> >> >>
>> >>
>> http://groups.google.com/group/cassandradev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
>> >> >> ).
>> >> >> I chatted with Avinash a little on IM but we didn't finish, so
I'm
>> >> >> picking it up here.
>> >> >>
>> >> >> There are two problems with this approach. One is that the hashes
>> >> >> will only be orderpreserving for a subset of unicode (all of UCS2,
>> >> >> but not all of UTF16; see http://en.wikipedia.org/wiki/UTF16/UCS2
>> ).
>> >> >> The other is that this only gives you a naive ordering by code
point
>> >> >> value, which for unicode is not what you want and even for ascii
>> >> >> sometimes you want another collation. (see
>> >> >> http://www.unicode.org/reports/tr10/ and
>> >> >> http://java.sun.com/javase/6/docs/api/java/text/Collator.html).
>> >> >>
>> >> >> Say for instance you have inserted keys ['a', 'b', 'a', 'b']
and
>> >> >> you are going to do range queries on them. The correct
>> >> >> collationaware sort is ['a', 'a', 'b', 'b']. But ordering
by
>> >> >> char value gives ['a', 'b', 'a', 'b'].
>> >> >>
>> >> >> Switching to a more flexibile system like the one I wrote for
>> >> >> CASSANDRA3 will let use use Token<BigInteger> for random
>> distribution
>> >> >> or Token<String> for orderpreserving, with userdefined
collation.
>> I
>> >> >> don't see a way to get this kind of flexibility from an approach
that
>> >> >> insists on turning everything into BigInteger.
>> >> >>
>> >> >> Jonathan
>> >> >>
>> >> >> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <jbellis@gmail.com>
>> >> wrote:
>> >> >> > Avinash,
>> >> >> >
>> >> >> > You mentioned that you have a new orderpreserving hash function
>> that
>> >> >> > you think will be more generally useful. Can you post it?
>> >> >> >
>> >> >> > thanks,
>> >> >> >
>> >> >> > Jonathan
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>
