incubator-cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jonathan ellis <jbel...@gmail.com>
Subject Re: OPHF
Date Sun, 05 Apr 2009 04:24:13 GMT
Note that Avinash's ophf is 1-1 up to the max key length.  But this  
has its own problems.  (Mainly, that keys cannot be arbitrarily long,  
and hash computation gets more expensive proportional to max key  
length.)

-Jonathan

On Apr 4, 2009, at 10:32 PM, Zhu Han <schumi.han@gmail.com> wrote:

> Avinash,
>
> (For cassandra, the "key" mentioned in following words is the name  
> of the
> row. The "token" mentioned in following words, is the output of the  
> hash
> function. )
>
> For example, if you use 160bit md5 hash value as the space, it's  
> finite
> theoretically, because you know the number of tokens which can be  
> contained
> ahead of time, although the number is pretty large. For finite hash  
> space,
> the mapping from key to token is many to one instead of one to one.
>
> For OHPF, a lot of keys could be mapped to  the same token range  
> because of
> the non-uniform distribution just as you pointed out. The load  
> balancing can
> solve the problem. However, if too may keys mapped to *the same token*
> which is determined by the hash function, how can the load balance be
> carried out? When the hash space is finite, it's possible that a large
> number of keys are mapped to the same token. However, if the hash  
> function
> is uniformly distributed, just as MD5 or SHA1, the probability of  
> the worse
> case is pretty rare, at least this case  cannot be produced by human
> purposely,  so that's not a problem. But for OPHF, if the clients  
> pick up
> the name of key purposely, e.g. for DOS attack, the worst case can  
> occur
> practically.  When the attacker knows the algorithm of the OPHF, he  
> can
> produce arbitrary number of keys mapped to the same token.
>
> If the hash space is infinite, or it's as large as the possible  
> space of
> keys, this would not be a problem, because the key to token mapping  
> is one
> to one instead of many to one .
>
> best regards,
> hanzhu
>
>
> On Sat, Apr 4, 2009 at 11:59 PM, Avinash Lakshman <
> avinash.lakshman@gmail.com> wrote:
>
>> Not sure what you mean by infinite. OPHF or not any hash function's  
>> range
>> is
>> practically infinite for MD5 is infinite hash range. With OPHF you  
>> will
>> have
>> a lot of skew. There are only two ways to deal with it:
>> (1) You know the space of of all your keys ahead of time and have  
>> your
>> system exploit that.
>> (2) Dynamically load balance until your system reaches steady state.
>>
>> We are always sorting lexicographically and trying to do both of  
>> the above.
>>
>> Avinash
>>
>> On Sat, Apr 4, 2009 at 12:18 AM, Zhu Han <schumi.han@gmail.com>  
>> wrote:
>>
>>> Avinash,
>>>
>>> I support the proposal of Jonathan.
>>>
>>> Theoretically,  if the system wants to support range query over
>> consistent
>>> hash(DHT),  the hash space should be infinite. Otherwise,  a large  
>>> number
>>> of
>>> keys could be mapped to the same token and  it's impossible to do  
>>> load
>>> balance by moving these keys around to other physical servers.
>> Especially,
>>> the churn of OPHF could be very high because the user would like to
>> create
>>> a
>>> lot of keys with the same long prefix for some applications. If the
>> system
>>> takes the string as the token and sorts them in *lexicographic  
>>> order, the
>>> hash space would be infinite, so that the churn can be solved by  
>>> load
>>> balacing under any circumstances.
>>>
>>> *best regards,
>>> hanzhu
>>>
>>>
>>> On Thu, Apr 2, 2009 at 11:04 PM, Avinash Lakshman <
>>> avinash.lakshman@gmail.com> wrote:
>>>
>>>> So how are you coming up with the tokens here? What do you mean by
>>>> string[0]
>>>> and string[last]? Are they the keys that belong to the system? If  
>>>> so
>> then
>>>> that is not a mechanism to reason about the range of your hash
>> function.
>>> In
>>>> this scheme does it mean you will need to sort the tokens too using
>>>> collation scheme used by the external partitioner while identifying
>> which
>>>> key goes to which node. Also could you provide me with the patch
>> number.
>>> I
>>>> need to go over that this weekend and make sure if it does not  
>>>> affect
>> the
>>>> (bootstrap/load balance) logic. My fear is that these changes  
>>>> have long
>>>> reaching effects and I need to make sure that all these pieces will
>>>> continue
>>>> to reside in harmony. Also your range query test did it run in a
>>>> distributed
>>>> environment or in a single box environment?
>>>> Avinash
>>>>
>>>> On Wed, Apr 1, 2009 at 3:34 PM, Jonathan Ellis <jbellis@gmail.com>
>>> wrote:
>>>>
>>>>> So with a String ring you just sort the strings (conceptually) and
>> you
>>>>> wrap around from strings[last] to strings[0].  The range is  
>>>>> entirely
>>>>> defined by what tokens you have.
>>>>>
>>>>> It's nice to be able to test against a real application.  Mine  
>>>>> won't
>>>>> run without range queries...  (So it is still running against my
>>>>> github code where I first wrote range queries, but that uses the  
>>>>> OPHF
>>>>> approach you have and so that is where I found the problems.)
>>>>>
>>>>> -Jonathan
>>>>>
>>>>> On Wed, Apr 1, 2009 at 4:27 PM, Avinash Lakshman
>>>>> <avinash.lakshman@gmail.com> wrote:
>>>>>> I will look into this. Another point about P2P rings is something
>>> about
>>>>> the
>>>>>> ability to reason about the system. In any hash fn we hve a  
>>>>>> notion
>> of
>>>>> range
>>>>>> and how the ring wraps around. For eg. in random hash such as MD5
>> we
>>>> say
>>>>>> 0-2^128 -1 and then the wrap around. Similarly for the hash fn  
>>>>>> used
>>>> here
>>>>> one
>>>>>> could define the same. Today we perform usual sort of strings and
>> the
>>>>> hash
>>>>>> should respect that sort order. I guess you are saying the usual
>> sort
>>>> of
>>>>>> string does not respect collation. I will look into this and I
>> insist
>>>>> this
>>>>>> should be very doable. Changes to the implementation of the core
>>>> classes
>>>>>> albeit how simple they are very very scary unless they are
>> completely
>>>>> tested
>>>>>> too in a distributed setting. Over here we do not have detailed
>> test
>>>>> code,
>>>>>> but we test by directing a % of the site traffic to a test  
>>>>>> cluster
>>>> before
>>>>> we
>>>>>> sign off on anything.
>>>>>>
>>>>>> Avinash
>>>>>>
>>>>>> On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis  
>>>>>> <jbellis@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>> Say for instance you have inserted keys ['a', 'b', '--a', '--b']
>> and
>>>>>>> you are going to do range queries on them.  The correct
>>>>>>> collation-aware sort is ['a', '--a', 'b', '--b'].  But  
>>>>>>> ordering by
>>>>>>> char value gives ['--a', '--b', 'a', 'b'].
>>>>>>>
>>>>>>> Attached is a test program illustrating this.  The assert will
>> fail
>>>>>>> because the hash ordering is not the same as the collator's.
>>>>>>>
>>>>>>> -Jonathan
>>>>>>>
>>>>>>> On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
>>>>>>> <avinash.lakshman@gmail.com> wrote:
>>>>>>>> In fact what would you want is hash to respect oorder based
on
>> how
>>>> the
>>>>>>>> strings are sorted. For the example you have it does. So
am I
>>>> missing
>>>>>>>> something?
>>>>>>>>
>>>>>>>> Avinash
>>>>>>>>
>>>>>>>> On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <
>> jbellis@gmail.com
>>>>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> For that aspect no difference between a String ring based
on
>>>>> compareTo
>>>>>>>>> and a BigInteger one.  The only difference (and it is
an
>>> important
>>>>> one
>>>>>>>>> for the reasons I gave!) is how the compare works.  But
for  
>>>>>>>>> the
>>> p2p
>>>>>>>>> aspect it does not matter.
>>>>>>>>>
>>>>>>>>> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
>>>>>>>>> <avinash.lakshman@gmail.com> wrote:
>>>>>>>>>> Doing what you are suggesting scares the hell out
of me for a
>>>>> couple
>>>>>>> of
>>>>>>>>>> reasons -  All work in P2P be it random/OPHF does
the token
>>>>> handling
>>>>>>> the
>>>>>>>>> way
>>>>>>>>>> it is setup. I cannot try something that has not
been well
>>>> explored
>>>>> in
>>>>>>>>>> academia. I insist this must be doable. I am going
to think
>>> about
>>>>> this
>>>>>>>>> more.
>>>>>>>>>>
>>>>>>>>>> Avinash
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <
>>>> jbellis@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Avinash already commited his new order-preserving
hash
>>> function
>>>>> and I
>>>>>>>>>>> missed it.  It's in OrderPreservingHashPartitioner.
 It
>> takes
>>>> the
>>>>>>>>>>> approach that Todd and I discussed back in January:
turn the
>>>>> string
>>>>>>>>>>> into a base-Char.MAX_VALUE number.
>>>>>>>>>>> (
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
>>>>>>>>>>> ).
>>>>>>>>>>> I chatted with Avinash a little on IM but we
didn't finish,
>>> so
>>>>> I'm
>>>>>>>>>>> picking it up here.
>>>>>>>>>>>
>>>>>>>>>>> There are two problems with this approach.  One
is that the
>>>> hashes
>>>>>>>>>>> will only be order-preserving for a subset of
unicode (all
>> of
>>>>> UCS-2,
>>>>>>>>>>> but not all of UTF-16; see
>>>>> http://en.wikipedia.org/wiki/UTF-16/UCS-2
>>>>>>> ).
>>>>>>>>>>> The other is that this only gives you a naive
ordering by
>>> code
>>>>> point
>>>>>>>>>>> value, which for unicode is not what you want
and even for
>>> ascii
>>>>>>>>>>> sometimes you want another collation.  (see
>>>>>>>>>>> http://www.unicode.org/reports/tr10/ and
>>>>>>>>>>>
>> http://java.sun.com/javase/6/docs/api/java/text/Collator.html
>>> ).
>>>>>>>>>>>
>>>>>>>>>>> Say for instance you have inserted keys ['a',
'b', '--a',
>>> '--b']
>>>>> and
>>>>>>>>>>> you are going to do range queries on them.  The
correct
>>>>>>>>>>> collation-aware sort is ['a', '--a', 'b', '--b'].
 But
>>> ordering
>>>> by
>>>>>>>>>>> char value gives ['--a', '--b', 'a', 'b'].
>>>>>>>>>>>
>>>>>>>>>>> Switching to a more flexibile system like the
one I wrote
>> for
>>>>>>>>>>> CASSANDRA-3 will let use use Token<BigInteger>
for random
>>>>>>> distribution
>>>>>>>>>>> or Token<String> for order-preserving,
with user-defined
>>>>> collation.
>>>>>>> I
>>>>>>>>>>> don't see a way to get this kind of flexibility
from an
>>> approach
>>>>> that
>>>>>>>>>>> insists on turning everything into BigInteger.
>>>>>>>>>>>
>>>>>>>>>>> -Jonathan
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis
<
>>>>> jbellis@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>>> Avinash,
>>>>>>>>>>>>
>>>>>>>>>>>> You mentioned that you have a new order-preserving
hash
>>>> function
>>>>>>> that
>>>>>>>>>>>> you think will be more generally useful.
 Can you post it?
>>>>>>>>>>>>
>>>>>>>>>>>> thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> -Jonathan
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

Mime
View raw message