incubator-cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Avinash Lakshman <avinash.laksh...@gmail.com>
Subject Re: OPHF
Date Sat, 04 Apr 2009 15:59:57 GMT
Not sure what you mean by infinite. OPHF or not any hash function's range is
practically infinite for MD5 is infinite hash range. With OPHF you will have
a lot of skew. There are only two ways to deal with it:
(1) You know the space of of all your keys ahead of time and have your
system exploit that.
(2) Dynamically load balance until your system reaches steady state.

We are always sorting lexicographically and trying to do both of the above.

Avinash

On Sat, Apr 4, 2009 at 12:18 AM, Zhu Han <schumi.han@gmail.com> wrote:

> Avinash,
>
> I support the proposal of Jonathan.
>
> Theoretically,  if the system wants to support range query over consistent
> hash(DHT),  the hash space should be infinite. Otherwise,  a large number
> of
> keys could be mapped to the same token and  it's impossible to do load
> balance by moving these keys around to other physical servers.  Especially,
> the churn of OPHF could be very high because the user would like to create
> a
> lot of keys with the same long prefix for some applications. If the system
> takes the string as the token and sorts them in *lexicographic order, the
> hash space would be infinite, so that the churn can be solved by load
> balacing under any circumstances.
>
> *best regards,
> hanzhu
>
>
> On Thu, Apr 2, 2009 at 11:04 PM, Avinash Lakshman <
> avinash.lakshman@gmail.com> wrote:
>
> > So how are you coming up with the tokens here? What do you mean by
> > string[0]
> > and string[last]? Are they the keys that belong to the system? If so then
> > that is not a mechanism to reason about the range of your hash function.
> In
> > this scheme does it mean you will need to sort the tokens too using
> > collation scheme used by the external partitioner while identifying which
> > key goes to which node. Also could you provide me with the patch number.
> I
> > need to go over that this weekend and make sure if it does not affect the
> > (bootstrap/load balance) logic. My fear is that these changes have long
> > reaching effects and I need to make sure that all these pieces will
> > continue
> > to reside in harmony. Also your range query test did it run in a
> > distributed
> > environment or in a single box environment?
> > Avinash
> >
> > On Wed, Apr 1, 2009 at 3:34 PM, Jonathan Ellis <jbellis@gmail.com>
> wrote:
> >
> > > So with a String ring you just sort the strings (conceptually) and you
> > > wrap around from strings[last] to strings[0].  The range is entirely
> > > defined by what tokens you have.
> > >
> > > It's nice to be able to test against a real application.  Mine won't
> > > run without range queries...  (So it is still running against my
> > > github code where I first wrote range queries, but that uses the OPHF
> > > approach you have and so that is where I found the problems.)
> > >
> > > -Jonathan
> > >
> > > On Wed, Apr 1, 2009 at 4:27 PM, Avinash Lakshman
> > > <avinash.lakshman@gmail.com> wrote:
> > > > I will look into this. Another point about P2P rings is something
> about
> > > the
> > > > ability to reason about the system. In any hash fn we hve a notion of
> > > range
> > > > and how the ring wraps around. For eg. in random hash such as MD5 we
> > say
> > > > 0-2^128 -1 and then the wrap around. Similarly for the hash fn used
> > here
> > > one
> > > > could define the same. Today we perform usual sort of strings and the
> > > hash
> > > > should respect that sort order. I guess you are saying the usual sort
> > of
> > > > string does not respect collation. I will look into this and I insist
> > > this
> > > > should be very doable. Changes to the implementation of the core
> > classes
> > > > albeit how simple they are very very scary unless they are completely
> > > tested
> > > > too in a distributed setting. Over here we do not have detailed test
> > > code,
> > > > but we test by directing a % of the site traffic to a test cluster
> > before
> > > we
> > > > sign off on anything.
> > > >
> > > > Avinash
> > > >
> > > > On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis <jbellis@gmail.com>
> > > wrote:
> > > >
> > > >> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
> > > >> you are going to do range queries on them.  The correct
> > > >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> > > >> char value gives ['--a', '--b', 'a', 'b'].
> > > >>
> > > >> Attached is a test program illustrating this.  The assert will fail
> > > >> because the hash ordering is not the same as the collator's.
> > > >>
> > > >> -Jonathan
> > > >>
> > > >> On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
> > > >> <avinash.lakshman@gmail.com> wrote:
> > > >> > In fact what would you want is hash to respect oorder based on
how
> > the
> > > >> > strings are sorted. For the example you have it does. So am I
> > missing
> > > >> > something?
> > > >> >
> > > >> > Avinash
> > > >> >
> > > >> > On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <jbellis@gmail.com
> >
> > > >> wrote:
> > > >> >
> > > >> >> For that aspect no difference between a String ring based
on
> > > compareTo
> > > >> >> and a BigInteger one.  The only difference (and it is an
> important
> > > one
> > > >> >> for the reasons I gave!) is how the compare works.  But for
the
> p2p
> > > >> >> aspect it does not matter.
> > > >> >>
> > > >> >> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
> > > >> >> <avinash.lakshman@gmail.com> wrote:
> > > >> >> > Doing what you are suggesting scares the hell out of
me for a
> > > couple
> > > >> of
> > > >> >> > reasons -  All work in P2P be it random/OPHF does the
token
> > > handling
> > > >> the
> > > >> >> way
> > > >> >> > it is setup. I cannot try something that has not been
well
> > explored
> > > in
> > > >> >> > academia. I insist this must be doable. I am going to
think
> about
> > > this
> > > >> >> more.
> > > >> >> >
> > > >> >> > Avinash
> > > >> >> >
> > > >> >> > On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <
> > jbellis@gmail.com>
> > > >> >> wrote:
> > > >> >> >
> > > >> >> >> Avinash already commited his new order-preserving
hash
> function
> > > and I
> > > >> >> >> missed it.  It's in OrderPreservingHashPartitioner.
 It takes
> > the
> > > >> >> >> approach that Todd and I discussed back in January:
turn the
> > > string
> > > >> >> >> into a base-Char.MAX_VALUE number.
> > > >> >> >> (
> > > >> >> >>
> > > >> >>
> > > >>
> > >
> >
> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
> > > >> >> >> ).
> > > >> >> >>  I chatted with Avinash a little on IM but we didn't
finish,
> so
> > > I'm
> > > >> >> >> picking it up here.
> > > >> >> >>
> > > >> >> >> There are two problems with this approach.  One
is that the
> > hashes
> > > >> >> >> will only be order-preserving for a subset of unicode
(all of
> > > UCS-2,
> > > >> >> >> but not all of UTF-16; see
> > > http://en.wikipedia.org/wiki/UTF-16/UCS-2
> > > >> ).
> > > >> >> >>  The other is that this only gives you a naive ordering
by
> code
> > > point
> > > >> >> >> value, which for unicode is not what you want and
even for
> ascii
> > > >> >> >> sometimes you want another collation.  (see
> > > >> >> >> http://www.unicode.org/reports/tr10/ and
> > > >> >> >> http://java.sun.com/javase/6/docs/api/java/text/Collator.html
> ).
> > > >> >> >>
> > > >> >> >> Say for instance you have inserted keys ['a', 'b',
'--a',
> '--b']
> > > and
> > > >> >> >> you are going to do range queries on them.  The
correct
> > > >> >> >> collation-aware sort is ['a', '--a', 'b', '--b'].
 But
> ordering
> > by
> > > >> >> >> char value gives ['--a', '--b', 'a', 'b'].
> > > >> >> >>
> > > >> >> >> Switching to a more flexibile system like the one
I wrote for
> > > >> >> >> CASSANDRA-3 will let use use Token<BigInteger>
for random
> > > >> distribution
> > > >> >> >> or Token<String> for order-preserving, with
user-defined
> > > collation.
> > > >>  I
> > > >> >> >> don't see a way to get this kind of flexibility
from an
> approach
> > > that
> > > >> >> >> insists on turning everything into BigInteger.
> > > >> >> >>
> > > >> >> >> -Jonathan
> > > >> >> >>
> > > >> >> >> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis
<
> > > jbellis@gmail.com>
> > > >> >> wrote:
> > > >> >> >> > Avinash,
> > > >> >> >> >
> > > >> >> >> > You mentioned that you have a new order-preserving
hash
> > function
> > > >> that
> > > >> >> >> > you think will be more generally useful.  Can
you post it?
> > > >> >> >> >
> > > >> >> >> > thanks,
> > > >> >> >> >
> > > >> >> >> > -Jonathan
> > > >> >> >> >
> > > >> >> >>
> > > >> >> >
> > > >> >>
> > > >> >
> > > >>
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message