cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (Commented) (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-1034) Remove assumption that Key to Token is one-to-one
Date Wed, 12 Oct 2011 10:43:13 GMT


Sylvain Lebresne commented on CASSANDRA-1034:

I'm not sure I'm am completely clear, so allow me to try to improve that. I think there is
two issues with the last patch that are largely orthogonal:
  # the patch is broken (again, this is largely not related to the shoving of DK into Token)
  # I believe shoving DK into Token is a bad, error-prone idea that have no tangible advantages

But let's me first focus on the first issue, if only because it involves no opinion whatsoever:
I'm either right or wrong that it's broken (but i'm pretty sure I'm right). So let's be concrete
and take an example.

The patch "merges" Token and DecoratedKey together, so let me take the following notation:
  * t(1, a) for what is the DecoratedKey a of token 1 in current but is is just a Token instance
in the patch
  * t(1) for the 'pure' token 1. In other word it's a shortcut for t(1, EMPTY_BYTE_BUFFER)
in the attached patch and correspond to just a Token in the current code.
(as a side note, the fact that I have to speak of DecoratedKey and 'pure' token to explain
is imo yet another sign than melting everything into Token is a bad idea but I'm diverging)

Since when Token are compared, the token is compared then the key is on token equality, we
have t\(n) < t(n, k) whatever the token n and key k are (since t\(n) is t(n, EMPTY_BYTE_BUFFER)
and EMPTY_BYTE_BUFFER < k for any valid key k) .

Let's now take an example of multiple keys having the same token and say that I have the following
keys in my cluster:
tokens |   1   |     2     |   3
keys   | a | b | c | d | e | f | g
In other words, a and b have the same token 1; c, d and e have the same token 2; ...

The goal for this ticket is to support that situation correctly. Sor for instance, we should
have that:
   range_slice(start=t(1), end=t(3)) returns c, d, e, f and g
(because range_slice with tokens is start exclusive).  However, with the attached patch:
   range_slice(start=t(1), end=t(3)) will return a, b, c, d and e

The reason is fairly simple: we have that t(1) < t(1, k) for any k and t(3) < t(3, k)
for any k.

Another way to put it is that it breaks our token ranges: if you have a node that owns Range(t(1),
t(3)), it's supposed to not contains any key with token 1 and all keys with token 3, but it
fails at both.

So it's broken. Now there is something we could be tempted to do. We could make it so that
t\(n) > t(n, k) for any token n and any key k. But in turn that would break Bounds (i.e,
start inclusive) of 'pure' tokens. I.e, Bounds(t(1), t(2)) is supposed to include all keys
with token 1, but if t(1) > t(1, k) for any key k, it won't include it.

One could argue however that this is still solution because I *think* that right now we never
really use a Bounds of 'pure' tokens (more precisely, the current code does it, but only in
place where we are actually doing a range slice between keys). And I *think* that functions
that take a startKey, when fed a 'pure' token only do start exclusive. So I suppose we could
assert that we never create Bounds of Token and put some assert here and there (in SSTableReader.getPosition()
for instance) and go with that.

But imho this is a bad idea. Because it's fragile and because this is ignoring a problem that
may screw us up later. Why not fix it the right way now? What if tomorrow we do want to be
able to query all the keys having a given token ? That is, what if we want to query [t(1),
t(1)] ? We would not be able to, because if t(1) > t(1, k) for any k, then [t(1), t(1)]
don't include anything.

Again, all this is because a token actually correspond to a set of keys (once you acknowledge
multiple keys can have the same token), and so if you want to do things correctly, you need
for a given token n to have a representation for both:
  * the smallest key having token n
  * the greatest key having token n

With that, you can query all the keys having token n. Without, you can't. That is what my
patch does and I believe fairly strongly is the right way to do it.

Alright, that the first thing that a patch to this ticket must deal with. Then there is another
thing: the current code only allow for AbstractBounds of Token (by typing), but we want for
this patch that once you do a range_slice query with at startKey and endKey, you get a range
of keys in ColumnFamilyStore.getRangeSlice(), so that you can precisely answer those queries.
That means we must be able to construct AbstractBounds with keys in them. Note that it's basically
just a typing problem.

The answer to that of this patch is show DK into Token. So yeah, it fixes that problem, but
what I'm arguing is that:
  * It's error-prone and make coding *more* complicated. We're merging object that are not
the same thing. Today if a methods takes a Token, you know it won't do anything at the granularity
of keys (well today Token and keys have the same granularity but this ticket is supposed to
change that). You lose that information if you merge DK and Token. And if a method takes a
DecoratedKey, you know that it doesn't expect a Token (more precisely, Typing ensures it).
Sure, we do already use a trick in a few places where we create 'fake' DK(null, key). But
at the very least, when we do that, we know we're doing something weird, and we are extra
careful that methods we call on that fake DK handle that case correctly. If everything is
Token, now the signature for a lot of method will suggest it is ok to give a 'pure' Token.
So what, all method should defensively assert this is not the case ? This is what types are
  * It's a ~300K patch. Sure it's mostly simple changes, but it's still that many changes
that could introduce a typo somewhere that causes a bug.
  * It's a tad less efficient because each time we really only care about 'pure' Token (and
there is quite a bunch of code that does that), we would allocate a slightly larger structure
for no reason. And I'm pretty sure god kills a kitten each time you do that.

The solution to that very same type problem I'm proposing (in my patch) is instead simply
to generalize AbstractBound slightly so you can have both AbstractBound of Token and of DecoratedKey.
That sound very reasonable to me.  After all we should be able to have AbstractBounds of anything
that implements Comparable right ? Well, as it turns out our implementation of AbstractBound
needs a little more than that (because our ranges wraps, we need Comparable but with a minimum
value for instance) and that is what RingPosition is for.  But it's only a marker interface,
and if you look at the code it's actually used in a small number of places, so I admit I fail
to see how this make thing much more complicated.
> Remove assumption that Key to Token is one-to-one
> -------------------------------------------------
>                 Key: CASSANDRA-1034
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>            Assignee: Pavel Yaskevich
>            Priority: Minor
>             Fix For: 1.1
>         Attachments: 0001-Make-range-accept-both-Token-and-DecoratedKey.patch, 0002-LengthPartitioner.patch,
1034-1-Generify-AbstractBounds-v3.patch, 1034-2-Remove-assumption-that-token-and-keys-are-one-to-one-v3.patch,
1034_v1.txt, CASSANDRA-1034.patch
> get_range_slices assumes that Tokens do not collide and converts a KeyRange to an AbstractBounds.
For RandomPartitioner, this assumption isn't safe, and would lead to a very weird heisenberg.
> Converting AbstractBounds to use a DecoratedKey would solve this, because the byte[]
key portion of the DecoratedKey can act as a tiebreaker. Alternatively, we could make DecoratedKey
extend Token, and then use DecoratedKeys in places where collisions are unacceptable.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:


View raw message