cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-4383) Binary encoding of vnode tokens
Date Mon, 30 Jul 2012 18:41:37 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-4383:
----------------------------------------

    Attachment: 4383-v1.txt

Attached is a first draft.  Some notes:

* we have to keep the first token as text for backward-compatibility, but after that we can
encode the rest in binary.

* simply using binary is good enough, since CASSANDRA-4139 will give us varint encoding for
free, and beyond that CASSANDRA-3127 gives us compression.
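A minimal sketch of the split described in these notes, assuming RandomPartitioner-style BigInteger tokens. The first token stays a decimal string for old-style peers, and the rest are written as a count followed by length-prefixed two's-complement bytes. The class and method names are hypothetical and this is not the attached patch. Any varint packing is left to the serialization layer, per the note about CASSANDRA-4139.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Cassandra's serializer. Assumes tokens.size() >= 1.
final class TokenEncodingSketch {
    // First token as decimal text, so old-style nodes can still parse STATUS.
    static String firstTokenAsText(List<BigInteger> tokens) {
        return tokens.get(0).toString();
    }

    // Remaining tokens in binary: a count, then each token as
    // a 2-byte length followed by its two's-complement bytes.
    static byte[] encodeRest(List<BigInteger> tokens) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            List<BigInteger> rest = tokens.subList(1, tokens.size());
            out.writeInt(rest.size());
            for (BigInteger t : rest) {
                byte[] b = t.toByteArray();
                out.writeShort(b.length);
                out.write(b);
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen with an in-memory stream
        }
    }

    // Reverse of the two methods above: rebuild the full token list.
    static List<BigInteger> decode(String firstToken, byte[] rest) {
        try {
            List<BigInteger> tokens = new ArrayList<>();
            tokens.add(new BigInteger(firstToken));
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(rest));
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                byte[] b = new byte[in.readUnsignedShort()];
                in.readFully(b);
                tokens.add(new BigInteger(b));
            }
            return tokens;
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. truncated input
        }
    }
}
```

A token just below 2^127 needs 16 bytes here, which matches the fixed-length figure in the issue description; the length prefix simply keeps the sketch general.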

I call it a first draft because, although everything works with this patch, during the
implementation I found some things that are perhaps beyond the original scope of this ticket
but that we should probably address:

* The serialization and deserialization logic is split between VV (VersionedValue) and SS
(StorageService), which is kind of ugly, and we probably need to deserialize for gossipinfo,
both to make its output useful and to avoid annoyingly flashing the terminal with raw binary.
These are things I could have done; I just haven't yet.

* The game of appending things to STATUS and carefully splitting it to avoid accidentally
tripping over the VV delimiter is slightly dangerous, and something I'd like to stop doing.

* Since VV uses strings, we have to use the latin-1 codepage to pass the binary tokens to
avoid having any bytes eaten by string encoding.  This is a bit hackish.
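The latin-1 hack and the delimiter hazard can both be shown in plain Java (no Cassandra classes; the helper names are hypothetical). ISO-8859-1 maps every byte value 0-255 to exactly one char and back, so raw bytes survive a byte[] to String to byte[] round trip, whereas decoding arbitrary bytes as UTF-8 replaces invalid sequences with U+FFFD. The last helper assumes a ',' delimiter purely for illustration: a binary token containing byte 0x2C produces an extra piece when STATUS is split.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Illustrative only; not part of Cassandra.
final class Latin1Sketch {
    // True if the bytes come back unchanged after decoding to a String and re-encoding.
    static boolean roundTrips(byte[] raw, Charset cs) {
        String s = new String(raw, cs);
        return Arrays.equals(raw, s.getBytes(cs));
    }

    // Every possible byte value, 0x00 through 0xFF.
    static byte[] allByteValues() {
        byte[] b = new byte[256];
        for (int i = 0; i < 256; i++) {
            b[i] = (byte) i;
        }
        return b;
    }

    // Number of pieces a STATUS-style string splits into on ',' (trailing empties kept).
    static int pieces(String status) {
        return status.split(",", -1).length;
    }
}
```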

To solve the STATUS and pieces[] problem, I suggest we stop appending things to it right now.
Currently LEAVING is the one-off where HOST_ID is NOT included, and there's nothing we can
do about that while maintaining compatibility. So what I suggest is that we make that the norm
and promote HOST_ID to a new ApplicationState, which will simplify the "do I need to look
for a hostId?" checks, since the state will be guaranteed to be there for new-style nodes.
Similarly, I think we should promote the serialized tokens to a TOKENS ApplicationState,
so we can stop deftly avoiding tripping over our string delimiter there. Old-style nodes
will still do the split on STATUS, and we'll keep putting the first token there for them,
but new-style nodes can process TOKENS directly and safely.
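A sketch of the proposed layout, using a hypothetical enum and a plain map rather than Cassandra's real Gossiper or ApplicationState classes, and assuming a "<status>,<token>" STATUS value for illustration. STATUS carries only the first token, while HOST_ID and TOKENS are separate entries, so delimiter bytes inside the TOKENS value cannot disturb an old-style split of STATUS.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical names throughout; not Cassandra's API.
final class GossipStateSketch {
    enum State { STATUS, HOST_ID, TOKENS }

    // Publish: nothing beyond the first token is appended to STATUS.
    static Map<State, String> publish(String status, String firstToken,
                                      String hostId, String encodedTokens) {
        Map<State, String> m = new EnumMap<>(State.class);
        m.put(State.STATUS, status + "," + firstToken);
        m.put(State.HOST_ID, hostId);        // always present for new-style nodes
        m.put(State.TOKENS, encodedTokens);  // may contain any bytes, including ','
        return m;
    }

    // New-style reader: no need to inspect STATUS to find the host id.
    static String hostId(Map<State, String> states) {
        return states.get(State.HOST_ID);
    }

    // Old-style reader: still splits STATUS and takes the single text token.
    static String legacyToken(Map<State, String> states) {
        return states.get(State.STATUS).split(",", -1)[1];
    }
}
```

Even a LEAVING node publishes HOST_ID here, so the "do I need to look for a hostId?" check reduces to reading one entry.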

Finally, to avoid the latin-1 hack, we should probably think about converting VV to accept
and write bytes directly.
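If VV carried bytes instead of strings, serialization could write a length-prefixed payload and never touch a charset. A minimal sketch of that idea; BinaryVersionedValueSketch is a hypothetical name, and the version-plus-length layout is an assumption, not VersionedValue's actual wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical byte-based versioned value: int version, int length, raw payload.
final class BinaryVersionedValueSketch {
    final int version;
    final byte[] value;

    BinaryVersionedValueSketch(int version, byte[] value) {
        this.version = version;
        this.value = value.clone(); // defensive copy
    }

    byte[] serialize() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(version);
            out.writeInt(value.length);
            out.write(value); // bytes go out as-is, no string encoding involved
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen with an in-memory stream
        }
    }

    static BinaryVersionedValueSketch deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int version = in.readInt();
            byte[] value = new byte[in.readInt()];
            in.readFully(value);
            return new BinaryVersionedValueSketch(version, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. truncated input
        }
    }
}
```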

Thoughts?
                
> Binary encoding of vnode tokens
> -------------------------------
>
>                 Key: CASSANDRA-4383
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4383
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>             Fix For: 1.2
>
>         Attachments: 4383-v1.txt
>
>
> Since, after CASSANDRA-4317, we can know which version a remote node is using (that is,
> whether it is vnode-aware or not), this is a good opportunity to change the token encoding to
> binary: with a default of 256 tokens per node, even a fixed-length 16-byte encoding per
> token provides a great deal of savings in gossip traffic over a text representation.
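A rough way to check the savings claimed in the issue description, assuming RandomPartitioner-style tokens drawn uniformly from [0, 2^127), a comma-separated decimal text form counted at one byte per ASCII character, and the fixed 16-byte binary form. The helper names are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative size comparison; not a measurement of real gossip traffic.
final class TokenSizeSketch {
    static final int BYTES_PER_TOKEN = 16; // fixed-length encoding from the ticket

    // n tokens uniform in [0, 2^127), seeded for repeatability.
    static List<BigInteger> randomTokens(int n, long seed) {
        Random rnd = new Random(seed);
        List<BigInteger> tokens = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            tokens.add(new BigInteger(127, rnd));
        }
        return tokens;
    }

    // Decimal digits plus one ',' between tokens, one byte per ASCII char.
    static int textBytes(List<BigInteger> tokens) {
        int total = 0;
        for (BigInteger t : tokens) {
            total += t.toString().length() + 1;
        }
        return total - 1; // no trailing delimiter
    }

    static int binaryBytes(List<BigInteger> tokens) {
        return tokens.size() * BYTES_PER_TOKEN;
    }
}
```

For 256 tokens the binary form is 4,096 bytes. A token below 2^127 has at most 39 decimal digits, so the text form is at most 10,239 bytes, and for random tokens it comes out at roughly 10 KB.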


        
