Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Tue, 14 Mar 2017 13:44:42 +0000 (UTC)
From: "Joshua McKenzie (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.13048692.1488843627000.14316.1489499082020@Atlassian.JIRA>
In-Reply-To: <JIRA.13048692.1488843627000@Atlassian.JIRA>
References: <JIRA.13048692.1488843627000@Atlassian.JIRA> <JIRA.13048692.1488843627902@jira-lw-us.apache.org>
Subject: [jira] [Commented] (CASSANDRA-13304) Add checksumming to the native
 protocol
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
archived-at: Tue, 14 Mar 2017 13:44:50 -0000


    [ https://issues.apache.org/jira/browse/CASSANDRA-13304?page=3Dcom.atla=
ssian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=
=3D15924213#comment-15924213 ]=20

Joshua McKenzie commented on CASSANDRA-13304:
---------------------------------------------

bq. The real downside here would be potentially something like a bit gettin=
g flipped that causes us to allocate a super huge buffer during deserializa=
tion =E2=80=93 that's unfortunate =E2=80=93 but we would fail on the deseri=
alization path and at least fail

This is something we need to be better with in general (commitlog deser, ss=
tables, etc) - no reason not to sanity check the value we get back from a s=
ize read on metadata and gracefully warn/shutdown etc instead of OOM'ing a =
node.

> Add checksumming to the native protocol
> ---------------------------------------
>
>                 Key: CASSANDRA-13304
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1330=
4
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Michael Kjellman
>            Assignee: Michael Kjellman
>              Labels: client-impacting
>         Attachments: 13304_v1.diff
>
>
> The native binary transport implementation doesn't include checksums. Thi=
s makes it highly susceptible to silently inserting corrupted data either d=
ue to hardware issues causing bit flips on the sender/client side, C*/recei=
ver side, or network in between.
> Attaching an implementation that makes checksum'ing mandatory (assuming b=
oth client and server know about a protocol version that supports checksums=
) -- and also adds checksumming to clients that request compression.
> The serialized format looks something like this:
> {noformat}
>  *                      1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
>  *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |  Number of Compressed Chunks  |     Compressed Length (e1)    /
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * /  Compressed Length cont. (e1) |    Uncompressed Length (e1)   /
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * | Uncompressed Length cont. (e1)| CRC32 Checksum of Lengths (e1)|
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * | Checksum of Lengths cont. (e1)|    Compressed Bytes (e1)    +//
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                      CRC32 Checksum (e1)                     ||
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                    Compressed Length (e2)                     |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                   Uncompressed Length (e2)                    |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                CRC32 Checksum of Lengths (e2)                 |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                     Compressed Bytes (e2)                   +//
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                      CRC32 Checksum (e2)                     ||
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                    Compressed Length (en)                     |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                   Uncompressed Length (en)                    |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                CRC32 Checksum of Lengths (en)                 |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                      Compressed Bytes (en)                  +//
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                      CRC32 Checksum (en)                     ||
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+=20
> {noformat}
> The first pass here adds checksums only to the actual contents of the fra=
me body itself (and doesn't actually checksum lengths and headers). While i=
t would be great to fully add checksuming across the entire protocol, the p=
roposed implementation will ensure we at least catch corrupted data and lik=
ely protect ourselves pretty well anyways.
> I didn't go to the trouble of implementing a Snappy Checksum'ed Compresso=
r implementation as it's been deprecated for a while -- is really slow and =
crappy compared to LZ4 -- and we should do everything in our power to make =
sure no one in the community is still using it. I left it in (for obvious b=
ackwards compatibility aspects) old for clients that don't know about the n=
ew protocol.
> The current protocol has a 256MB (max) frame body -- where the serialized=
 contents are simply written in to the frame body.
> If the client sends a compression option in the startup, we will install =
a FrameCompressor inline. Unfortunately, we went with a decision to treat t=
he frame body separately from the header bits etc in a given message. So, i=
nstead we put a compressor implementation in the options and then if it's n=
ot null, we push the serialized bytes for the frame body *only* thru the gi=
ven FrameCompressor implementation. The existing implementations simply pro=
vide all the bytes for the frame body in one go to the compressor implement=
ation and then serialize it with the length of the compressed bytes up fron=
t.
> Unfortunately, this won't work for checksum'ing for obvious reasons as we=
 can't naively just checksum the entire (potentially) 256MB frame body and =
slap it at the end... so,
> The best place to start with the changes is in {{ChecksumedCompressor}}. =
I implemented one single place to perform the checksuming (and to support c=
hecksuming) the actual required chunking logic. Implementations of Checksum=
edCompressor only implement the actual calls to the given compression algor=
ithm for the provided bytes.
> Although the interface takes a {{Checksum}}, right now the attached patch=
 uses CRC32 everywhere. As of right now, given JDK8+ has support for doing =
the calculation with the Intel instruction set, CRC32 is about as fast as w=
e can get right now.
> I went with a 32kb "default" for the chunk size -- meaning we will chunk =
the entire frame body into 32kb chunks, compress each one of those chunks, =
and checksum the chunk. Upon discussing with a bunch of people and research=
ing how checksums actually work and how much data they will protect etc -- =
if we use 32kb chunks with CRC32 we can catch up to 32 bits flipped in a ro=
w (but more importantly catch the more likely corruption where a single bit=
 is flipped) with pretty high certainty. 64kb seems to introduce too much o=
f a probability of missing corruption.
> The maximum block size LZ4 operates on is a 64kb chunk -- so this combine=
d with the need to make sure the CRC32 checksums are actually going to catc=
h stuff -- chunking at 32kb seemed like a good reasonable value to use when=
 weighing both checksums and compression (to ensure we don't kill our compr=
ession ratio etc).
> I'm not including client changes here -- I asked around and I'm not reall=
y sure what the policy there is -- do we update the python driver? java dri=
ver? how has the timing of this stuff been handled in the past?


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)