Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id D9E9A200C3D for ; Tue, 14 Mar 2017 14:44:49 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id D85A9160B7E; Tue, 14 Mar 2017 13:44:49 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 08FF2160B7C for ; Tue, 14 Mar 2017 14:44:48 +0100 (CET) Received: (qmail 68380 invoked by uid 500); 14 Mar 2017 13:44:48 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 68363 invoked by uid 99); 14 Mar 2017 13:44:48 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 14 Mar 2017 13:44:48 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id AD363C0BF8 for ; Tue, 14 Mar 2017 13:44:47 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.651 X-Spam-Level: X-Spam-Status: No, score=0.651 tagged_above=-999 required=6.31 tests=[RP_MATCHES_RCVD=-0.001, SPF_NEUTRAL=0.652] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id MzJmlmhPhirT for ; Tue, 14 Mar 2017 13:44:46 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id E52AE61EF3 for ; Tue, 14 Mar 2017 13:44:44 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id E2C69E0BCB for ; Tue, 14 Mar 2017 13:44:42 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 054D8243CE for ; Tue, 14 Mar 2017 13:44:42 +0000 (UTC) Date: Tue, 14 Mar 2017 13:44:42 +0000 (UTC) From: "Joshua McKenzie (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CASSANDRA-13304) Add checksumming to the native protocol MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Tue, 14 Mar 2017 13:44:50 -0000 [ https://issues.apache.org/jira/browse/CASSANDRA-13304?page=3Dcom.atla= ssian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId= =3D15924213#comment-15924213 ]=20 Joshua McKenzie commented on CASSANDRA-13304: --------------------------------------------- bq. The real downside here would be potentially something like a bit gettin= g flipped that causes us to allocate a super huge buffer during deserializa= tion =E2=80=93 that's unfortunate =E2=80=93 but we would fail on the deseri= alization path and at least fail This is something we need to be better with in general (commitlog deser, ss= tables, etc) - no reason not to sanity check the value we get back from a s= ize read on metadata and gracefully warn/shutdown etc instead of OOM'ing a = node. > Add checksumming to the native protocol > --------------------------------------- > > Key: CASSANDRA-13304 > URL: https://issues.apache.org/jira/browse/CASSANDRA-1330= 4 > Project: Cassandra > Issue Type: Improvement > Components: Core > Reporter: Michael Kjellman > Assignee: Michael Kjellman > Labels: client-impacting > Attachments: 13304_v1.diff > > > The native binary transport implementation doesn't include checksums. Thi= s makes it highly susceptible to silently inserting corrupted data either d= ue to hardware issues causing bit flips on the sender/client side, C*/recei= ver side, or network in between. > Attaching an implementation that makes checksum'ing mandatory (assuming b= oth client and server know about a protocol version that supports checksums= ) -- and also adds checksumming to clients that request compression. > The serialized format looks something like this: > {noformat} > * 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 > * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Number of Compressed Chunks | Compressed Length (e1) / > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * / Compressed Length cont. (e1) | Uncompressed Length (e1) / > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Uncompressed Length cont. (e1)| CRC32 Checksum of Lengths (e1)| > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Checksum of Lengths cont. (e1)| Compressed Bytes (e1) +// > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | CRC32 Checksum (e1) || > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Compressed Length (e2) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Uncompressed Length (e2) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | CRC32 Checksum of Lengths (e2) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Compressed Bytes (e2) +// > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | CRC32 Checksum (e2) || > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Compressed Length (en) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Uncompressed Length (en) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | CRC32 Checksum of Lengths (en) | > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | Compressed Bytes (en) +// > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > * | CRC32 Checksum (en) || > * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+=20 > {noformat} > The first pass here adds checksums only to the actual contents of the fra= me body itself (and doesn't actually checksum lengths and headers). While i= t would be great to fully add checksuming across the entire protocol, the p= roposed implementation will ensure we at least catch corrupted data and lik= ely protect ourselves pretty well anyways. > I didn't go to the trouble of implementing a Snappy Checksum'ed Compresso= r implementation as it's been deprecated for a while -- is really slow and = crappy compared to LZ4 -- and we should do everything in our power to make = sure no one in the community is still using it. I left it in (for obvious b= ackwards compatibility aspects) old for clients that don't know about the n= ew protocol. > The current protocol has a 256MB (max) frame body -- where the serialized= contents are simply written in to the frame body. > If the client sends a compression option in the startup, we will install = a FrameCompressor inline. Unfortunately, we went with a decision to treat t= he frame body separately from the header bits etc in a given message. So, i= nstead we put a compressor implementation in the options and then if it's n= ot null, we push the serialized bytes for the frame body *only* thru the gi= ven FrameCompressor implementation. The existing implementations simply pro= vide all the bytes for the frame body in one go to the compressor implement= ation and then serialize it with the length of the compressed bytes up fron= t. > Unfortunately, this won't work for checksum'ing for obvious reasons as we= can't naively just checksum the entire (potentially) 256MB frame body and = slap it at the end... so, > The best place to start with the changes is in {{ChecksumedCompressor}}. = I implemented one single place to perform the checksuming (and to support c= hecksuming) the actual required chunking logic. Implementations of Checksum= edCompressor only implement the actual calls to the given compression algor= ithm for the provided bytes. > Although the interface takes a {{Checksum}}, right now the attached patch= uses CRC32 everywhere. As of right now, given JDK8+ has support for doing = the calculation with the Intel instruction set, CRC32 is about as fast as w= e can get right now. > I went with a 32kb "default" for the chunk size -- meaning we will chunk = the entire frame body into 32kb chunks, compress each one of those chunks, = and checksum the chunk. Upon discussing with a bunch of people and research= ing how checksums actually work and how much data they will protect etc -- = if we use 32kb chunks with CRC32 we can catch up to 32 bits flipped in a ro= w (but more importantly catch the more likely corruption where a single bit= is flipped) with pretty high certainty. 64kb seems to introduce too much o= f a probability of missing corruption. > The maximum block size LZ4 operates on is a 64kb chunk -- so this combine= d with the need to make sure the CRC32 checksums are actually going to catc= h stuff -- chunking at 32kb seemed like a good reasonable value to use when= weighing both checksums and compression (to ensure we don't kill our compr= ession ratio etc). > I'm not including client changes here -- I asked around and I'm not reall= y sure what the policy there is -- do we update the python driver? java dri= ver? how has the timing of this stuff been handled in the past? -- This message was sent by Atlassian JIRA (v6.3.15#6346)