From: "Sam Tunnicliffe (Jira)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Date: Fri, 13 Nov 2020 15:23:00 +0000 (UTC)
Subject: [jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

    [ https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231559#comment-17231559 ]

Sam Tunnicliffe commented on CASSANDRA-15299:
---------------------------------------------

[~aholmber] that's great, thanks! I'll make sure everything's passing against trunk with the latest driver and get back to you.

> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15299
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Client
>            Reporter: Aleksey Yeschenko
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>              Labels: protocolv5
>             Fix For: 4.0-alpha
>         Attachments: Process CQL Frame.png, V5 Flow Chart.png
>
>
> CASSANDRA-13304 made an important improvement to our native protocol: it introduced checksumming/CRC32 to request and response bodies. It’s an important step forward, but it doesn’t cover the entire stream. In particular, the message header is not covered by a checksum or a CRC, which poses a correctness issue if, for example, {{streamId}} gets corrupted.
> Additionally, we aren’t quite using CRC32 correctly, in two ways:
> 1. We are calculating the CRC32 of the *decompressed* value instead of computing the CRC32 on the bytes written on the wire - losing the properties of the CRC32. In some cases, due to this sequencing, attempting to decompress a corrupt stream can cause a segfault in LZ4.
> 2. When using CRC32, the CRC32 value is written in the incorrect byte order, also losing some of the protections.
> See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for an explanation of the two points above.
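
The sequencing described in point 1 can be illustrated with a minimal Python sketch. It is not the Cassandra implementation: `encode_body` and `decode_body` are hypothetical names, `zlib` stands in for LZ4 (which is not in the standard library), and the 4-byte little-endian trailer is an assumption made only for illustration. The point it shows is that the CRC32 is computed over the bytes that go on the wire and is verified before any decompression is attempted.

```python
import struct
import zlib


def encode_body(body: bytes) -> bytes:
    """Compress the body, then append a CRC32 of the compressed (wire) bytes."""
    wire = zlib.compress(body)  # stand-in for LZ4
    crc = zlib.crc32(wire) & 0xFFFFFFFF
    # Assumption: 4-byte little-endian trailer, chosen only for illustration.
    return wire + struct.pack("<I", crc)


def decode_body(data: bytes) -> bytes:
    """Verify the CRC32 of the wire bytes first; decompress only if it matches."""
    if len(data) < 4:
        raise ValueError("too short to contain a CRC32 trailer")
    wire, trailer = data[:-4], data[-4:]
    (expected,) = struct.unpack("<I", trailer)
    if zlib.crc32(wire) & 0xFFFFFFFF != expected:
        raise ValueError("CRC32 mismatch: refusing to decompress corrupt bytes")
    return zlib.decompress(wire)
```

Because the check runs on the compressed bytes, a corrupted stream is rejected before it ever reaches the decompressor.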
> Separately, there are some long-standing issues with the protocol - since *way* before CASSANDRA-13304. Importantly, both checksumming and compression operate on individual message bodies rather than frames of multiple complete messages. In reality, this has several important additional downsides. To name a couple:
> # For compression, we are getting poor compression ratios for smaller messages - when operating on tiny sequences of bytes. In reality, for most small requests and responses we are discarding the compressed value as it’d be no smaller than the uncompressed one - incurring both redundant allocations and compressions.
> # For checksumming and CRC32 we pay a high overhead price for small messages. 4 bytes extra is *a lot* for an empty write response, for example.
> To address the correctness issue of {{streamId}} not being covered by the checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we should switch to a framing protocol with multiple messages in a single frame.
> I suggest we reuse the framing protocol recently implemented for internode messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, and that we do it before native protocol v5 graduates from beta. See https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java and https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java.
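
The idea of packing multiple messages into one checksummed frame can be sketched in Python as follows. The layout here is hypothetical and is not the CASSANDRA-15066 wire format: a 4-byte payload length, a CRC32 of that length field, the payload, and a CRC32 of the payload, with each message inside the payload carrying a 2-byte stream id and a 4-byte body length. The names `encode_frame` and `decode_frame` are likewise assumptions.

```python
import struct
import zlib

HEADER = struct.Struct("<II")   # payload length, CRC32 of the length field
MESSAGE = struct.Struct("<hI")  # stream id, body length


def _crc(data: bytes) -> int:
    return zlib.crc32(data) & 0xFFFFFFFF


def encode_frame(messages):
    """Pack (stream_id, body) pairs into one frame with a single payload CRC."""
    payload = b"".join(
        MESSAGE.pack(sid, len(body)) + body for sid, body in messages
    )
    length = struct.pack("<I", len(payload))
    return (
        length
        + struct.pack("<I", _crc(length))
        + payload
        + struct.pack("<I", _crc(payload))
    )


def decode_frame(frame: bytes):
    """Check both CRCs, then split the payload back into (stream_id, body) pairs."""
    if len(frame) < HEADER.size + 4:
        raise ValueError("frame too short")
    length, header_crc = HEADER.unpack_from(frame, 0)
    if _crc(frame[:4]) != header_crc:
        raise ValueError("corrupt frame header")
    end = HEADER.size + length
    if len(frame) != end + 4:
        raise ValueError("frame length mismatch")
    payload = frame[HEADER.size:end]
    (payload_crc,) = struct.unpack_from("<I", frame, end)
    if _crc(payload) != payload_crc:
        raise ValueError("corrupt frame payload")
    messages, offset = [], 0
    while offset < len(payload):
        sid, size = MESSAGE.unpack_from(payload, offset)
        offset += MESSAGE.size
        messages.append((sid, payload[offset:offset + size]))
        offset += size
    return messages
```

In this sketch the stream id sits inside the checksummed payload, so corrupting it is detected, and the fixed 12 bytes of frame overhead are shared by every message in the frame rather than paid per message.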