cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sam Tunnicliffe (Jira)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta
Date Thu, 21 May 2020 09:18:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112995#comment-17112995
] 

Sam Tunnicliffe commented on CASSANDRA-15299:
---------------------------------------------

When we spoke in on slack, and I'm afraid the answers I gave were only partially correct.

https://the-asf.slack.com/archives/CK23JSY2K/p1589982060404400

The good news is that your approach is right and it will correctly decode the header, but
the diagram is also correct. The disconnect is that the diagram is describing the bytes as
they are laid out on the wire at the data layer, whereas in C* and the Java Driver we're working
at a point somewhat higher up the stack. That's perhaps not as clear as it could be and I'll
definitely work on improving the docs in that regard.

To deconstruct your example, you have the bytes in the buffer with a binary representation
of
{{00000100 00000000 00000010 00000001 11111001 11110010}}

The first 3 bytes {{00000100 00000000 00000010}} are the header length + flag.
{{4 |= (1 << 17) == 131076 == 00000010 00000000 00000100}}
Writing them to the buffer in little endian ordering (in {{FrameEncodedCrc::put3b}}):
{{00000100 00000000 00000010}}
and reading them off the wire, where ethernet transmission specifies bytes are sent least
significant bit first:
{{01000000 00000000 00100000}}
so the 18th bit read off the wire is the {{1}} representing the {{selfContained}} flag, but
to extract it requires the steps you outlined including the endianness switch.

The second 3 bytes in your buffer {{00000001 11111001 11110010}} is the crc24 in little endian
order. Here though, the bit order is less important as up the stack we're dealing in whole
bytes.
So on the wire we'd see {{10000000 10011111 01001111}}, and read that as {{00000001 11111001
11110010}}. 
Then in {{FrameDecoderCrc::readHeader6b}} we change the endianness to give
{{11110010 11111001 00000001 == 15923457 == crc24(131076)}}


> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15299
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Messaging/Client
>            Reporter: Aleksey Yeschenko
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>              Labels: protocolv5
>             Fix For: 4.0-alpha
>
>
> CASSANDRA-13304 made an important improvement to our native protocol: it introduced checksumming/CRC32
to request and response bodies. It’s an important step forward, but it doesn’t cover the
entire stream. In particular, the message header is not covered by a checksum or a crc, which
poses a correctness issue if, for example, {{streamId}} gets corrupted.
> Additionally, we aren’t quite using CRC32 correctly, in two ways:
> 1. We are calculating the CRC32 of the *decompressed* value instead of computing the
CRC32 on the bytes written on the wire - losing the properties of the CRC32. In some cases,
due to this sequencing, attempting to decompress a corrupt stream can cause a segfault by
LZ4.
> 2. When using CRC32, the CRC32 value is written in the incorrect byte order, also losing
some of the protections.
> See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for explanation
for the two points above.
> Separately, there are some long-standing issues with the protocol - since *way* before
CASSANDRA-13304. Importantly, both checksumming and compression operate on individual message
bodies rather than frames of multiple complete messages. In reality, this has several important
additional downsides. To name a couple:
> # For compression, we are getting poor compression ratios for smaller messages - when
operating on tiny sequences of bytes. In reality, for most small requests and responses we
are discarding the compressed value as it’d be smaller than the uncompressed one - incurring
both redundant allocations and compressions.
> # For checksumming and CRC32 we pay a high overhead price for small messages. 4 bytes
extra is *a lot* for an empty write response, for example.
> To address the correctness issue of {{streamId}} not being covered by the checksum/CRC32
and the inefficiency in compression and checksumming/CRC32, we should switch to a framing
protocol with multiple messages in a single frame.
> I suggest we reuse the framing protocol recently implemented for internode messaging
in CASSANDRA-15066 to the extent that its logic can be borrowed, and that we do it before
native protocol v5 graduates from beta. See https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java
and https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message