cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13304) Add checksumming to the native protocol
Date Thu, 16 Aug 2018 14:54:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582626#comment-16582626 ]

Ariel Weisberg commented on CASSANDRA-13304:
--------------------------------------------

bq. Is that their expectation? Or that the checksum is reliable?
I don't think we should use xxHash; I think we should use CRC32. The hardware and API support for it are excellent, and it provides strong enough guarantees against transmission errors. Sure, it's not a good hash function and it's collision prone, but we aren't using it as a hash function.
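
For reference, a minimal sketch of the JDK API in question (the class and payload here are illustrative, but java.util.zip.CRC32 is the intrinsified implementation I'm referring to):

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Minimal sketch of the JDK CRC32 API. On JDK 8+ the update loop is backed by
// hardware-accelerated intrinsics on x86; JDK 9 also adds java.util.zip.CRC32C.
public final class Crc32Sketch
{
    public static long checksum(byte[] payload)
    {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue(); // the CRC lives in the lower 32 bits of the long
    }

    public static void main(String[] args)
    {
        byte[] chunk = "example frame body chunk".getBytes(StandardCharsets.UTF_8);
        System.out.printf("CRC32 = 0x%08x%n", checksum(chunk));
    }
}
{code}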

bq. I ran a quick SHA1 benchmark to be sure, but it looks like it's able to hash ~1GB/s per
thread on my test machine, which should be fast enough for it to go unnoticeable 
That's not fast. That's glacially slow. And it's also not the performance you get when you
have to warm up the hash function each time you checksum a small message. Startup time matters
for hash functions used to hash many small items.

Or am I wrong in assuming you were hashing a relatively large amount of data in that benchmark? Tens of kilobytes or more, hashed in a tight loop, is enough to show unrealistically fast performance.
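
To make that concrete, here is a rough (non-JMH) sketch of the distinction I'm drawing -- the same total volume hashed as many small messages with a fresh digest each time versus large buffers in a tight loop. The sizes and names are made up; a real comparison should be done with JMH and then cassandra-stress:

{code:java}
import java.security.MessageDigest;
import java.util.Random;

// Rough sketch only: hashes ~1GB of data two ways -- as 64kb buffers with a
// reused digest (the "tight loop" case), and as 300-byte messages with a fresh
// digest per message (closer to per-frame checksumming on the native protocol).
public final class Sha1WarmupSketch
{
    public static void main(String[] args) throws Exception
    {
        byte[] big = new byte[64 * 1024];
        byte[] small = new byte[300];
        new Random(42).nextBytes(big);
        new Random(42).nextBytes(small);

        long total = 1L << 30; // ~1GB per scenario

        long t0 = System.nanoTime();
        MessageDigest reused = MessageDigest.getInstance("SHA-1");
        for (long done = 0; done < total; done += big.length)
        {
            reused.update(big);
            reused.digest(); // digest() also resets the instance
        }
        double largeSecs = (System.nanoTime() - t0) / 1e9;

        long t1 = System.nanoTime();
        for (long done = 0; done < total; done += small.length)
        {
            MessageDigest fresh = MessageDigest.getInstance("SHA-1");
            fresh.update(small);
            fresh.digest();
        }
        double smallSecs = (System.nanoTime() - t1) / 1e9;

        System.out.printf("large buffers: %.2fs, small messages: %.2fs%n", largeSecs, smallSecs);
    }
}
{code}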

Also, are you sure the implementation you are using matches the one TLS uses? I believe there is hardware support for SHA-1, so it might be faster.

IMO, to really know whether this matters, someone has to measure the implementations with cassandra-stress.

There is also more to this than just hash functions. Using TLS means going through SSLEngine, or whatever we use to interface with SSL. Netty may have its own SSL implementation right now that isn't just a wrapper around the Java one, and that's not necessarily free either. I don't think we should change defaults without measuring the impact against a cluster with at least RF=3:3 (a 2-DC cluster).
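
For what it's worth, Netty's OpenSSL-backed engine (netty-tcnative) versus the plain JDK SSLEngine is selected when the SslContext is built, so it's a one-line knob to include in any measurement. A minimal sketch (class name is illustrative):

{code:java}
import javax.net.ssl.SSLException;

import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import io.netty.handler.ssl.SslProvider;

// Illustrative only: the Netty knob that selects the JDK SSLEngine versus the
// OpenSSL-backed (netty-tcnative) engine. Which one is faster for our message
// sizes is exactly the kind of thing that needs measuring.
public final class NettySslProviderSketch
{
    public static SslContext clientContext(boolean useOpenSsl) throws SSLException
    {
        return SslContextBuilder.forClient()
                                .sslProvider(useOpenSsl ? SslProvider.OPENSSL : SslProvider.JDK)
                                .build();
    }
}
{code}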



> Add checksumming to the native protocol
> ---------------------------------------
>
>                 Key: CASSANDRA-13304
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13304
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Michael Kjellman
>            Assignee: Sam Tunnicliffe
>            Priority: Blocker
>              Labels: client-impacting
>             Fix For: 4.x
>
>         Attachments: 13304_v1.diff, boxplot-read-throughput.png, boxplot-write-throughput.png
>
>
> The native binary transport implementation doesn't include checksums. This makes it highly susceptible to silently inserting corrupted data, whether due to hardware issues causing bit flips on the sender/client side, on the C*/receiver side, or on the network in between.
> Attaching an implementation that makes checksumming mandatory (assuming both client and server know about a protocol version that supports checksums) -- and also adds checksumming for clients that request compression.
> The serialized format looks something like this:
> {noformat}
>  *                      1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3
>  *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |  Number of Compressed Chunks  |     Compressed Length (e1)    /
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * /  Compressed Length cont. (e1) |    Uncompressed Length (e1)   /
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * | Uncompressed Length cont. (e1)| CRC32 Checksum of Lengths (e1)|
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * | Checksum of Lengths cont. (e1)|    Compressed Bytes (e1)    +//
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                      CRC32 Checksum (e1)                     ||
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                    Compressed Length (e2)                     |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                   Uncompressed Length (e2)                    |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                CRC32 Checksum of Lengths (e2)                 |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                     Compressed Bytes (e2)                   +//
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                      CRC32 Checksum (e2)                     ||
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                    Compressed Length (en)                     |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                   Uncompressed Length (en)                    |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                CRC32 Checksum of Lengths (en)                 |
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                      Compressed Bytes (en)                  +//
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  * |                      CRC32 Checksum (en)                     ||
>  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
> {noformat}
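
A minimal sketch of how one entry (e_n) in that layout might be assembled -- names are illustrative and this is not code from the attached patch; the 2-byte chunk count at the front of the frame would be written once, before the first entry:

{code:java}
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Illustrative sketch of one entry in the layout above: compressed length,
// uncompressed length, CRC32 over the two lengths, the compressed bytes,
// and a CRC32 over the compressed bytes.
public final class ChunkLayoutSketch
{
    public static ByteBuffer writeChunk(byte[] compressed, int uncompressedLength)
    {
        CRC32 crc = new CRC32();

        // CRC32 over the two 4-byte length fields
        ByteBuffer lengths = ByteBuffer.allocate(8)
                                       .putInt(compressed.length)
                                       .putInt(uncompressedLength);
        crc.update(lengths.array(), 0, 8);
        int lengthsCrc = (int) crc.getValue();

        // CRC32 over the compressed payload
        crc.reset();
        crc.update(compressed, 0, compressed.length);
        int payloadCrc = (int) crc.getValue();

        ByteBuffer out = ByteBuffer.allocate(4 + 4 + 4 + compressed.length + 4);
        out.putInt(compressed.length)
           .putInt(uncompressedLength)
           .putInt(lengthsCrc)
           .put(compressed)
           .putInt(payloadCrc);
        out.flip();
        return out;
    }
}
{code}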
> The first pass here adds checksums only to the actual contents of the frame body itself (it doesn't checksum the lengths and headers). While it would be great to add checksumming across the entire protocol, the proposed implementation ensures we at least catch corrupted data and likely protects us pretty well anyway.
> I didn't go to the trouble of implementing a checksummed Snappy compressor, as Snappy has been deprecated for a while -- it's really slow and crappy compared to LZ4 -- and we should do everything in our power to make sure no one in the community is still using it. I left Snappy support in (for obvious backwards-compatibility reasons) for old clients that don't know about the new protocol.
> The current protocol has a 256MB (max) frame body, where the serialized contents are simply written into the frame body.
> If the client sends a compression option in the STARTUP message, we install a FrameCompressor inline. Unfortunately, we made the decision to treat the frame body separately from the header bits etc. in a given message. So instead we put a compressor implementation in the options and, if it's not null, push the serialized bytes for the frame body *only* through the given FrameCompressor implementation. The existing implementations simply hand all the bytes of the frame body to the compressor in one go and then serialize the result with the length of the compressed bytes up front.
> Unfortunately, this won't work for checksumming, for obvious reasons: we can't naively checksum the entire (potentially) 256MB frame body and slap a single checksum at the end.
> The best place to start with the changes is {{ChecksumedCompressor}}. It is the single place that performs the checksumming along with the chunking logic required to support it. Implementations of ChecksumedCompressor only implement the actual calls to the given compression algorithm for the provided bytes.
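
As a rough sketch of the shape that describes (hypothetical names and signatures -- the actual interface in the attached patch may differ), an implementation only needs to provide something like:

{code:java}
import java.nio.ByteBuffer;

// Hypothetical shape only. The chunking and checksumming live in one shared
// place (ChecksumedCompressor); a concrete implementation just wraps the calls
// into the underlying compression library (LZ4, or a pass-through for "none").
public interface CompressionHooksSketch
{
    // Compress one chunk of at most the configured chunk size (e.g. 32kb).
    ByteBuffer compressChunk(ByteBuffer chunk);

    // Decompress one chunk back to its known uncompressed length.
    ByteBuffer decompressChunk(ByteBuffer chunk, int uncompressedLength);
}
{code}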
> Although the interface takes a {{Checksum}}, the attached patch uses CRC32 everywhere for now. Given that JDK8+ can do the calculation using the Intel instruction set, CRC32 is about as fast as we can get right now.
> I went with a 32kb "default" chunk size -- meaning we chunk the entire frame body into 32kb chunks, compress each of those chunks, and checksum each chunk. After discussing with a bunch of people and researching how checksums actually work and how much data they can protect: if we use 32kb chunks with CRC32 we can catch up to 32 bits flipped in a row (and, more importantly, the more likely corruption where a single bit is flipped) with pretty high certainty. 64kb seems to introduce too high a probability of missing corruption.
> The maximum block size LZ4 operates on is a 64kb chunk, so this, combined with the need to make sure the CRC32 checksums will actually catch corruption, made 32kb seem like a reasonable value when weighing both checksums and compression (to ensure we don't kill our compression ratio etc.).
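
A short sketch of that chunk-then-compress step, assuming lz4-java (which Cassandra already uses elsewhere); the constant and names are illustrative, and the per-chunk checksumming from the layout above would be applied to each compressed chunk:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;

// Illustrative sketch: split a frame body into 32kb chunks and LZ4-compress each
// chunk independently, keeping every chunk within LZ4's 64kb block size while
// leaving CRC32 a small enough window to catch corruption reliably.
public final class ChunkingSketch
{
    private static final int CHUNK_SIZE = 32 * 1024;
    private static final LZ4Compressor LZ4 = LZ4Factory.fastestInstance().fastCompressor();

    public static List<byte[]> compressInChunks(byte[] frameBody)
    {
        List<byte[]> compressedChunks = new ArrayList<>();
        for (int offset = 0; offset < frameBody.length; offset += CHUNK_SIZE)
        {
            int length = Math.min(CHUNK_SIZE, frameBody.length - offset);
            byte[] chunk = Arrays.copyOfRange(frameBody, offset, offset + length);
            compressedChunks.add(LZ4.compress(chunk));
        }
        return compressedChunks;
    }
}
{code}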
> I'm not including client changes here -- I asked around and I'm not really sure what the policy is: do we update the Python driver? The Java driver? How has the timing of this kind of change been handled in the past?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

