cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9264) Cassandra should not persist files without checksums
Date Sat, 22 Aug 2015 02:50:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707490#comment-14707490 ]

Ariel Weisberg edited comment on CASSANDRA-9264 at 8/22/15 2:50 AM:
--------------------------------------------------------------------

Non-compressed SSTables do have block-level checksums, I believe. However, they are never checked
during reads. We would always have to do 64k IOs to validate checksums, which would be a disadvantage
for small random reads if everything were fast enough to saturate an array of SSDs with IOs.
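
To make the cost concrete, here is a minimal sketch of what validating block-level checksums on
reads would look like (the 4-byte-CRC32-per-64k-block sidecar layout and the names are my
assumptions, not the actual on-disk format): even a small point read turns into a full 64k read
of the data file plus a second IO against the checksum file.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.zip.CRC32;

    public class BlockChecksumReadSketch
    {
        private static final int BLOCK_SIZE = 64 * 1024;

        // Read block `blockIndex` from the data file and verify it against the
        // 4-byte CRC32 stored at offset blockIndex * 4 in the sidecar checksum file.
        static ByteBuffer readVerified(Path data, Path checksums, long blockIndex) throws IOException
        {
            try (FileChannel dataChannel = FileChannel.open(data, StandardOpenOption.READ);
                 FileChannel crcChannel = FileChannel.open(checksums, StandardOpenOption.READ))
            {
                ByteBuffer block = ByteBuffer.allocate(BLOCK_SIZE);
                dataChannel.read(block, blockIndex * BLOCK_SIZE); // first IO: the whole 64k block
                block.flip();

                ByteBuffer stored = ByteBuffer.allocate(4);
                crcChannel.read(stored, blockIndex * 4L);         // second IO: the sidecar checksum
                stored.flip();

                CRC32 crc = new CRC32();
                crc.update(block.duplicate());                    // checksum the bytes actually read
                if ((int) crc.getValue() != stored.getInt())
                    throw new IOException("Checksum mismatch in block " + blockIndex);
                return block;
            }
        }
    }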

There are other issues with checksums. The checksums are stored in a separate file, so it's
actually two IOs for each read until the checksums are cached (or again if they are evicted). I
suspect that in practice, with read-ahead, the caching for checksums works fine. I did the math,
and the checksums for a terabyte of data in 64k blocks come to only tens of megabytes, so keeping
them cached shouldn't be an issue.
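
(Worked out: 2^40 bytes / 2^16 bytes per block = 2^24, about 16.8 million blocks; at 4 bytes per
CRC32 that is roughly 64 MB of checksums, or roughly 128 MB for an 8-byte hash.)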

In the compressed case there is a third lookup table, for compressed block offsets, which presents
the same issue as the sidecar checksums.

Circling back on topic, my real beef is that the other metadata files don't have checksums. I think
we should get to the point where reading and writing checksummed data is completely transparent
and feels like reading/writing a regular file, so we don't have to think about it.
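
As a strawman for what "completely transparent" could look like (a sketch under my own assumptions,
not an existing Cassandra API): an OutputStream wrapper that checksums fixed-size blocks as bytes
are written, so the caller's code looks exactly like writing a regular file.

    import java.io.FilterOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.zip.CRC32;

    // Hypothetical wrapper: callers just write bytes; a 4-byte CRC32 is appended
    // inline after every 64k of data (and after the final partial block on close).
    public class ChecksummedOutputStream extends FilterOutputStream
    {
        private static final int BLOCK_SIZE = 64 * 1024;
        private final CRC32 crc = new CRC32();
        private int bytesInBlock = 0;

        public ChecksummedOutputStream(OutputStream out)
        {
            super(out);
        }

        // FilterOutputStream routes array writes through write(int), so this
        // single override covers them too (slowly, but this is only a sketch).
        @Override
        public void write(int b) throws IOException
        {
            out.write(b);
            crc.update(b);
            if (++bytesInBlock == BLOCK_SIZE)
                emitChecksum();
        }

        @Override
        public void close() throws IOException
        {
            if (bytesInBlock > 0)
                emitChecksum(); // cover the trailing partial block
            super.close();
        }

        private void emitChecksum() throws IOException
        {
            int value = (int) crc.getValue();
            out.write(new byte[]{ (byte) (value >>> 24), (byte) (value >>> 16),
                                  (byte) (value >>> 8), (byte) value });
            crc.reset();
            bytesInBlock = 0;
        }
    }

A matching input stream would verify each block as it is read; the point is that the calling code
doesn't change at all.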

I don't see a need for sidecar checksum files at all. The sidecar offsets file for compressed
files makes sense to me.



> Cassandra should not persist files without checksums
> ----------------------------------------------------
>
>                 Key: CASSANDRA-9264
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9264
>             Project: Cassandra
>          Issue Type: Wish
>            Reporter: Ariel Weisberg
>             Fix For: 3.x
>
>
> Even if checksums aren't validated on the read side every time, it is helpful to persist files
with checksums so that if a corrupted file is encountered you can at least verify that the issue
is corruption and not an application-level error that generated a corrupt file.
> We should standardize on conventions for how to checksum a file and which checksums to
use so we can ensure we get the best performance possible.
> For a small checksum I think we should use CRC32, because the hardware support appears
quite good.
> For cases where a 4-byte checksum is not enough, I think we can look at either xxhash64
or MurmurHash3.
> The problem with xxhash64 is that the output is only 8 bytes. The problem with MurmurHash3
is that the Java implementation is slow. If we can live with 8 bytes and make it easy to switch
hash implementations, I think xxhash64 is a good choice because we already ship a good
implementation with LZ4.
> I would also like to see hashes always prefixed by a type, so that we can swap hash
implementations without the pain of trying to figure out which one is present. I would also
like to avoid making assumptions about the number of bytes in a hash field where possible,
keeping in mind compatibility and space issues.
> Hashing after compression is also preferable to hashing before compression.
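
A minimal sketch of the type-prefix idea from the description above (the type values, layout, and
class name are illustrative, not an agreed format). It uses java.util.zip.CRC32, which current JVMs
accelerate with intrinsics, and the xxhash64 implementation from the lz4-java library we already ship.

    import java.nio.ByteBuffer;
    import java.util.zip.CRC32;
    import net.jpountz.xxhash.XXHashFactory;

    // Illustrative only: one type byte in front of the checksum bytes, so readers
    // can dispatch on the prefix instead of guessing the hash from the field width.
    public final class TypedChecksum
    {
        public static final byte TYPE_CRC32 = 1;    // followed by 4 checksum bytes
        public static final byte TYPE_XXHASH64 = 2; // followed by 8 checksum bytes
        private static final long XX_SEED = 0;      // arbitrary fixed seed

        // Returns [type byte][checksum bytes] for the given data.
        public static byte[] checksum(byte type, byte[] data)
        {
            switch (type)
            {
                case TYPE_CRC32:
                {
                    CRC32 crc = new CRC32();
                    crc.update(data, 0, data.length);
                    return ByteBuffer.allocate(5).put(type).putInt((int) crc.getValue()).array();
                }
                case TYPE_XXHASH64:
                {
                    long hash = XXHashFactory.fastestInstance().hash64().hash(data, 0, data.length, XX_SEED);
                    return ByteBuffer.allocate(9).put(type).putLong(hash).array();
                }
                default:
                    throw new IllegalArgumentException("Unknown checksum type: " + type);
            }
        }
    }

The one-byte prefix costs almost nothing and keeps the door open for swapping implementations
later, which is the compatibility concern raised above.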



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
