cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-9264) Cassandra should not persist files without checksums
Date Wed, 29 Apr 2015 17:35:06 GMT
Ariel Weisberg created CASSANDRA-9264:
-----------------------------------------

             Summary: Cassandra should not persist files without checksums
                 Key: CASSANDRA-9264
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9264
             Project: Cassandra
          Issue Type: Wish
            Reporter: Ariel Weisberg
             Fix For: 3.x


Even if checksums aren't validated on every read, it is helpful to persist files with
checksums so that when a corrupted file is encountered you can at least confirm that the
cause is corruption and not an application-level error that generated a corrupt file.

We should standardize on conventions for how to checksum a file and which checksums to use
so we can ensure we get the best performance possible.

For a small checksum I think we should use CRC32 because the hardware support appears quite
good.
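
As a rough illustration of the small-checksum case, here is a minimal sketch that appends
a 4-byte CRC32 to a payload and validates it later, using the JDK's java.util.zip.CRC32.
The class name, the trailer layout, and the big-endian byte order are assumptions made for
this example, not an existing Cassandra format.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical helper: the name and the trailer layout are illustrative only.
final class Crc32Trailer {
    /** Returns the payload followed by its 4-byte big-endian CRC32. */
    static byte[] append(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 4)
                .put(payload)
                .putInt((int) crc.getValue()) // CRC32 fits in the low 32 bits
                .array();
    }

    /** Recomputes the CRC32 over the payload and compares it with the stored trailer. */
    static boolean isValid(byte[] framed) {
        if (framed.length < 4) {
            return false; // too short to contain a trailer
        }
        int payloadLength = framed.length - 4;
        CRC32 crc = new CRC32();
        crc.update(framed, 0, payloadLength);
        int stored = ByteBuffer.wrap(framed, payloadLength, 4).getInt();
        return stored == (int) crc.getValue();
    }
}
```

A reader that skips validation on the hot path can still call isValid when a file looks
wrong, which is the diagnostic use described above.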

For cases where a 4-byte checksum is not enough I think we can look at either xxhash64 or
MurmurHash3.

The problem with xxhash64 is that its output is only 8 bytes. The problem with MurmurHash3 is
that the Java implementation is slow. If we can live with 8 bytes and make it easy to switch
hash implementations, I think xxhash64 is a good choice because we already ship a good implementation
with LZ4.

I would also like to see hashes always prefixed by a type so that we can swap hashes without
the pain of trying to figure out which hash implementation is present. I would also like
to avoid making assumptions about the number of bytes in a hash field where possible, keeping
in mind compatibility and space issues.
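
One way to picture a type-prefixed hash field is the sketch below. It writes a 1-byte type
id, a 1-byte length, and then the checksum bytes, so a reader never has to guess the
algorithm or the width. The type ids, the field layout, and the class name are all
hypothetical, and CRC32 and Adler32 from java.util.zip stand in for whatever hashes are
eventually chosen (xxhash64 would need a third-party library).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical field layout: [type:1][length:1][value:length]
final class TypedChecksum {
    // Hypothetical type ids; a real format would have to reserve and document these.
    static final byte TYPE_CRC32 = 1;
    static final byte TYPE_ADLER32 = 2;

    static Checksum forType(byte type) {
        switch (type) {
            case TYPE_CRC32:
                return new CRC32();
            case TYPE_ADLER32:
                return new Adler32();
            default:
                throw new IllegalArgumentException("Unknown checksum type: " + type);
        }
    }

    /** Encodes the typed checksum field for the given data. */
    static byte[] encode(byte type, byte[] data) {
        Checksum checksum = forType(type);
        checksum.update(data, 0, data.length);
        int width = 4; // both stand-in algorithms produce 32-bit values
        return ByteBuffer.allocate(2 + width)
                .put(type)
                .put((byte) width)
                .putInt((int) checksum.getValue())
                .array();
    }

    /** Reads the type and length from the field itself, then recomputes and compares. */
    static boolean verify(byte[] field, byte[] data) {
        if (field.length < 2) {
            return false;
        }
        byte type = field[0];
        int length = field[1] & 0xFF;
        if (field.length != 2 + length) {
            return false; // declared width does not match the bytes present
        }
        return Arrays.equals(field, encode(type, data));
    }
}
```

Because the width travels with the field, a wider hash could be added later under a new
type id without changing how older fields are parsed, at a cost of two extra bytes per
checksum.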

Hashing after compression is also preferable to hashing before compression.
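
To make the ordering concrete, the sketch below checksums the compressed bytes, so a reader
can check the checksum before attempting to decompress. That benefit is an assumption of
this example rather than something stated above. Deflater and Inflater from java.util.zip
stand in for the real compressor, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper showing "compress, then checksum the compressed bytes".
final class ChecksumAfterCompression {
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** The checksum is computed over the compressed bytes, i.e. what is actually persisted. */
    static long crc32(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    /** Verifies the stored checksum first and only then decompresses. */
    static byte[] readVerified(byte[] compressed, long storedCrc) {
        if (crc32(compressed) != storedCrc) {
            throw new IllegalStateException("Checksum mismatch: compressed block is corrupt");
        }
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("Truncated compressed block");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Malformed compressed block", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

With this ordering the checksum covers exactly the bytes on disk, and the checksum pass
touches the smaller, compressed representation.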



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
