cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9264) Cassandra should not persist files without checksums
Date Wed, 29 Apr 2015 20:55:08 GMT


Ariel Weisberg commented on CASSANDRA-9264:

For small metadata files that would be perfect. If you happen to know which files aren't currently
covered by a checksum, that would help. I haven't gone through and checked yet, and I bet I
would miss some even if I did.

Some files deserve more consideration, such as emitting per-record checksums so that they can be
incrementally validated or invalidated.
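
As a rough illustration of per-record checksums, the sketch below frames each record as a length, a payload, and a CRC32 of the payload, then reads back only the records that validate. The class name, method names, and record layout are hypothetical and are not an existing Cassandra format. In this simple layout the length field itself is not checksummed, so a damaged length is only caught by the bounds check.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical layout per record: [int length][payload][int crc32(payload)]
final class PerRecordChecksum {

    // Serializes all records, appending a CRC32 after each payload.
    static byte[] encode(List<byte[]> records) {
        int size = 0;
        for (byte[] r : records) size += 4 + r.length + 4;
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] r : records) {
            CRC32 crc = new CRC32();
            crc.update(r, 0, r.length);
            out.putInt(r.length).put(r).putInt((int) crc.getValue());
        }
        return out.array();
    }

    // Returns the leading records that validate, stopping at the first
    // record that is truncated or whose checksum does not match.
    static List<byte[]> decodeValidPrefix(byte[] file) {
        List<byte[]> good = new ArrayList<>();
        ByteBuffer in = ByteBuffer.wrap(file);
        while (in.remaining() >= 8) {
            int len = in.getInt();
            // Reject lengths that cannot fit payload plus checksum.
            if (len < 0 || len > in.remaining() - 4) break;
            byte[] payload = new byte[len];
            in.get(payload);
            int stored = in.getInt();
            CRC32 crc = new CRC32();
            crc.update(payload, 0, len);
            if ((int) crc.getValue() != stored) break;
            good.add(payload);
        }
        return good;
    }
}
```

With a layout like this, a reader can keep every record up to the first bad one instead of discarding the whole file.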

> Cassandra should not persist files without checksums
> ----------------------------------------------------
>                 Key: CASSANDRA-9264
>                 URL:
>             Project: Cassandra
>          Issue Type: Wish
>            Reporter: Ariel Weisberg
>             Fix For: 3.x
> Even if checksums aren't validated on the read side every time, it is helpful to have
files persisted with checksums so that if a corrupted file is encountered you can at least
validate that the issue is corruption and not an application-level error that generated a
corrupt file.
> We should standardize on conventions for how to checksum a file and which checksums to
use so we can ensure we get the best performance possible.
> For a small checksum I think we should use CRC32 because the hardware support appears
quite good.
> For cases where a 4-byte checksum is not enough I think we can look at either xxhash64
or MurmurHash3.
> The problem with xxhash64 is that its output is only 8 bytes. The problem with MurmurHash3
is that the Java implementation is slow. If we can live with 8 bytes and make it easy to switch
hash implementations, I think xxhash64 is a good choice because we already ship a good implementation
with LZ4.
> I would also like to see hashes always prefixed by a type so that we can swap hashes
without running into pain trying to figure out which hash implementation is present. I would
also like to avoid making assumptions about the number of bytes in a hash field where possible,
keeping in mind compatibility and space issues.
> Hashing after compression is also preferable to hashing before compression.
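
The type-prefix and hash-after-compression ideas in the description might look roughly like the sketch below. It writes a trailer of one type byte, one width byte, and the checksum bytes, computed over the already-compressed data. All names and tag values are hypothetical. Adler32 stands in as the second algorithm only because it ships with the JDK; xxhash64 or MurmurHash3 would need an external library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;
import java.util.zip.Deflater;

// Hypothetical trailer layout: [type byte][width byte][checksum bytes, big-endian]
final class TypedChecksum {
    static final byte TYPE_CRC32 = 1;   // illustrative tag values
    static final byte TYPE_ADLER32 = 2;

    static Checksum forType(byte type) {
        switch (type) {
            case TYPE_CRC32: return new CRC32();
            case TYPE_ADLER32: return new Adler32();
            default: throw new IllegalArgumentException("unknown checksum type " + type);
        }
    }

    // Builds the trailer for data that has already been compressed.
    static byte[] trailer(byte type, byte[] compressed) {
        Checksum c = forType(type);
        c.update(compressed, 0, compressed.length);
        int width = 4; // both JDK checksums used here are 32-bit
        ByteBuffer out = ByteBuffer.allocate(2 + width).put(type).put((byte) width);
        long v = c.getValue();
        for (int i = width - 1; i >= 0; i--) out.put((byte) (v >>> (8 * i)));
        return out.array();
    }

    // The reader takes the width from the trailer instead of assuming it.
    static boolean verify(byte[] compressed, byte[] trailer) {
        ByteBuffer t = ByteBuffer.wrap(trailer);
        byte type = t.get();
        int width = t.get();
        if (width < 1 || width > 8 || t.remaining() < width) return false;
        long stored = 0;
        for (int i = 0; i < width; i++) stored = (stored << 8) | (t.get() & 0xFF);
        Checksum c = forType(type);
        c.update(compressed, 0, compressed.length);
        long mask = width == 8 ? -1L : (1L << (8 * width)) - 1;
        return (c.getValue() & mask) == stored;
    }

    // Compresses first, so the checksum covers the bytes that reach disk.
    static byte[] compress(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!d.finished()) out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }
}
```

Checksumming the compressed bytes means corruption can be detected before decompression is attempted, and less data is hashed when the content compresses well.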

