cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimitar Dimitrov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-13703) Using min_compress_ratio <= 1 causes corruption
Date Tue, 05 Sep 2017 13:57:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153675#comment-16153675
] 

Dimitar Dimitrov edited comment on CASSANDRA-13703 at 9/5/17 1:56 PM:
----------------------------------------------------------------------

+1 - from my still largely layman perspective, the change looks good.

A couple of small nits:
* {{CompressedChunkReader.maybeCheckCrc()}} (renamed in this patch to {{shouldCheckCrc()}})
may be removed or at least its visibility could be reduced. It seems to be used solely in
CompressedChunkReader.java, lines 158 and 204 in this patch.
* The no-argument / single-argument factory methods for Snappy and LZ4 compressions in CompressionParams.java
seem to differ in the values that they default to for min compression ratio and max compressed
length.
** For Snappy, if nothing is specified, or only chunk length is specified, a default min compression
ratio of 1.1 is used, and therefore max compressed length ends up somewhere roughly around
90% of chunk length.
** For LZ4, if nothing is specified, or only chunk length is specified, a default max compressed
length of chunk length is used, and therefore min compression ratio ends up at 1.0 (I'm not
sure if a precision error is possible there).

Edit: Of course, take this review with the appropriate rock-sized grain of salt.


was (Author: dimitarndimitrov):
+1 - from my still largely layman perspective, the change looks good.

A couple of small nits:
* {{CompressedChunkReader.maybeCheckCrc()}} (renamed in this patch to {{shouldCheckCrc()}})
may be removed or at least its visibility could be reduced. It seems to be used solely in
CompressedChunkReader.java, lines 158 and 204 in this patch.
* The no-argument / single-argument factory methods for Snappy and LZ4 compressions in CompressionParams.java
seem to differ in the values that they default to for min compression ratio and max compressed
length.
** For Snappy, if nothing is specified, or only chunk length is specified, a default min compression
ratio of 1.1 is used, and therefore max compressed length ends up somewhere roughly around
90% of chunk length.
** For LZ4, if nothing is specified, or only chunk length is specified, a default max compressed
length of chunk length is used, and therefore min compression ratio ends up at 1.0 (I'm not
sure if a precision error is possible there).

> Using min_compress_ratio <= 1 causes corruption
> -----------------------------------------------
>
>                 Key: CASSANDRA-13703
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13703
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>            Priority: Blocker
>             Fix For: 4.x
>
>         Attachments: patch
>
>
> This is because chunks written uncompressed end up below the compressed size threshold.
Demonstrated by applying the attached patch meant to improve the testing of the 10520 changes,
and running {{CompressedSequentialWriterTest.testLZ4Writer}}.
> The default {{min_compress_ratio: 0}} is not affected as it never writes uncompressed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message