cassandra-commits mailing list archives

From "Benjamin Lerer (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10225) Make compression ratio much more accurate
Date Thu, 26 Nov 2015 13:33:11 GMT


Benjamin Lerer commented on CASSANDRA-10225:

Sorry for the delay.

{quote}Computing the compression ratio by making the sum of the compressedFileLength and dividing
it by the sum of the dataLength does not look a bad approach to me but it seems that the data
length might not always be the real length (according to a comment in CompressionMetadata).{quote}

In the case of early opening, the data length can indeed be shorter than the real length, but
because the SSTables are retrieved with {{SSTableSet.CANONICAL}}, the early-opened SSTables are
not returned. Consequently, the data length will always be the real one.
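
To illustrate why summing the lengths matters, here is a minimal sketch comparing the plain average of per-SSTable ratios with the ratio of the summed lengths. It is not the actual Cassandra code: the class {{CompressionRatioSketch}}, the holder {{SSTableSizes}}, both method names and the {{NaN}} return for an empty input are assumptions made for this example only.

```java
import java.util.Arrays;
import java.util.List;

public class CompressionRatioSketch {

    /** Hypothetical holder for the two lengths of one SSTable (not a Cassandra type). */
    static final class SSTableSizes {
        final long compressedLength;
        final long uncompressedLength;

        SSTableSizes(long compressedLength, long uncompressedLength) {
            this.compressedLength = compressedLength;
            this.uncompressedLength = uncompressedLength;
        }
    }

    /** Unweighted mean of the per-SSTable ratios: every SSTable counts equally, whatever its size. */
    static double averageOfRatios(List<SSTableSizes> tables) {
        if (tables.isEmpty())
            return Double.NaN; // assumption: no sentinel value is modelled here
        double sum = 0;
        for (SSTableSizes t : tables)
            sum += (double) t.compressedLength / t.uncompressedLength;
        return sum / tables.size();
    }

    /** Sum of compressed lengths divided by sum of uncompressed lengths. */
    static double weightedRatio(List<SSTableSizes> tables) {
        long compressed = 0;
        long uncompressed = 0;
        for (SSTableSizes t : tables) {
            compressed += t.compressedLength;
            uncompressed += t.uncompressedLength;
        }
        return uncompressed == 0 ? Double.NaN : (double) compressed / uncompressed;
    }

    public static void main(String[] args) {
        // One small, highly compressible SSTable and one large, poorly compressible one.
        List<SSTableSizes> tables = Arrays.asList(
                new SSTableSizes(10, 100),
                new SSTableSizes(900, 1000));

        System.out.println("average of ratios: " + averageOfRatios(tables));
        System.out.println("weighted ratio:    " + weightedRatio(tables));
    }
}
```

With these made-up sizes, the small SSTable pulls the unweighted average far below the ratio of the summed lengths (910 / 1100), even though it holds under a tenth of the data.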

While reviewing this problem, I also discovered that the compression ratio returned by some
SSTableReaders could be wrong (CASSANDRA-10775). As a result, using {{sstable.getCompressionRatio()
!= MetadataCollector.NO_COMPRESSION_RATIO}} instead of {{SSTable.compression}} was leading
to wrong results even with the new approach.

As the fix changes the behavior of the metrics, it is probably safer to make that change in
{{3.2}} only.

> Make compression ratio much more accurate
> -----------------------------------------
>                 Key: CASSANDRA-10225
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Brett Snyder
>              Labels: lhf
>             Fix For: 2.1.x
>         Attachments: cassandra-2.1-10225.txt
> Currently in cfstats, it will take an average over the compression ratios of all of the
sstables without regard to the data sizes.  This can lead to a very inaccurate value.  It
would be good to factor in the uncompressed and compressed sizes for the sstables to give
an accurate number.

This message was sent by Atlassian JIRA
