cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7928) Digest queries do not require alder32 checks
Date Mon, 15 Sep 2014 22:33:34 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134612#comment-14134612 ]

Benedict commented on CASSANDRA-7928:
-------------------------------------

Regrettably this is very plausible, and lends credence to CASSANDRA-7130, which we should consider
reopening. This ticket is also a sensible idea to help mitigate the issue.

I knocked up a quick benchmark; the results show lz4 being consistently faster than adler32, by at
least a factor of two in all but one configuration. It's actually quite easy to explain: if the data
is compressed, there is less data to operate over; if it is not easily compressed (say, it is highly
random), lz4 degrades to a simple copy to avoid wasting work (as demonstrated in the benchmark: with
short random runs it is about 5 times faster over completely random data than over partially random data).

{noformat}
Benchmark            (duplicateLookback)  (pageSize)  (randomRatio)  (randomRunLength)  (uniquePages)   Mode  Samples    Score  Score error   Units
Compression.adler32               4..128       65536              0              4..16           8192  thrpt        5   16.476        1.954  ops/ms
Compression.adler32               4..128       65536              0           128..512           8192  thrpt        5   16.720        0.230  ops/ms
Compression.adler32               4..128       65536            0.1              4..16           8192  thrpt        5   16.269        2.118  ops/ms
Compression.adler32               4..128       65536            0.1           128..512           8192  thrpt        5   16.665        0.246  ops/ms
Compression.adler32               4..128       65536            1.0              4..16           8192  thrpt        5   16.653        0.147  ops/ms
Compression.adler32               4..128       65536            1.0           128..512           8192  thrpt        5   16.686        0.214  ops/ms
Compression.lz4                   4..128       65536              0              4..16           8192  thrpt        5   28.275        0.265  ops/ms
Compression.lz4                   4..128       65536              0           128..512           8192  thrpt        5  232.602       48.279  ops/ms
Compression.lz4                   4..128       65536            0.1              4..16           8192  thrpt        5   34.081        0.337  ops/ms
Compression.lz4                   4..128       65536            0.1           128..512           8192  thrpt        5  130.857       18.157  ops/ms
Compression.lz4                   4..128       65536            1.0              4..16           8192  thrpt        5  187.992        9.190  ops/ms
Compression.lz4                   4..128       65536            1.0           128..512           8192  thrpt        5  186.054        2.267  ops/ms
{noformat}
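
To make the comparison easier to read off, here is a small sketch that derives the lz4-to-adler32 throughput ratio for each row of the table above. The scores are copied verbatim from the benchmark output; the class name, array layout and helper methods are purely illustrative and are not part of the benchmark itself.

```java
final class SpeedupFromBenchmark {
    // Row order matches the table: randomRatio 0, 0.1, 1.0,
    // each with randomRunLength 4..16 first and 128..512 second.
    static final String[] LABELS = {
        "ratio=0   run=4..16", "ratio=0   run=128..512",
        "ratio=0.1 run=4..16", "ratio=0.1 run=128..512",
        "ratio=1.0 run=4..16", "ratio=1.0 run=128..512"
    };

    // Scores in ops/ms, copied from the benchmark output.
    static final double[] ADLER32 = {16.476, 16.720, 16.269, 16.665, 16.653, 16.686};
    static final double[] LZ4 = {28.275, 232.602, 34.081, 130.857, 187.992, 186.054};

    // How many times faster lz4 ran than adler32 for the given row.
    static double speedup(int row) {
        return LZ4[row] / ADLER32[row];
    }

    // lz4 throughput on fully random data relative to 10% random data,
    // both with short (4..16) random runs.
    static double randomVersusPartiallyRandom() {
        return LZ4[4] / LZ4[2];
    }

    public static void main(String[] args) {
        for (int row = 0; row < LABELS.length; row++) {
            System.out.printf("%-24s lz4/adler32 = %.2fx%n", LABELS[row], speedup(row));
        }
        System.out.printf("lz4 random vs partially random (run=4..16) = %.2fx%n",
                randomVersusPartiallyRandom());
    }
}
```

The adler32 scores barely move across rows, while the lz4 ratio varies widely with the shape of the data; the last line works out to roughly 5.5x.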


> Digest queries do not require alder32 checks
> --------------------------------------------
>
>                 Key: CASSANDRA-7928
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7928
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Priority: Minor
>
>  While reading data from sstables, C* does Adler32 checks for any data being read. While doing
> kernel profiling, we have seen that this causes higher CPU usage. These checks might not be
> useful for digest queries as they will have a different digest in case of corruption.

>  
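
The reasoning in the description (a corrupted replica yields a different digest, so the mismatch is caught anyway) can be illustrated with a minimal sketch. This is not Cassandra's read path: the class and method names are hypothetical, and MD5 is used here only as an assumed stand-in for whatever digest the query computes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestMismatchSketch {
    // Hash the bytes a replica would return; MD5 is an assumption for illustration.
    static byte[] digest(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // True when two replicas would report the same digest for their data.
    static boolean digestsMatch(byte[] replicaA, byte[] replicaB) {
        return MessageDigest.isEqual(digest(replicaA), digest(replicaB));
    }

    public static void main(String[] args) {
        byte[] healthy = "row-key:42,value:hello".getBytes(StandardCharsets.UTF_8);

        // Simulate on-disk corruption on one replica by flipping a single bit.
        byte[] corrupted = healthy.clone();
        corrupted[5] ^= 0x01;

        System.out.println("healthy vs healthy:   " + digestsMatch(healthy, healthy.clone()));
        System.out.println("healthy vs corrupted: " + digestsMatch(healthy, corrupted));
    }
}
```

In this sketch the corruption surfaces as a digest mismatch between replicas, without any per-chunk checksum being verified on the read itself.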



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
