cassandra-commits mailing list archives

From "Branimir Lambov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10520) Compressed writer and reader should support non-compressed data.
Date Thu, 16 Feb 2017 13:50:41 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869945#comment-15869945 ]

Branimir Lambov commented on CASSANDRA-10520:
---------------------------------------------

Rebased and updated the patch and triggered another round of testing.

bq. The micro benchmark looks different on my Linux machine

That's very similar to what I get with page cache enabled. Is it possible you ran the
benchmark without turning the page cache off?

bq. When writing compressed chunks, the compressed buffer is sized to the max compression
length. WDYT about just passing a buffer that's bounded to maxCompressedLength and handle
the buffer-overflow-exception to write it uncompressed?

This is a possibility, but using exceptions on non-exceptional code paths is a frowned-upon
practice, and I am worried that it can cause optimization headaches -- the JIT refusing
to optimize or doing the wrong thing, resulting in compression always taking longer than it
should. At this point I don't want to risk that, but it's an option to explore if we get
some free cycles later on to verify that there are no performance issues in all relevant
configurations.

bq. Just for clarification - is the following correct?

Yes, that is correct. {{<=}}/compressed is the typical path, hence placed first on the
read side, and on the write path we have an {{if}} that is only triggered on the alternative.
The latter could use a {{! <=}} pattern to make the subcondition identical, but that feels
unnatural and more complex than necessary.
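
To illustrate the branch layout described above, here is a minimal Java sketch. It is not the
patch's code: the class, enum, method, and parameter names are all hypothetical, and only the
shape of the two conditions is taken from the comment.

```java
// Hypothetical illustration only; none of these names come from the patch.
final class ChunkBranchSketch {
    enum Path { DECOMPRESS, COPY_RAW }

    // Read side: the typical (compressed) case is tested first with <=.
    static Path readPath(int storedLength, int maxCompressedLength) {
        if (storedLength <= maxCompressedLength) {
            return Path.DECOMPRESS;
        }
        return Path.COPY_RAW;
    }

    // Write side: a plain `if` that only fires on the alternative
    // (compression did not shrink the chunk enough), using > rather than !(<=).
    static int lengthToWrite(int compressedLength, int uncompressedLength, int maxCompressedLength) {
        int toWrite = compressedLength;
        if (compressedLength > maxCompressedLength) {
            toWrite = uncompressedLength;
        }
        return toWrite;
    }
}
```

The write-side test is the logical negation of the read-side one, so both agree on which
chunks are stored uncompressed; the only difference is which branch is written out first.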

bq. Even if CRC checks are disabled...

Suggested patch included, thanks.


> Compressed writer and reader should support non-compressed data.
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-10520
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10520
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>              Labels: messaging-service-bump-required
>             Fix For: 4.x
>
>         Attachments: ReadWriteTestCompression.java
>
>
> Compressing uncompressible data, as done, for instance, to write SSTables during stress-tests,
results in chunks larger than 64k which are a problem for the buffer pooling mechanisms employed
by the {{CompressedRandomAccessReader}}. This results in non-negligible performance issues
due to excessive memory allocation.
> To solve this problem and avoid decompression delays in the cases where it does not provide
benefits, I think we should allow compressed files to store uncompressed chunks as an alternative
to compressed data. Such a chunk could be written after compression returns a buffer larger
than, for example, 90% of the input, and would not result in additional delays in writing.
On reads it could be recognized by size (using a single global threshold constant in the compression
metadata) and data could be directly transferred into the decompressed buffer, skipping the
decompression step and ensuring a 64k buffer for compressed data always suffices.
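
The scheme proposed in the description can be sketched as a small round trip. This is only an
illustration under stated assumptions, not Cassandra's implementation: it uses
`java.util.zip.Deflater`/`Inflater` in place of Cassandra's compressors, takes the 90% figure
from the description as the threshold, and handles only full 64k chunks so that stored size
alone distinguishes the two cases. `ChunkCodecSketch`, `encodeChunk`, `decodeChunk`, and the
constants are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch; names and the Deflater/Inflater choice are assumptions.
final class ChunkCodecSketch {
    static final int CHUNK_LENGTH = 64 * 1024;
    // Assumed global threshold: 90% of the chunk length, as in the description's example.
    static final int MAX_COMPRESSED_LENGTH = CHUNK_LENGTH * 9 / 10;

    // Write path: compress, then fall back to the raw bytes if compression did not help enough.
    static byte[] encodeChunk(byte[] raw) {
        if (raw.length != CHUNK_LENGTH) {
            // Simplification: with full chunks, a raw chunk is always larger than the threshold.
            throw new IllegalArgumentException("sketch handles full chunks only");
        }
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        byte[] compressed = out.toByteArray();
        if (compressed.length > MAX_COMPRESSED_LENGTH) {
            return raw.clone(); // store uncompressed
        }
        return compressed;
    }

    // Read path: the stored size alone tells us whether to decompress.
    static byte[] decodeChunk(byte[] stored) {
        if (stored.length <= MAX_COMPRESSED_LENGTH) {
            Inflater inflater = new Inflater();
            inflater.setInput(stored);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            try {
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new IllegalStateException("truncated or unsupported chunk");
                    }
                    out.write(buf, 0, n);
                }
            } catch (DataFormatException e) {
                throw new IllegalStateException("corrupt chunk", e);
            } finally {
                inflater.end();
            }
            return out.toByteArray();
        }
        return stored; // uncompressed chunk: pass the bytes through, no decompression step
    }
}
```

In this sketch a chunk of random bytes comes back from `encodeChunk` at exactly `CHUNK_LENGTH`
bytes and is passed through unchanged on read, while a chunk of zeros is stored compressed.
Either way the stored chunk never exceeds 64k, which is the property the description is after.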



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
