cassandra-commits mailing list archives

From "Branimir Lambov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6809) Compressed Commit Log
Date Wed, 14 Jan 2015 20:44:36 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277638#comment-14277638 ]

Branimir Lambov commented on CASSANDRA-6809:
--------------------------------------------

Thanks for the review, Ariel. The plan was to implement compression without introducing too
much extra complexity, so I've stayed away from adding new queues and stages dedicated to
compression. Even so, a pipeline that achieves the highest possible throughput is quite doable
by simply using more than one sync thread, and the size of the compression window is easily
controlled via the sync period. Since we don't have an incremental compression mechanism,
compression necessarily has to happen at the end, i.e. once the whole set of mutations
to compress has been written to the buffer (this is usually not at the end of the segment).
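
To illustrate the "compress at the end of the window" point, here is a minimal sketch in Java.
It is not the code from the patch: the class and method names are hypothetical, and the JDK's
Deflater stands in for whatever compressor is actually used. Writers append to a plain buffer,
and each sync compresses everything written since the previous sync as a single block, so a
shorter sync period means a smaller compression window.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: compress everything written since the last sync as one block. */
final class SyncWindowCompressor {
    // Uncompressed staging buffer; a fixed 1 MiB here, so appending more than that overflows.
    private final ByteBuffer buffer = ByteBuffer.allocate(1 << 20);
    private int lastSynced = 0; // start of the window that has not been compressed yet

    /** Called by writers: append one length-prefixed serialized mutation. */
    synchronized void append(byte[] mutation) {
        buffer.putInt(mutation.length);
        buffer.put(mutation);
    }

    /** Called by a sync thread: compress the whole window [lastSynced, position) in one go. */
    synchronized byte[] sync() {
        int end = buffer.position();
        int length = end - lastSynced;
        if (length == 0) {
            return new byte[0]; // nothing written since the previous sync
        }
        Deflater deflater = new Deflater();
        deflater.setInput(buffer.array(), lastSynced, length);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(tmp);
            out.write(tmp, 0, n);
        }
        deflater.end();
        lastSynced = end; // the next window starts where this one ended
        return out.toByteArray();
    }

    /** Replay side: inflate one block back to its known uncompressed length. */
    static byte[] decompress(byte[] block, int uncompressedLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(block);
        byte[] result = new byte[uncompressedLength];
        try {
            int off = 0;
            while (off < result.length && !inflater.finished()) {
                off += inflater.inflate(result, off, result.length - off);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return result;
    }
}
```

In this sketch sync() holds the same lock as append(), so it serializes with writers. Running
several sync threads in parallel, as suggested above, would need each thread to claim its own
window before compressing outside the lock; that part is left out here.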

CASSANDRA-7075 is valuable in its own right. Arguably RAID 0 is not good enough in either
performance or reliability. None of the extra complexity we introduce there is made necessary
by compression-related concerns, but one of its side effects is the availability of
more than one sync thread for compression. It is a solution of sorts to the lack of CPU saturation
from this patch, but it is by no means the only way to achieve saturation.

This code was written before ByteBuffer compression was made available; I will definitely
make use of that now, but I wonder whether that should be a separate patch so that we don't
have to block on, or conflict with, Jake's patch.

The playback tests are in the various RecoveryManagerTests in o.a.c.db; the tests are the
same for the uncompressed (test/testold target) and compressed (test-compressed target) cases.
For performance tests the ultimate measure is cassandra-stress; ComitLogStress is a simple
microbenchmark of how much we can push, and it favors compression -- make sure to run it with
periodic rather than batch sync. A latency test is probably needed for batch mode; this would
likely need some changes to the service to make sure the sync period can go low enough for the
write latency to show up.

I have not looked at the other suggestions yet; I have to switch modes from the quite different
node allocation work, so give me a couple of days.

> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: logtest.txt
>
>
> It seems an unnecessary oversight that we don't compress the commit log. Doing so should
> improve throughput, but some care will need to be taken to ensure we use as much of a segment
> as possible. I propose decoupling the writing of the records from the segments. Basically
> write into a (queue of) DirectByteBuffer, and have the sync thread compress, say, ~64K chunks
> every X MB written to the CL (where X is ordinarily CLS size), and then pack as many of the
> compressed chunks into a CLS as possible.
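
The chunk-and-pack idea in the description above can be sketched roughly as follows. This is
only an illustration under stated assumptions: ChunkPacker, pack and CHUNK_SIZE are made-up
names, the JDK's Deflater stands in for the real compressor, and a "segment" is modelled simply
as a list of compressed chunks whose sizes add up to no more than the segment size.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

/** Hypothetical sketch: compress ~64K chunks and pack as many as fit into each segment. */
final class ChunkPacker {
    static final int CHUNK_SIZE = 64 * 1024; // the "~64K" from the description

    /** Compress one slice of the written data with Deflater. */
    static byte[] compress(byte[] data, int offset, int length) {
        Deflater deflater = new Deflater();
        deflater.setInput(data, offset, length);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(tmp);
            out.write(tmp, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /**
     * Split the written bytes into chunks, compress each, and fill segments greedily.
     * A chunk that is larger than segmentSize on its own still gets a segment to itself.
     */
    static List<List<byte[]>> pack(byte[] written, int segmentSize) {
        List<List<byte[]>> segments = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int used = 0;
        for (int off = 0; off < written.length; off += CHUNK_SIZE) {
            int len = Math.min(CHUNK_SIZE, written.length - off);
            byte[] chunk = compress(written, off, len);
            if (used + chunk.length > segmentSize && !current.isEmpty()) {
                segments.add(current); // current segment is full, start a new one
                current = new ArrayList<>();
                used = 0;
            }
            current.add(chunk);
            used += chunk.length;
        }
        if (!current.isEmpty()) {
            segments.add(current);
        }
        return segments;
    }
}
```

Because the chunks are compressed before they are packed, a segment holds more than one segment's
worth of uncompressed writes, which is the "use as much of a segment as possible" goal.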



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
