cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6809) Compressed Commit Log
Date Wed, 20 Aug 2014 05:40:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103412#comment-14103412
] 

Benedict commented on CASSANDRA-6809:
-------------------------------------

bq. Does this statement apply to this ticket or CASSANDRA-7075? Seems more apropos to the
latter.

CASSANDRA-7075 has no reason to gate the concurrency by anything other than available disks,
so it would be a bit out of context there. Without CASSANDRA-7075, however, this obviously isn't
an issue here. Maxing out one CPU with compression would be very easy, though, so I suggest
this ticket now depends on CASSANDRA-7075.

bq. I think the pre-allocation is good (and a cheap operation, at least by LOC)

I disagree. If we're dropping recycling, we may as well allocate on the sync thread(s). There's no
point having an extra thread and moving part when the vast majority of time is spent writing
to disk. We're talking a few hundred microseconds of extra cost across an entire sync, which only
ever stops a sync from progressing if the disk isn't keeping up, at which point we're done
for anyway. With a 10s periodic sync, a cost of even 1ms is 0.01% of total disk bandwidth
lost, and 1ms is a greatly exaggerated cost for creating a file. We also only pay this each
time we roll over a segment, so by default at most once every 32MB. 1ms every 32MB corresponds
to a 32GB/s throughput rate, so it is never going to come close to bottlenecking anything.
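
To make the arithmetic above easy to re-run with other numbers, here is a rough back-of-envelope sketch in Python. The function names are made up for illustration, and the 1ms allocation cost is the deliberately exaggerated figure from this comment, not a measurement.

```python
# Back-of-envelope check of the allocation-cost figures (illustrative only).

def allocation_overhead(sync_period_s, alloc_cost_ms):
    """Fraction of one sync period spent creating a segment file."""
    return alloc_cost_ms / (sync_period_s * 1000)


def rollover_throughput_ceiling_mb_s(segment_mb, alloc_cost_ms):
    """Write rate (MB/s) at which allocation alone would fill all the time,
    assuming one allocation per segment rollover."""
    return segment_mb * 1000 / alloc_cost_ms


if __name__ == "__main__":
    # 10s periodic sync, 1ms (exaggerated) file-creation cost, 32MB segments.
    print(f"overhead per sync: {allocation_overhead(10, 1):.2%}")
    print(f"ceiling: {rollover_throughput_ceiling_mb_s(32, 1) / 1000:.0f} GB/s")
```

Plugging in a more realistic cost of a few hundred microseconds only pushes the ceiling higher.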

bq. Let's face it, if you're hitting the CL hard, you're also hitting the memtables, et al,
pretty hard, as well, due to a consistently large number of writes coming in. As long as the
CL doesn't fall behind/take up too many resources/does its job and doesn't complain, I think
that's the sweet spot. I'd rather have the resources optimized for the client-facing path.

Not necessarily. You could be doing in-place updates, or you could have very large memtables
(esp. with off-heap memtables now) which could easily outpace the CL without overwrites, without
even needing to flush memtables.

> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>
> It seems an unnecessary oversight that we don't compress the commit log. Doing so should
improve throughput, but some care will need to be taken to ensure we use as much of a segment
as possible. I propose decoupling the writing of the records from the segments. Basically
write into a (queue of) DirectByteBuffer, and have the sync thread compress, say, ~64K chunks
every X MB written to the CL (where X is ordinarily CLS size), and then pack as many of the
compressed chunks into a CLS as possible.
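
The chunk-and-pack idea in the description can be sketched roughly as follows. This is only an illustration of the proposal, not the eventual patch: zlib stands in for whatever compressor would actually be used, the 4-byte length prefix per chunk is an assumed framing, and the names `compress_chunks` and `pack_segments` are hypothetical.

```python
import zlib

CHUNK_SIZE = 64 * 1024            # ~64K uncompressed chunks, as proposed
SEGMENT_SIZE = 32 * 1024 * 1024   # default CLS size
LENGTH_PREFIX = 4                 # assumed framing: 4-byte length per chunk


def compress_chunks(buffer, chunk_size=CHUNK_SIZE):
    """Split the buffered records into fixed-size chunks and compress each one."""
    return [
        zlib.compress(buffer[i:i + chunk_size])
        for i in range(0, len(buffer), chunk_size)
    ]


def pack_segments(chunks, segment_size=SEGMENT_SIZE):
    """Greedily pack compressed chunks, in order, into as few segments as possible."""
    segments, current, used = [], [], 0
    for chunk in chunks:
        framed = len(chunk) + LENGTH_PREFIX
        if framed > segment_size:
            raise ValueError("compressed chunk does not fit in a segment")
        if current and used + framed > segment_size:
            # Current segment is full: start a new one.
            segments.append(current)
            current, used = [], 0
        current.append(chunk)
        used += framed
    if current:
        segments.append(current)
    return segments
```

Because chunks are packed in write order, replay would only need to walk the segments sequentially and decompress each chunk in turn.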



--
This message was sent by Atlassian JIRA
(v6.2#6252)
