cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6809) Compressed Commit Log
Date Tue, 13 Jan 2015 19:12:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275726#comment-14275726 ]

Ariel Weisberg commented on CASSANDRA-6809:
-------------------------------------------

I finished my review. Comments are in the pull request. It looks good and could ship as is.
I have some thoughts on additional scope that I would advocate for, some other directions
the commit log could be enhanced in, and some reservations about performance in certain
cases. I only just noticed CommitLog stress, so I need to check that out to understand
the numbers and what is being tested.

RE CASSANDRA-7075, multiple CL disks: I see this as a workaround for not having RAID-0 across
the volumes being used for the CL, and nothing more. It may also introduce its own load balancing
issues as well as a mess of code for scattering/gathering mutations that I am less comfortable
with. Writing a CL pipeline that can do the maximum supported sequential IO to a single file
is doable, and if I had a choice it is what I would rather write. From a user perspective
it is a nice feature not to be forced to provide a RAID volume, and to me that should be
the primary motivation.

Also a fascinating (to me) piece of trivia: when I tested in the past, I could call force() on
a mapped byte buffer far fewer times per second than I could call force() on a FileChannel. With
a battery-backed disk controller, appending a page (in a preallocated file) and calling
force() in a loop did a few hundred syncs a second with a MappedByteBuffer, but a few thousand
with FileChannel.force(). MBB was slow enough to be a concern for synchronous
commits.
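
For anyone who wants to repeat that comparison, here is a rough sketch of the loop I have in
mind. It is not the original test code: the class name, the 4 KB page size, the page count and
the temp-file setup are all assumptions made for illustration, and the numbers will depend
heavily on the storage and OS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical micro-benchmark: append one page, sync, repeat.
class ForceComparison {
    static final int PAGE = 4096; // assumed page size

    // Creates a temp file already filled to its final size ("preallocated").
    static Path preallocate(int pages) {
        try {
            Path file = Files.createTempFile("force-test", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[pages * PAGE]);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes each page through the channel and calls FileChannel.force(false) after it.
    // Returns the elapsed time in nanoseconds.
    static long syncViaChannel(Path file, int pages) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ByteBuffer page = ByteBuffer.allocateDirect(PAGE);
            long start = System.nanoTime();
            for (int i = 0; i < pages; i++) {
                page.clear();
                ch.write(page, (long) i * PAGE);
                ch.force(false); // data only, no metadata
            }
            return System.nanoTime() - start;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes each page into a mapping of the file and calls MappedByteBuffer.force() after it.
    // Returns the elapsed time in nanoseconds.
    static long syncViaMappedBuffer(Path file, int pages) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) pages * PAGE);
            byte[] page = new byte[PAGE];
            long start = System.nanoTime();
            for (int i = 0; i < pages; i++) {
                map.put(page);
                map.force();
            }
            return System.nanoTime() - start;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int pages = 500; // arbitrary; raise it for steadier numbers
        long channelNanos = syncViaChannel(preallocate(pages), pages);
        long mappedNanos = syncViaMappedBuffer(preallocate(pages), pages);
        System.out.printf("FileChannel.force:      %.0f syncs/s%n", pages * 1e9 / channelNanos);
        System.out.printf("MappedByteBuffer.force: %.0f syncs/s%n", pages * 1e9 / mappedNanos);
    }
}
```

Each variant gets its own freshly preallocated file so that neither run benefits from the
other's writes. Without a battery-backed controller both paths will likely be bound by the
device's flush latency, so the gap may look different.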


> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: logtest.txt
>
>
> It seems an unnecessary oversight that we don't compress the commit log. Doing so should
> improve throughput, but some care will need to be taken to ensure we use as much of a segment
> as possible. I propose decoupling the writing of the records from the segments. Basically
> write into a (queue of) DirectByteBuffer, and have the sync thread compress, say, ~64K chunks
> every X MB written to the CL (where X is ordinarily CLS size), and then pack as many of the
> compressed chunks into a CLS as possible.
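
The chunk-and-pack idea in the description can be sketched roughly as follows. This is an
illustration under stated assumptions, not the patch under review: the class name, the
two-int chunk header (uncompressed length, compressed length) and the use of
java.util.zip.Deflater as a stand-in compressor are all hypothetical, and the queueing and
sync-thread parts of the proposal are left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: compress ~64K chunks and pack as many as fit into one segment.
class ChunkedSegmentPacker {
    static final int CHUNK_SIZE = 64 * 1024;
    static final int HEADER = 8; // assumed framing: int rawLength + int compressedLength

    // Compresses one chunk. Deflater is only a JDK stand-in for the real codec.
    static byte[] compress(byte[] src, int off, int len) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(src, off, len);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Packs compressed chunks of data into segment until the next one would not fit.
    // Returns how many uncompressed bytes were consumed.
    static int pack(byte[] data, ByteBuffer segment) {
        int consumed = 0;
        while (consumed < data.length) {
            int len = Math.min(CHUNK_SIZE, data.length - consumed);
            byte[] chunk = compress(data, consumed, len);
            if (segment.remaining() < HEADER + chunk.length) {
                break; // segment is full; the rest goes to the next segment
            }
            segment.putInt(len).putInt(chunk.length).put(chunk);
            consumed += len;
        }
        return consumed;
    }

    // Reads back every chunk from a flipped segment and returns the original bytes.
    static byte[] unpack(ByteBuffer segment) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            while (segment.remaining() >= HEADER) {
                int rawLen = segment.getInt();
                byte[] comp = new byte[segment.getInt()];
                segment.get(comp);
                Inflater inflater = new Inflater();
                inflater.setInput(comp);
                byte[] raw = new byte[rawLen];
                int n = 0;
                while (n < rawLen) {
                    int r = inflater.inflate(raw, n, rawLen - n);
                    if (r == 0) throw new IllegalStateException("truncated chunk");
                    n += r;
                }
                inflater.end();
                out.write(raw, 0, rawLen);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[300_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i % 13);
        ByteBuffer segment = ByteBuffer.allocate(1 << 20);
        int consumed = pack(data, segment);
        System.out.println("consumed " + consumed + " bytes into " + segment.position());
    }
}
```

Because each chunk carries its own lengths, a reader can walk the segment chunk by chunk, and
a writer can stop cleanly at the first chunk that does not fit.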



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
