cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-6809) Compressed Commit Log
Date Wed, 20 Aug 2014 08:38:29 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103622#comment-14103622 ]

Jason Brown edited comment on CASSANDRA-6809 at 8/20/14 8:37 AM:
-----------------------------------------------------------------

bq. If we're dropping recycling, ... bottlenecking anything.

I reread this paragraph several times, and now it makes sense. I wasn't thinking about write
perf, necessarily, but about having the file contiguous on disk. However, since the commit
log files are, more or less, one-time use (meaning we're not doing lots of random or sequential
reads on them), I guess worrying about a large contiguous block on disk isn't necessary.

bq. Per-disk sync threads

I'm still not sure sync threads, in the manner initially described above, are totally necessary.
If you are worried about the time for the mmap'ed buffers to flush in the same thread that's
handling all the CL entry processing + any possible compression or encryption, a simple solution
might be to have a sync thread that merely invokes the mmap buffer flush. Thus, the main CL
thread(s) can continue processing the new entries and writing to the mmap buffer, but the
sync thread eats the cost of the msync.
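
To make the idea concrete, here is a minimal sketch of a dedicated thread that does nothing
but flush the mmap'ed buffer. It is not Cassandra's commit log code: the class name
MmapSyncSketch, the roundTrip helper, the 4096-byte segment and the 10 ms sync period are all
made up for illustration. It assumes MappedByteBuffer.force() (which ends up in msync on
POSIX systems) is the flush being discussed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: writers append to an mmap'ed segment, a separate thread pays for the flush. */
public class MmapSyncSketch {
    private final MappedByteBuffer buffer;
    private final ScheduledExecutorService syncThread =
            Executors.newSingleThreadScheduledExecutor();

    MmapSyncSketch(Path segment, int size, long syncPeriodMillis) throws IOException {
        MappedByteBuffer mapped;
        try (FileChannel ch = FileChannel.open(segment, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
        this.buffer = mapped;
        // The sync thread only invokes the flush; it never touches the buffer position.
        Runnable flush = mapped::force;
        syncThread.scheduleWithFixedDelay(flush, syncPeriodMillis, syncPeriodMillis,
                TimeUnit.MILLISECONDS);
    }

    /** Called by the main CL thread(s); returns without waiting for the data to reach disk. */
    synchronized void append(byte[] entry) {
        buffer.put(entry);
    }

    /** Stops the sync thread and performs one last flush on the caller's thread. */
    void close() {
        syncThread.shutdown();
        try {
            syncThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        buffer.force();
    }

    /** Illustration only: writes one entry to a temp segment and reads it back from the file. */
    static String roundTrip(String entry) {
        try {
            Path segment = Files.createTempFile("commitlog-sketch", ".log");
            segment.toFile().deleteOnExit();
            byte[] bytes = entry.getBytes(StandardCharsets.UTF_8);
            MmapSyncSketch log = new MmapSyncSketch(segment, 4096, 10);
            log.append(bytes);
            log.close();
            byte[] onDisk = Files.readAllBytes(segment);
            return new String(onDisk, 0, bytes.length, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch append() never blocks on disk I/O, so the msync cost lands on the sync thread.
A real implementation would still need to tell writers which position has been made durable,
which the sketch leaves out.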


was (Author: jasobrown):
bq. If we're dropping recycling, ... bottlenecking anything.

Reread this paragraph several times, now it makes sense. I wasn't thinking about the write
perf, necessarily, but about having the file contiguous on disk. However, since the commit
log files are, more or less, one-time use (meaning, we're not doing tons of random nor sequential
I/O reads on them), I guess worrying about a large contiguous block on disk isn't necessary.

bq. Per-disk sync threads

I'm still not sure sync threads are totally necessary. If you are worried about the time for
the mmap'ed buffers to flush in the same thread that's handling all the CL entry processing
+ any possible compression or encryption, a simple solution might be to have a sync thread
that merely invokes the mmap buffer flush. Thus, the main CL thread(s) can continue processing
the new entries and writing to the mmap buffer, but the sync thread eats the cost of the msync.

> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>
> It seems an unnecessary oversight that we don't compress the commit log. Doing so should
> improve throughput, but some care will need to be taken to ensure we use as much of a segment
> as possible. I propose decoupling the writing of the records from the segments. Basically
> write into a (queue of) DirectByteBuffer, and have the sync thread compress, say, ~64K chunks
> every X MB written to the CL (where X is ordinarily CLS size), and then pack as many of the
> compressed chunks into a CLS as possible.
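
The chunk-and-pack scheme in the description above could be sketched roughly as follows. This
is an assumption-laden illustration, not the patch under review: java.util.zip.Deflater stands
in for whatever compressor would actually be used, heap ByteBuffers stand in for segments, and
the class name, the 8-byte length header and the pack/unpack helpers are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: compress ~64K chunks and pack as many as fit into each segment. */
public class ChunkPackingSketch {
    static final int CHUNK_SIZE = 64 * 1024;
    static final int HEADER = 8; // assumed layout: [int compressedLength][int rawLength]

    /** Compresses raw[off, off + len) with Deflater (a stand-in compressor). */
    static byte[] compress(byte[] raw, int off, int len) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw, off, len);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Splits raw into chunks, compresses each, and starts a new segment only when one is full. */
    static List<ByteBuffer> pack(byte[] raw, int segmentSize) {
        List<ByteBuffer> segments = new ArrayList<>();
        ByteBuffer current = ByteBuffer.allocate(segmentSize);
        for (int off = 0; off < raw.length; off += CHUNK_SIZE) {
            int len = Math.min(CHUNK_SIZE, raw.length - off);
            byte[] compressed = compress(raw, off, len);
            if (compressed.length + HEADER > segmentSize) {
                throw new IllegalArgumentException("compressed chunk does not fit in a segment");
            }
            if (current.remaining() < compressed.length + HEADER) {
                current.flip();
                segments.add(current);
                current = ByteBuffer.allocate(segmentSize);
            }
            current.putInt(compressed.length).putInt(len).put(compressed);
        }
        current.flip();
        segments.add(current);
        return segments;
    }

    /** Reads the segments back in order and restores the original bytes. */
    static byte[] unpack(List<ByteBuffer> segments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (ByteBuffer segment : segments) {
            ByteBuffer in = segment.duplicate();
            while (in.hasRemaining()) {
                byte[] compressed = new byte[in.getInt()];
                byte[] chunk = new byte[in.getInt()];
                in.get(compressed);
                Inflater inflater = new Inflater();
                inflater.setInput(compressed);
                try {
                    int n = 0;
                    while (n < chunk.length && !inflater.finished()) {
                        n += inflater.inflate(chunk, n, chunk.length - n);
                    }
                } catch (DataFormatException e) {
                    throw new IllegalStateException(e);
                } finally {
                    inflater.end();
                }
                out.write(chunk, 0, chunk.length);
            }
        }
        return out.toByteArray();
    }
}
```

Because each chunk is compressed before it is placed, the packer knows its exact size and can
fill a segment as far as the next chunk allows. The sketch ignores the queue of buffers and the
sync thread itself; it only shows the packing step.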



--
This message was sent by Atlassian JIRA
(v6.2#6252)
