Date: Fri, 23 Jan 2015 13:31:36 +0000 (UTC)
From: "Benedict (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-6809) Compressed Commit Log


    [ https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289242#comment-14289242 ]

Benedict commented on CASSANDRA-6809:
-------------------------------------

bq. Thank you, I did not realise you are interested in parallelism between segments only.

Well, I considered that a natural extension, i.e. a follow-up ticket. It is one I still consider reasonably straightforward to add: a mutator thread can partition the commit range once it has processed ~1MB, and simply append the Callable to a shared queue. The sync thread can then drain this queue when it decides to initiate a sync.

bq. I can see that this should work well enough with large sync periods, including the 10s default.

I'm reasonably confident this will work as well or better for all sync periods. In particular, it better guarantees honouring the sync periods, and is less likely to encourage random write behaviour. Of course, the main benefit is its simplicity.

> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: ComitLogStress.java, logtest.txt
>
>
> It seems an unnecessary oversight that we don't compress the commit log. Doing so should improve throughput, but some care will need to be taken to ensure we use as much of a segment as possible. I propose decoupling the writing of the records from the segments. Basically write into a (queue of) DirectByteBuffer, and have the sync thread compress, say, ~64K chunks every X MB written to the CL (where X is ordinarily CLS size), and then pack as many of the compressed chunks into a CLS as possible.
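
For readers following the thread, the scheme discussed above (mutator threads appending compression tasks to a shared queue, and the sync thread draining that queue and packing compressed chunks into segments) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch attached to the ticket: the class and method names (CommitLogSketch, enqueue, sync, unpack) are hypothetical, java.util.zip.Deflater stands in for whatever compressor would really be used, heap byte arrays stand in for DirectByteBuffer, and the 4-byte length prefix per chunk is an invented on-disk layout.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch; none of these names come from the Cassandra code base.
class CommitLogSketch {
    static final int CHUNK_SIZE = 64 * 1024; // ~64K uncompressed chunks, as in the description

    // Shared queue: mutator threads append tasks, the sync thread drains them.
    final Queue<Callable<byte[]>> pending = new ConcurrentLinkedQueue<>();

    // Mutator side: partition the written range and enqueue one task per chunk.
    void enqueue(byte[] written) {
        for (int off = 0; off < written.length; off += CHUNK_SIZE) {
            final int start = off;
            final int len = Math.min(CHUNK_SIZE, written.length - off);
            pending.add(() -> compress(written, start, len));
        }
    }

    // Sync side: drain the queue and pack length-prefixed chunks into segments.
    List<ByteBuffer> sync(int segmentSize) throws Exception {
        List<ByteBuffer> segments = new ArrayList<>();
        ByteBuffer current = ByteBuffer.allocate(segmentSize);
        Callable<byte[]> task;
        while ((task = pending.poll()) != null) {
            byte[] chunk = task.call();
            if (4 + chunk.length > segmentSize)
                throw new IllegalArgumentException("chunk larger than segment");
            if (current.remaining() < 4 + chunk.length) { // segment full, start the next one
                current.flip();
                segments.add(current);
                current = ByteBuffer.allocate(segmentSize);
            }
            current.putInt(chunk.length).put(chunk);
        }
        current.flip();
        if (current.hasRemaining()) segments.add(current);
        return segments;
    }

    // Deflater is an assumption here, chosen only because it ships with the JDK.
    static byte[] compress(byte[] src, int off, int len) {
        Deflater deflater = new Deflater();
        deflater.setInput(src, off, len);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Replay side: read each length-prefixed chunk back and inflate it.
    static byte[] unpack(List<ByteBuffer> segments) throws DataFormatException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        for (ByteBuffer segment : segments) {
            ByteBuffer view = segment.duplicate();
            while (view.remaining() >= 4) {
                byte[] chunk = new byte[view.getInt()];
                view.get(chunk);
                Inflater inflater = new Inflater();
                inflater.setInput(chunk);
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    out.write(buf, 0, n);
                }
                inflater.end();
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        CommitLogSketch log = new CommitLogSketch();
        byte[] records = new byte[200_000];
        Arrays.fill(records, (byte) 'x');
        log.enqueue(records);                               // mutator thread
        List<ByteBuffer> segments = log.sync(1 << 20);      // sync thread
        System.out.println(Arrays.equals(records, unpack(segments)));
    }
}
```

In this sketch the compression work runs on the sync thread when it calls each task. Handing the drained tasks to an executor instead would be one way to get the parallelism discussed in the comment, at the cost of having to reassemble the results in order.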