X-Original-To: apmail-cassandra-commits-archive@www.apache.org
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C3A3F17E46 for ; Fri, 13 Mar 2015 19:25:39 +0000 (UTC)
Received: (qmail 7310 invoked by uid 500); 13 Mar 2015 19:25:39 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 7267 invoked by uid 500); 13 Mar 2015 19:25:39 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 7255 invoked by uid 99); 13 Mar 2015 19:25:39 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 13 Mar 2015 19:25:39 +0000
Date: Fri, 13 Mar 2015 19:25:39 +0000 (UTC)
From: "Ariel Weisberg (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-6809) Compressed Commit Log
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


    [ https://issues.apache.org/jira/browse/CASSANDRA-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360969#comment-14360969 ]

Ariel Weisberg commented on CASSANDRA-6809:
-------------------------------------------

I don't think it works as a hard limit. Filesystems can hiccup for a long time, and if you buffer to private memory you avoid seeing the hiccups. A high watermark isn't great either, because you commit memory that isn't needed most of the time. Maybe I am not following what you are suggesting.

When we have ponies we will be writing to private memory, probably around 128 megabytes, to avoid being at the mercy of the filesystem.
Once compression is asynchronous to the filesystem and parallel, the number of buffers can be small, because compression will tear through fast enough to make the buffers available again. So you would have memory waiting to drain to the filesystem (128 megabytes) and a small number of buffers to aggregate log records until they are sent for compression.

> Compressed Commit Log
> ---------------------
>
>                 Key: CASSANDRA-6809
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6809
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Branimir Lambov
>            Priority: Minor
>              Labels: docs-impacting, performance
>             Fix For: 3.0
>
>         Attachments: ComitLogStress.java, logtest.txt
>
>
> It seems an unnecessary oversight that we don't compress the commit log. Doing so should improve throughput, but some care will need to be taken to ensure we use as much of a segment as possible. I propose decoupling the writing of the records from the segments. Basically write into a (queue of) DirectByteBuffer, and have the sync thread compress, say, ~64K chunks every X MB written to the CL (where X is ordinarily CLS size), and then pack as many of the compressed chunks into a CLS as possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
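
The proposal in the quoted issue description (compress roughly 64K chunks, then pack as many compressed chunks as possible into a commit log segment) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the ticket: the class and method names (CommitLogChunkSketch, compressChunks, packIntoSegments, decompress) are hypothetical, java.util.zip.Deflater stands in for whatever compressor would really be used, plain byte arrays stand in for DirectByteBuffer, chunks are cut at arbitrary byte offsets rather than on record boundaries, and everything runs on one thread rather than on a sync thread.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch; none of these names come from Cassandra.
class CommitLogChunkSketch {

    // Compress `data` as independent chunks of at most `chunkSize` input bytes.
    // Deflater is an assumed stand-in for the real compressor.
    static List<byte[]> compressChunks(byte[] data, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        byte[] buf = new byte[8192];
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            Deflater deflater = new Deflater();
            deflater.setInput(data, off, len);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            deflater.end();
            chunks.add(out.toByteArray());
        }
        return chunks;
    }

    // Greedily pack compressed chunks, in order, into segments that each
    // hold at most `segmentSize` bytes of compressed data.
    static List<List<byte[]>> packIntoSegments(List<byte[]> chunks, int segmentSize) {
        List<List<byte[]>> segments = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int used = 0;
        for (byte[] chunk : chunks) {
            if (chunk.length > segmentSize)
                throw new IllegalArgumentException("chunk larger than segment");
            if (used + chunk.length > segmentSize) {
                segments.add(current);      // current segment is full, start a new one
                current = new ArrayList<>();
                used = 0;
            }
            current.add(chunk);
            used += chunk.length;
        }
        if (!current.isEmpty())
            segments.add(current);
        return segments;
    }

    // Inflate one chunk; `maxLen` is an upper bound on its uncompressed size.
    static byte[] decompress(byte[] chunk, int maxLen) {
        Inflater inflater = new Inflater();
        inflater.setInput(chunk);
        byte[] out = new byte[maxLen];
        int n = 0;
        try {
            while (!inflater.finished() && n < maxLen) {
                int r = inflater.inflate(out, n, maxLen - n);
                if (r == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                    break;                  // truncated or unsupported input
                n += r;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt chunk", e);
        } finally {
            inflater.end();
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        byte[] log = new byte[1 << 20];     // 1 MB of highly compressible sample data
        for (int i = 0; i < log.length; i++)
            log[i] = (byte) (i % 16);
        List<byte[]> chunks = compressChunks(log, 64 * 1024);
        List<List<byte[]>> segments = packIntoSegments(chunks, 32 * 1024);
        System.out.println(chunks.size() + " chunks packed into " + segments.size() + " segment(s)");
    }
}
```

A real segment format would also need a per-chunk length header (and likely a checksum) so that replay can find chunk boundaries; the sketch leaves that out to keep the packing step visible.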