Date: Wed, 27 Jul 2011 04:23:10 +0000 (UTC)
From: "Terje Marthinussen (JIRA)"
To: commits@cassandra.apache.org
Message-ID: <300432422.10657.1311740590761.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (CASSANDRA-47) SSTable compression

[ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071504#comment-13071504 ]

Terje Marthinussen commented on CASSANDRA-47:
---------------------------------------------

Instead of on/off we could use size.

In the Cassandra we run, we have compression implemented at the supercolumn level. It turned out to be very good for performance for us not to compress data in memtables (which we would normally do with compression on supercolumns) or during flushing from memtables, as both of these slowed down the write path.

Under heavy write activity, the sstables resulting from memtable flushes are often pretty small (maybe 20MB on average in our case), so compression does not make much difference to disk consumption there, but the performance penalty does. All the compression/decompression when compacting the smallest tables also makes a noticeable difference when trying to keep up on the compaction side.

Instead we went for compression which only happens when a source sstable during compaction is larger than 4GB. I would recommend considering similar functionality here.

I started off with ning for our compression, but I now run the built-in Java deflate to get even better compression. Since we only compress the largest sstables, and do no other compression in the write path or on compaction of small sstables, the very slow compression of deflate does not bother us that much. The read side is of course still slower with inflate, but it is still more than fast enough not to be a problem. OS caching will also be better thanks to the better compression, so we can regain some of the performance lost vs. ning/snappy there.

We could also consider being very tunable, with deflate for very large sstables, ning/snappy for smaller ones and no compression for the smallest, but I am not sure it is worth it.

By the way, how much difference did you see on ning vs. snappy?
When I tested, there was not all that much difference, and I felt ning was easier to bundle, so to me it seemed like a better alternative.

> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-47-v2.patch, CASSANDRA-47-v3-rebased.patch, CASSANDRA-47-v3.patch, CASSANDRA-47-v4.patch, CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression which would trade CPU for I/O (almost always a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
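
To make the size-based proposal in the comment concrete, here is a minimal sketch of how such a policy might look. It is not Cassandra code and not the commenter's implementation: the class, enum, and method names are hypothetical, and the 256MB middle threshold is an assumption picked purely for illustration (only the 4GB figure comes from the comment). The FAST tier stands in for ning/snappy and is only selected here, not implemented, to avoid a third-party dependency; the deflate path uses the JDK's java.util.zip.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch of a size-based compression policy; not Cassandra API.
public class SizeBasedCompression {
    enum Codec { NONE, FAST, DEFLATE }

    // 4GB is the threshold mentioned in the comment.
    static final long DEFLATE_THRESHOLD_BYTES = 4L * 1024 * 1024 * 1024;
    // Assumed value for the optional middle tier (illustration only).
    static final long FAST_THRESHOLD_BYTES = 256L * 1024 * 1024;

    // Pick a codec from the size of the source sstable being compacted.
    // Memtable flushes are never compressed, to keep the write path fast.
    static Codec chooseCodec(long sourceSizeBytes, boolean isFlush) {
        if (isFlush) return Codec.NONE;
        if (sourceSizeBytes > DEFLATE_THRESHOLD_BYTES) return Codec.DEFLATE;
        if (sourceSizeBytes > FAST_THRESHOLD_BYTES) return Codec.FAST;
        return Codec.NONE;
    }

    // Compress with the JDK's built-in deflate at its slowest, densest level.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress data produced by deflate() above.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or dictionary-dependent input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        System.out.println("20MB flush:      " + chooseCodec(20 * mb, true));
        System.out.println("20MB compaction: " + chooseCodec(20 * mb, false));
        System.out.println("1GB compaction:  " + chooseCodec(1024 * mb, false));
        System.out.println("5GB compaction:  " + chooseCodec(5 * 1024 * mb, false));

        byte[] data = "row-key:column:value;".repeat(1000).getBytes(StandardCharsets.UTF_8);
        byte[] packed = deflate(data);
        System.out.println("deflate: " + data.length + " -> " + packed.length + " bytes, round trip ok: "
                + Arrays.equals(data, inflate(packed)));
    }
}
```

Setting FAST_THRESHOLD_BYTES equal to DEFLATE_THRESHOLD_BYTES collapses this to the simpler scheme described in the comment, where only source sstables above 4GB are compressed at all.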