Date: Tue, 12 Jul 2011 20:05:01 +0000 (UTC)
From: "Sylvain Lebresne (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Message-ID: <36493041.7277.1310501101011.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (CASSANDRA-47) SSTable compression

    [ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064101#comment-13064101 ]

Sylvain
Lebresne commented on CASSANDRA-47:
-------------------------------------------

bq. As I wrote before - currently to check real size of the file (tested only on linux because OS X FS saves empty blocks to the disk for some reason) you need to get a block count using 'ls -alhs', current patch reserves an empty space for each chunk because we need to do seeks while we write data using SSTableWriter.

Yeah, I really think we shouldn't do that (i.e., leave empty space between the compressed chunks). I'm happy to learn that Linux (or at least whatever file system you are using; I haven't tried the patch on Linux yet) is smart enough to avoid allocating empty blocks, but we shouldn't rely on this. I bet not all file systems do that (OS X seems to prove it, and I'm not sure all Linux file systems do either), and anyway, if you transfer the sstables or tar them or anything, it will still be less efficient than necessary (because the file still *is* the size of the uncompressed data). We're also losing some space even on Linux, depending on what the actual FS block size is (not a big deal, but it can add up). So I think we really need to change the index (and key cache) to store the offset in the compressed data. Imho, the simplest way would be, instead of having in the index the key followed by the offset, to have for a compressed file the key, then the position of the chunk in the compressed file, then the offset within the uncompressed chunk.

Another thing is that we will need this to be optional (if only because we cannot expect people to trust it from day one). Don't get me wrong, it's nice to have a first prototype to get an idea of what we're talking about, but I wanted to mention this because it's probably easier to take into account sooner rather than later (I also suspect we may be able to factor out some of the code of BRAF and CDF, but I haven't looked too closely, so maybe not).
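To make the index layout proposed above concrete, here is a minimal sketch in Python. It is not the patch's code or Cassandra's on-disk format: the names `write_compressed`, `read_row` and `CHUNK_SIZE`, the use of zlib, the 4-byte length prefixes, and the rule that a row never spans two chunks are all assumptions made for illustration. It only shows the idea that compressed chunks can be written back to back with no reserved space if each index entry holds the chunk's position in the compressed file plus the row's offset inside the uncompressed chunk.

```python
import struct
import zlib

CHUNK_SIZE = 64  # uncompressed bytes per chunk (hypothetical, tiny for the demo)


def write_compressed(rows):
    """Write rows as back-to-back compressed chunks, with no padding.

    Returns (data, index) where index maps
    key -> (position of the chunk in the compressed data,
            offset of the row inside the uncompressed chunk).
    """
    out = bytearray()   # the "compressed file"
    buf = bytearray()   # uncompressed chunk currently being filled
    index = {}

    def flush():
        # Compress the buffered chunk and append it, prefixed by its length.
        if buf:
            compressed = zlib.compress(bytes(buf))
            out.extend(struct.pack(">I", len(compressed)))
            out.extend(compressed)
            buf.clear()

    for key, value in rows:
        # Assumption: a row never spans two chunks; start a new chunk instead.
        if buf and len(buf) + 4 + len(value) > CHUNK_SIZE:
            flush()
        # The chunk being filled will start at the current end of the output.
        index[key] = (len(out), len(buf))
        buf.extend(struct.pack(">I", len(value)))
        buf.extend(value)
    flush()
    return bytes(out), index


def read_row(data, entry):
    """Seek to the chunk, decompress it, then seek to the row inside it."""
    chunk_pos, offset = entry
    (compressed_len,) = struct.unpack_from(">I", data, chunk_pos)
    start = chunk_pos + 4
    chunk = zlib.decompress(data[start:start + compressed_len])
    (value_len,) = struct.unpack_from(">I", chunk, offset)
    return chunk[offset + 4:offset + 4 + value_len]


if __name__ == "__main__":
    rows = [(f"key{i}", f"value-{i}".encode() * 3) for i in range(10)]
    data, index = write_compressed(rows)
    print(index["key7"], read_row(data, index["key7"]))
```

With this layout the file is only as large as the compressed data, so its size does not depend on the file system handling sparse files, at the cost of one extra integer per index entry.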

> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression which would trade CPU for I/O (almost always a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira