Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Thu, 22 Mar 2012 08:12:26 +0000 (UTC)
From: "Peter Schuller (Commented) (JIRA)"
To: commits@cassandra.apache.org
Message-ID: <1743302931.1538.1332403946752.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1418890383.3126.1329891528709.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (CASSANDRA-3943) Too many small size sstables after loading data using sstableloader or BulkOutputFormat increases compaction time.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/CASSANDRA-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235440#comment-13235440 ]

Peter Schuller commented on CASSANDRA-3943:
-------------------------------------------

It also facilitates replacing a data set one sstable at a time (if one generates sstables whose ranges correspond exactly), allowing complete replacement of a dataset without a temporary disk space spike. Without any of these fixes, the extra disk space needed is very significant: regular compaction overhead in addition to having two data sets loaded onto the node.

> Too many small size sstables after loading data using sstableloader or BulkOutputFormat increases compaction time.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3943
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3943
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Hadoop, Tools
>    Affects Versions: 0.8.2, 1.1.0
>            Reporter: Samarth Gahire
>            Priority: Minor
>              Labels: bulkloader, hadoop, ponies, sstableloader, streaming, tools
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When we create sstables using SimpleUnsortedWriter or BulkOutputFormat, the size of the sstables created is around the buffer size provided.
> But after loading, the sstables created on the cluster nodes are of size around
> {code}( (sstable_size_before_loading) * replication_factor ) / No_Of_Nodes_In_Cluster{code}
> As the number of nodes in the cluster increases, the size of each sstable loaded onto a Cassandra node decreases. Such small sstables take too much time to compact (minor compaction) compared to relatively large sstables.
>
> One solution that we have tried is to increase the buffer size while generating sstables. But as we increase the buffer size, the time taken to generate sstables increases. Is there any solution to this in existing versions, or are you fixing this in a future version?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
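
Editorial illustration of the size estimate quoted in the issue description. This is only a sketch of the reporter's rule of thumb, (sstable_size_before_loading * replication_factor) / No_Of_Nodes_In_Cluster; it is not Cassandra code. The function name and the example figures (a 64 MB buffer, replication factor 3, clusters of 6 and 48 nodes) are hypothetical and chosen purely to show how the per-node sstable shrinks as the cluster grows.

```python
def estimated_loaded_sstable_size(size_before_loading, replication_factor, nodes_in_cluster):
    """Approximate size of each sstable on a node after bulk loading.

    Implements the rule of thumb from the issue description:
        (sstable_size_before_loading * replication_factor) / nodes_in_cluster
    The result is in the same unit as size_before_loading.
    """
    if nodes_in_cluster <= 0:
        raise ValueError("nodes_in_cluster must be positive")
    return size_before_loading * replication_factor / nodes_in_cluster


# Hypothetical figures, not taken from the issue: 64 MB sstables, RF = 3.
small_cluster = estimated_loaded_sstable_size(64, 3, 6)    # MB per loaded sstable
large_cluster = estimated_loaded_sstable_size(64, 3, 48)   # MB per loaded sstable
print(small_cluster, large_cluster)
```

Under these assumed numbers, growing the cluster from 6 to 48 nodes cuts the estimated size of each loaded sstable by a factor of eight, which is the effect the reporter describes.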