From: "Dan Hendry" <dan.hendry.junk@gmail.com>
To: user@cassandra.apache.org
Subject: RE: split large sstable
Date: Thu, 17 Nov 2011 11:42:05 -0500

What do you mean by 'better file offset caching'? Presumably you mean 'better page cache hit rate'? Out of curiosity, why do you think this? What data are you seeing which makes you think it's better?

I am certainly not even close to a virtual memory or page caching expert, but I am pretty sure file size does not matter (assuming file sizes are significantly greater than the page size, which I believe is 4k). Perhaps what you are actually seeing is row fragmentation across your SSTables? That is easy to check with nodetool cfhistograms (the SSTables column).

To answer your question, I know of no tools to split SSTables. If you want to switch compaction strategies, levelled compaction (1.0.x) creates many smaller SSTables instead of fewer, bigger ones.
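Roughly, the check and the strategy switch look like the following. This is a sketch from memory of the 1.0-era tooling, so treat the exact syntax as an assumption and verify with nodetool's usage output and 'help update column family;' in cassandra-cli; the keyspace and column family names are placeholders:

    (check fragmentation: the SSTables histogram shows how many SSTables each read touched)
    nodetool -h localhost cfhistograms MyKeyspace MyColumnFamily

    (switch a column family to levelled compaction, in cassandra-cli;
     sstable_size_in_mb is the per-SSTable target size in megabytes)
    update column family MyColumnFamily
      with compaction_strategy = 'LeveledCompactionStrategy'
      and compaction_strategy_options = {sstable_size_in_mb: 10};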
Although it is workload dependent, increasing min_compaction_threshold for size-tiered compaction is probably a bad idea since it will increase row fragmentation across SSTables and therefore increase the IO/seek requirements for reads (particularly for column ranges or non-named-column queries). The only reason to do so is to reduce the frequency of compaction (disk IO considerations). A sketch of the relevant commands is at the end of this message, below the quoted original.

Dan

-----Original Message-----
From: Radim Kolar [mailto:hsn@sendmail.cz]
Sent: November-17-11 5:02
To: user@cassandra.apache.org
Subject: split large sstable

Is there a simple way to split a large SSTable into several smaller ones? I increased min_compaction_threshold (smaller SSTables seem to get better file offset caching from the OS) and now I need to reshuffle the data into smaller SSTables. Running several cluster-wide repairs worked well; only the largest table was left. I have an 80 GB SSTable and need to split it into roughly 10 GB ones.
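PS: if you do want to change the size-tiered thresholds themselves, there are two usual routes. Again a sketch from memory of the 1.0-era tooling, so treat the exact syntax and example values as assumptions and check the nodetool and cassandra-cli help; names are placeholders:

    (per node, at runtime; I do not believe this persists across restarts)
    nodetool -h localhost setcompactionthreshold MyKeyspace MyColumnFamily 4 32

    (cluster-wide, via the schema in cassandra-cli)
    update column family MyColumnFamily
      with min_compaction_threshold = 4
      and max_compaction_threshold = 32;

4 and 32 are just the usual defaults; as noted above, raising the minimum mostly trades read fragmentation for less frequent compaction.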