From: Edward Capriolo <edlinuxguru@gmail.com>
To: user@cassandra.apache.org
Date: Tue, 1 May 2012 15:06:27 -0400
Subject: Re: Question regarding major compaction.

Also, there are some tickets in JIRA to impose a max sstable size, and some
other related optimizations, that I think got stuck behind levelDB in
coolness factor. Not every use case is a good fit for leveled compaction, so
adding more tools and optimizations for the Size Tiered tables would be
awesome.

On Tue, May 1, 2012 at 10:15 AM, Jason Rutherglen wrote:
> I wonder if TieredMergePolicy [1] could be used in Cassandra for compaction?
>
> 1. http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> On Tue, May 1, 2012 at 6:38 AM, Edward Capriolo wrote:
>> Henrik,
>>
>> There are use cases where major compaction works well, like yours and
>> mine. Essentially, in cases with a high amount of churn, updates, and
>> deletes, we get a lot of benefit from forced tombstone removal in the
>> form of less physical data.
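To make "forced tombstone removal" concrete: a major compaction merges every
fragment of a row, keeps only the newest cell, and can drop tombstones older
than gc_grace_seconds because no older fragment can resurface afterwards. A
minimal Python sketch of that merge follows; the data structures, row keys,
and timestamps are made up for illustration and are not Cassandra's real
internals.

    GC_GRACE_SECONDS = 10 * 24 * 3600   # Cassandra's default gc_grace_seconds (10 days)

    def merge_sstables(sstables, now):
        """Keep only the newest cell per (row, column); purge tombstones whose
        grace period has passed, since a full merge has seen every fragment."""
        merged = {}
        for sstable in sstables:
            for key, cell in sstable.items():
                if key not in merged or cell["ts"] > merged[key]["ts"]:
                    merged[key] = cell
        return {k: c for k, c in merged.items()
                if not (c["tombstone"] and now - c["ts"] > GC_GRACE_SECONDS)}

    day = 24 * 3600
    sstables = [
        {("user:1", "name"): {"ts": 1 * day, "tombstone": False},    # old version
         ("user:2", "name"): {"ts": 2 * day, "tombstone": False}},
        {("user:1", "name"): {"ts": 50 * day, "tombstone": False},   # overwrite
         ("user:2", "name"): {"ts": 60 * day, "tombstone": True}},   # delete, long past grace
    ]
    print(merge_sstables(sstables, now=100 * day))
    # Only user:1's newest cell remains: the obsolete version and the expired
    # tombstone are both dropped from the merged output.

The obsolete cell and the expired tombstone disappear from the merged file,
which is the "less physical data" described above.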
>>
>> However, we end up with really big sstables that naturally will never
>> get compacted away, since they are so much bigger than the other
>> tables. So we get stuck major compacting forever.
>>
>> Cassandra needs an "un-compact" for people like us, so we can turn one
>> big sstable into multiple smaller ones. Or a major compaction that takes
>> in multiple sstables and produces multiple output tables, nicely
>> organized for bloom filter hits and free of tombstones.
>>
>> Edward
>>
>> On Tue, May 1, 2012 at 7:31 AM, Henrik Schröder wrote:
>>> But what's the difference between doing an extra read from that One Big
>>> File and doing an extra read from whatever SSTable happens to be
>>> largest in the course of automatic minor compaction?
>>>
>>> We have a pretty update-heavy application, and doing a major compaction
>>> can remove up to 30% of the used disk space. That directly translates
>>> into fewer reads and fewer SSTables that rows appear in. Everything
>>> that's unchanged since the last major compaction is obviously faster to
>>> access, and everything that's changed since the last major compaction
>>> is about the same as if we hadn't done it?
>>>
>>> So I'm still confused. I don't see a significant difference between
>>> doing the occasional major compaction and leaving it to automatic minor
>>> compactions. What am I missing? Reads will "continually degrade" with
>>> automatic minor compactions as well, won't they?
>>>
>>> I can sort of see that if you have a moving active data set, then it
>>> will most probably only exist in the smallest SSTables and frequently
>>> be the object of minor compactions, and doing a major compaction will
>>> move all of it into the biggest SSTables?
>>>
>>>
>>> /Henrik
>>>
>>> On Mon, Apr 30, 2012 at 05:35, aaron morton wrote:
>>>>
>>>> Depends on your definition of significantly; there are a few things to
>>>> consider.
>>>>
>>>> * Reading from SSTables for a request is a serial operation. Reading
>>>> from 2 SSTables will take twice as long as 1.
>>>>
>>>> * If the data in the "One Big File" has been overwritten, reading it
>>>> is a waste of time. And it will continue to be read until the row is
>>>> compacted away.
>>>>
>>>> * You will need to get min_compaction_threshold (a CF setting)
>>>> SSTables that big before automatic compaction will pick up the big
>>>> file.
>>>>
>>>> On the other side: some people do report getting value from nightly
>>>> major compactions. They also manage their cluster to reduce the impact
>>>> of performing the compactions.
>>>>
>>>> Hope that helps.
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 26/04/2012, at 9:37 PM, Fredrik wrote:
>>>>
>>>> Exactly, but why would reads be significantly slower over time when
>>>> including just one more, although sometimes large, SSTable in the
>>>> read?
>>>>
>>>> Ji Cheng skrev 2012-04-26 11:11:
>>>>
>>>> I'm also quite interested in this question. Here's my understanding of
>>>> this problem.
>>>>
>>>> 1. If your workload is append-only, doing a major compaction shouldn't
>>>> affect the read performance too much, because each row appears in one
>>>> sstable anyway.
>>>>
>>>> 2. If your workload is mostly updating existing rows, then more and
>>>> more columns will be obsoleted in that big sstable created by major
>>>> compaction.
>>>> And that super big sstable won't be compacted until you either have
>>>> another 3 similar-sized sstables or start another major compaction.
>>>> But I am not very sure whether this will be a major problem, because
>>>> you only end up reading one more sstable. Using size-tiered compaction
>>>> against a mostly-update workload may itself result in reading multiple
>>>> sstables for a single row key.
>>>>
>>>> Please correct me if I am wrong.
>>>>
>>>> Cheng
>>>>
>>>>
>>>> On Thu, Apr 26, 2012 at 3:50 PM, Fredrik wrote:
>>>>>
>>>>> In the tuning documentation regarding Cassandra, it's recommended not
>>>>> to run major compactions.
>>>>> I understand what a major compaction is all about, but I'd like an
>>>>> in-depth explanation as to why reads "will continually degrade until
>>>>> the next major compaction is manually invoked".
>>>>>
>>>>> From the doc:
>>>>> "So while read performance will be good immediately following a major
>>>>> compaction, it will continually degrade until the next major
>>>>> compaction is manually invoked. For this reason, major compaction is
>>>>> NOT recommended by DataStax."
>>>>>
>>>>> Regards
>>>>> /Fredrik
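To make the "big sstable gets stuck" behaviour discussed in this thread
concrete, below is a rough Python sketch of size-tiered bucket selection. It
is an approximation under stated assumptions, not Cassandra's actual
implementation: sstables of similar size are grouped into buckets, and a
bucket is only compacted once it holds min_compaction_threshold (default 4)
tables, so the single huge file left by a major compaction sits alone in its
bucket and is never touched by minor compactions. The sizes used are made up
for illustration.

    MIN_COMPACTION_THRESHOLD = 4   # Cassandra's default min_compaction_threshold
    BUCKET_LOW, BUCKET_HIGH = 0.5, 1.5

    def bucket_by_size(sizes_mb):
        """Group sstable sizes into buckets whose members stay within
        0.5x-1.5x of the bucket's running average size."""
        buckets = []
        for size in sorted(sizes_mb):
            for bucket in buckets:
                if BUCKET_LOW * bucket["avg"] <= size <= BUCKET_HIGH * bucket["avg"]:
                    bucket["sizes"].append(size)
                    bucket["avg"] = sum(bucket["sizes"]) / len(bucket["sizes"])
                    break
            else:
                buckets.append({"sizes": [size], "avg": float(size)})
        return buckets

    # Four recent flushes of ~30 MB plus the 900 MB result of an earlier
    # major compaction.
    for bucket in bucket_by_size([30, 32, 33, 35, 900]):
        eligible = len(bucket["sizes"]) >= MIN_COMPACTION_THRESHOLD
        print(bucket["sizes"], "compacted" if eligible else "left alone")
    # [30, 32, 33, 35] compacted
    # [900] left alone, until three more tables of comparable size exist
    # or another major compaction is run.

This is why Aaron notes that min_compaction_threshold SSTables "that big" are
needed before automatic compaction picks up the big file again, and why
Edward's very large tables "never get compacted away" between manual major
compactions.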