Subject: Re: Question regarding major compaction.
From: Jason Rutherglen
To: user@cassandra.apache.org
Date: Tue, 1 May 2012 07:15:52 -0700

I wonder if TieredMergePolicy [1] could be used in Cassandra for compaction?

1. http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

On Tue, May 1, 2012 at 6:38 AM, Edward Capriolo wrote:
> Henrik,
>
> There are use cases where major compaction works well, like yours and
> mine: essentially cases with a high amount of churn, updates and
> deletes, where we get a lot of benefit from forced tombstone removal
> in the form of less physical data.
>
> However, we end up with really big sstables that naturally will never
> get compacted away, since they are so much bigger than the other
> tables. So we get stuck major compacting forever.
>
> Cassandra needs an "uncompact" for people like us, so we can turn one
> big sstable into multiple smaller ones. Or a major compaction that
> takes in multiple sstables and produces multiple output tables, nicely
> organized for bloom filter hits and free of tombstones.
>
> Edward
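
The stranded-big-sstable problem Edward describes falls straight out of
size-tiered bucketing. A minimal sketch of the idea (illustrative only, not
Cassandra's actual SizeTieredCompactionStrategy; the names are made up and
the usual defaults of 0.5x-1.5x bucket boundaries and a minimum threshold of
4 are assumed): sstables are grouped into buckets of roughly similar size,
and only buckets that reach the threshold are compacted, so one huge table
left by a major compaction never finds enough similar-sized peers and is
never picked again.

import java.util.*;

public class SizeTieredSketch {
    // Group sstable sizes (in MB) into buckets whose members stay within
    // [bucketLow, bucketHigh] of the bucket's running average size.
    static List<List<Long>> buckets(List<Long> sizes, double bucketLow, double bucketHigh) {
        List<List<Long>> buckets = new ArrayList<>();
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        for (long size : sorted) {
            boolean placed = false;
            for (List<Long> bucket : buckets) {
                double avg = bucket.stream().mapToLong(Long::longValue).average().orElse(0);
                if (size >= avg * bucketLow && size <= avg * bucketHigh) {
                    bucket.add(size);
                    placed = true;
                    break;
                }
            }
            if (!placed) buckets.add(new ArrayList<>(List.of(size)));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Four freshly flushed ~5 MB sstables plus one 200 GB table from a major compaction.
        List<Long> sizes = List.of(5L, 6L, 5L, 7L, 200_000L);
        int minThreshold = 4;
        for (List<Long> bucket : buckets(sizes, 0.5, 1.5)) {
            System.out.println(bucket + (bucket.size() >= minThreshold
                    ? "  -> compacted together"
                    : "  -> skipped (too few similar-sized peers)"));
        }
        // The 200 GB table always lands in a bucket of one, which is why an
        // "uncompact" that splits it back into smaller sstables would help.
    }
}
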
> On Tue, May 1, 2012 at 7:31 AM, Henrik Schröder wrote:
>> But what's the difference between doing an extra read from that One
>> Big File and doing an extra read from whatever SSTable happens to be
>> largest in the course of automatic minor compaction?
>>
>> We have a pretty update-heavy application, and doing a major
>> compaction can remove up to 30% of the used disk space. That directly
>> translates into fewer reads and fewer SSTables that rows appear in.
>> Everything that's unchanged since the last major compaction is
>> obviously faster to access, and everything that's changed since the
>> last major compaction is about the same as if we hadn't done it?
>>
>> So I'm still confused. I don't see a significant difference between
>> doing the occasional major compaction and leaving it to do automatic
>> minor compactions. What am I missing? Reads will "continually degrade"
>> with automatic minor compactions as well, won't they?
>>
>> I can sort of see that if you have a moving active data set, then that
>> will most probably only exist in the smallest SSTables and frequently
>> be the object of minor compactions, and doing a major compaction will
>> move all of it into the biggest SSTables?
>>
>> /Henrik
>>
>> On Mon, Apr 30, 2012 at 05:35, aaron morton wrote:
>>>
>>> Depends on your definition of "significantly"; there are a few things
>>> to consider.
>>>
>>> * Reading from SSTables for a request is a serial operation. Reading
>>> from 2 SSTables will take twice as long as 1.
>>>
>>> * If the data in the "One Big File" has been overwritten, reading it
>>> is a waste of time, and it will continue to be read until the row is
>>> compacted away.
>>>
>>> * You will need min_compaction_threshold (a CF setting) SSTables that
>>> big before automatic compaction will pick up the big file.
>>>
>>> On the other side: some people do report getting value from nightly
>>> major compactions. They also manage their cluster to reduce the
>>> impact of performing the compactions.
>>>
>>> Hope that helps.
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 26/04/2012, at 9:37 PM, Fredrik wrote:
>>>
>>> Exactly, but why would reads be significantly slower over time when
>>> including just one more, although sometimes large, SSTable in the
>>> read?
>>>
>>> Ji Cheng wrote 2012-04-26 11:11:
>>>
>>> I'm also quite interested in this question. Here's my understanding
>>> of this problem.
>>>
>>> 1. If your workload is append-only, doing a major compaction
>>> shouldn't affect the read performance too much, because each row
>>> appears in one sstable anyway.
>>>
>>> 2. If your workload is mostly updating existing rows, then more and
>>> more columns will be obsoleted in that big sstable created by major
>>> compaction. And that super big sstable won't be compacted until you
>>> either have another 3 similar-sized sstables or start another major
>>> compaction. But I am not very sure whether this will be a major
>>> problem, because you only end up reading one more sstable. Using
>>> size-tiered compaction against a mostly-update workload may itself
>>> result in reading multiple sstables for a single row key.
>>>
>>> Please correct me if I am wrong.
>>>
>>> Cheng
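
To make Aaron's "reading from 2 SSTables will take twice as long as 1"
concrete, here is a minimal sketch (illustrative only, not Cassandra's
actual read path; the types and names are made up for the example): the row
is assembled by probing candidate sstables one after another and keeping
the newest timestamp per column, so every extra sstable that holds the key
adds another serial probe, and overwritten cells pulled from the One Big
File are read only to be thrown away.

import java.util.*;

public class ReadPathSketch {
    record Cell(long timestamp, String value) {}

    // Probe each candidate sstable in turn (newest first) and keep the newest
    // version of every column. Each extra sstable is one more serial probe;
    // versions read from the big file that were later overwritten are discarded.
    static Map<String, Cell> readRow(String key,
                                     List<Map<String, Map<String, Cell>>> sstablesNewestFirst) {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Map<String, Cell>> sstable : sstablesNewestFirst) {
            for (Map.Entry<String, Cell> e : sstable.getOrDefault(key, Map.of()).entrySet()) {
                Cell seen = merged.get(e.getKey());
                if (seen == null || e.getValue().timestamp() > seen.timestamp()) {
                    merged.put(e.getKey(), e.getValue());   // newer write wins
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // A recent small sstable overwrote one column that also sits in the One Big File.
        Map<String, Map<String, Cell>> recent =
                Map.of("row1", Map.of("name", new Cell(2, "new name")));
        Map<String, Map<String, Cell>> oneBigFile =
                Map.of("row1", Map.of("name", new Cell(1, "old name"), "city", new Cell(1, "Oslo")));
        System.out.println(readRow("row1", List.of(recent, oneBigFile)));
        // Two sstables hold row1, so the read does roughly twice the probing of one;
        // the "name" cell fetched from the big file was wasted work.
    }
}
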
>>> On Thu, Apr 26, 2012 at 3:50 PM, Fredrik wrote:
>>>>
>>>> In the tuning documentation regarding Cassandra, it's recommended
>>>> not to run major compactions. I understand what a major compaction
>>>> is all about, but I'd like an in-depth explanation as to why reads
>>>> "will continually degrade until the next major compaction is
>>>> manually invoked".
>>>>
>>>> From the doc:
>>>> "So while read performance will be good immediately following a
>>>> major compaction, it will continually degrade until the next major
>>>> compaction is manually invoked. For this reason, major compaction is
>>>> NOT recommended by DataStax."
>>>>
>>>> Regards
>>>> /Fredrik
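
The "continually degrade" wording in that doc quote can be made concrete
with a toy model (the assumptions here are mine, not from the
documentation): one hot row updated before every flush, size-tiered minors
that merge four similar-sized sstables, and the one huge sstable from the
last major compaction never qualifying for a minor. Immediately after the
major compaction a read touches one sstable; afterwards the count climbs
and never returns to one until the next major compaction.

public class ReadAmplificationSim {
    public static void main(String[] args) {
        int minThreshold = 4;       // sstables needed before a minor compaction runs
        int smallTablesWithRow = 0; // small sstables currently holding a copy of the hot row
        int bigTablesWithRow = 1;   // the copy sealed into the One Big File by the major compaction

        System.out.println("right after major compaction: read touches 1 sstable");
        for (int flush = 1; flush <= 12; flush++) {
            smallTablesWithRow++;                    // each flush writes a fresh copy of the row
            if (smallTablesWithRow >= minThreshold) {
                smallTablesWithRow = 1;              // minors merge the small tables among themselves...
            }                                        // ...but never reach the much larger big file
            int perRead = smallTablesWithRow + bigTablesWithRow;
            System.out.println("after flush " + flush + ": read touches " + perRead + " sstables");
        }
        // Output oscillates between 2 and minThreshold sstables per read and never
        // returns to 1, which is the degradation the documentation describes.
    }
}
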