Date: Tue, 1 May 2012 09:07:51 -0700
Subject: Re: Question regarding major compaction.
From: Rob Coli
To: user@cassandra.apache.org

On Tue, May 1, 2012 at 4:31 AM, Henrik Schröder wrote:
> But what's the difference between doing an extra read from that One Big
> File, than doing an extra read from whatever SSTable happen to be largest in
> the course of automatic minor compaction?

The primary differences, as I understand it, are that the index performance and bloom filter false positive rate for your One Big File are worse.

First, you are more likely to get a bloom filter false positive due to the intrinsic degradation of bloom filter performance as the number of keys increases. Next, after traversing the SSTable index to get to the closest indexed key, you will be forced to scan past more keys which are not your key in order to get to the key which is your key.

> So I'm still confused. I don't see a significant difference between doing
> the occasional major compaction or leaving it to do automatic minor
> compactions. What am I missing? Reads will "continually degrade" with
> automatic minor compactions as well, won't they?

I still don't really understand what precisely "continually degrade" means here either, FWIW, or which two operating paradigms are being compared, and under what sort of workloads.
As a simple example, I don't believe performance will "continually" do anything if your workload does not issue logical UPDATE or DELETE to rows. The documentation statement seems confusingly-vaguely-yet-strongly phrased, even if true.

=Rob

-- 
=Robert Coli
AIM&GTALK - rcoli@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb
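
A rough sketch of the bloom filter point made earlier in this message, for anyone who wants numbers. It uses the textbook approximation p ~= (1 - e^(-k*n/m))^k for a filter with m bits, k hash functions, and n keys. The function name and the values of m and k are made up for illustration. The sketch assumes the filter's bit budget stays fixed while the key count grows, and says nothing about how Cassandra actually sizes the filter for a given SSTable.

```python
import math


def bloom_false_positive_rate(num_keys: int, num_bits: int, num_hashes: int) -> float:
    """Textbook approximation of a bloom filter's false positive rate.

    p ~= (1 - e^(-k * n / m)) ** k, where n = num_keys, m = num_bits,
    k = num_hashes. This is the standard estimate, not Cassandra's code.
    """
    if num_keys <= 0:
        return 0.0
    return (1.0 - math.exp(-num_hashes * num_keys / num_bits)) ** num_hashes


# Hypothetical values, chosen only to make the trend visible.
NUM_BITS = 10_000_000   # assumed fixed bit budget for the filter
NUM_HASHES = 7          # assumed number of hash functions

if __name__ == "__main__":
    for num_keys in (1_000_000, 2_000_000, 5_000_000, 10_000_000):
        rate = bloom_false_positive_rate(num_keys, NUM_BITS, NUM_HASHES)
        print(f"{num_keys:>10} keys -> estimated false positive rate {rate:.4f}")
```

Under those assumptions the estimated rate climbs steadily as keys are added to a filter of fixed size. If the filter is instead resized in proportion to the key count, the same formula gives a constant rate, so how much this matters for One Big File depends on whether its filter is size-constrained.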