Subject: Re: Capacity problem with a lot of writes?
From: Edward Capriolo <edlinuxguru@gmail.com>
To: user@cassandra.apache.org
Date: Fri, 26 Nov 2010 11:34:16 -0500

On Fri, Nov 26, 2010 at 10:49 AM, Peter Schuller wrote:
>> Making compaction parallel isn't a priority because the problem is
>> almost always the opposite: how do we spread it out over a longer
>> period of time instead of sharp spikes of activity that hurt
>> read/write latency. I'd be very surprised if latency would be
>> acceptable if you did have parallel compaction. In other words, your
>> real problem is you need more capacity for your workload.
>
> Do you expect this to be true even with the I/O situation improved
> (i.e., under conditions where the additional I/O is not a problem)?
> It seems counter-intuitive to me that single-core compaction would
> make a huge impact on latency when compaction is CPU bound on an 8+
> core system under moderate load (even taking into account cache
> coherency/NUMA etc).
>
> --
> / Peter Schuller

Carlos,

I wanted to mention a specific technique I used to solve a situation I
ran into. We had a large influx of data that pushed our current
hardware to its limits; as stated above, the true answer was more
hardware. However, we hit a situation where a single node failed
several large compactions. After 2 or 3 failed big compactions we
ended up with ~1000 SSTables for one column family. That became a
chicken-and-egg problem: reads were slow because of the many SSTables
and the extra data such as tombstones, yet compaction was brutally
slow because of the ongoing read/write traffic.

My solution was to create a side-by-side install on the same box,
using different data directories and different ports
(/var/lib/cassandra/compact, port 9168, etc.). I moved the data to the
new install, started it up, and ran nodetool compact on the new
instance, which was seeing no read or write traffic. (A rough sketch
of the steps is at the end of this mail.) I was surprised to see the
machine at 400% CPU (of a possible 1600%) and not much io-wait.
Compacting 600 GB of small SSTables took about 4 days. (When the
SSTables are larger, I have compacted 400 GB in 4 hours on the same
hardware.) Afterwards I moved the data files back into place and
brought the node back into the cluster.

I have lived on both sides of the fence, wanting either long slow
compactions or breakneck fast ones, and I believe there is room for
other compaction models. I am interested, for example, in systems that
can take advantage of multiple data directories. From my experiment it
seems that a major compaction cannot fully utilize the hardware in
specific conditions. Knowing which model to use where, and how to
automatically select the optimal strategy, are interesting problems.
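
For anyone who wants to try the same trick, here is roughly what the
procedure looks like. This is a sketch from memory, not the exact
commands I ran: the paths, the keyspace/column family names
(MyKeyspace/MyCF), and the port numbers are only examples, and the
config file name depends on your Cassandra version.

  # 1. Give the second instance its own directories so it cannot
  #    collide with the live node:
  mkdir -p /var/lib/cassandra/compact/data/MyKeyspace
  mkdir -p /var/lib/cassandra/compact/commitlog

  # 2. In the second instance's config (storage-conf.xml or
  #    cassandra.yaml, depending on version), point the data and
  #    commitlog directories at the paths above and move every port
  #    (storage, Thrift/RPC, JMX) off the defaults, e.g. JMX on 9168.

  # 3. Copy the problem column family's SSTables from the live node
  #    (stopped/drained, or from a snapshot) into the side instance:
  cp /var/lib/cassandra/data/MyKeyspace/MyCF-* \
     /var/lib/cassandra/compact/data/MyKeyspace/

  # 4. Start the side instance and kick off a major compaction; since
  #    it takes no client traffic, compaction gets the whole machine:
  nodetool -h localhost -p 9168 compact

  # 5. When it finishes, stop the side instance, move the compacted
  #    SSTables back into the live node's data directory, and bring
  #    the node back into the cluster.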