From: Jonathan Ellis <jbellis@gmail.com>
Date: Thu, 21 Apr 2011 23:56:21 -0500
Subject: Re: Compacting single file forever
To: user@cassandra.apache.org

I suggest as a workaround making the forceUserDefinedCompaction method
ignore disk space estimates and attempt the requested compaction even
if it guesses it will not have enough space. This would allow you to
submit the 2 sstables you want manually.

On Thu, Apr 21, 2011 at 8:34 AM, Shotaro Kamio wrote:
> Hi Aaron,
>
> Maybe my previous description was not good. It's not a compaction
> threshold problem.
> In fact, Cassandra tries to compact 7 sstables in the minor
> compaction, but it decreases the number of sstables one by one due to
> insufficient disk space. At the end, it compacts a single file, as in
> the new log below.
>
> Compactionstats on a node says:
>
>   compaction type: Minor
>   column family: foobar
>   bytes compacted: 133473101929
>   bytes total in progress: 170000743825
>   pending tasks: 12
>
> The disk usage reaches 78%. It's a really tough situation. But I guess
> the data contains a lot of duplicates, because we feed the same data
> again and again and do repair.
>
> Another thing I'm wondering about is the file selection algorithm.
> For example, one of the disks has 235G free space. It contains sstables of
> 61G, 159G, 191G, 196G, 197G.
> The one Cassandra keeps trying to compact
> forever is the 159G sstable, even though there is a smaller one: ideally it
> should try compacting 61G + 159G.
> A more intelligent algorithm is required to find the optimal combination.
> And if Cassandra kept statistics on the amount of deleted and old
> data to be compacted per sstable, that would help find a more
> efficient file combination.
>
> Regards,
> Shotaro
>
> * Minor compaction log
> -----
>  WARN [CompactionExecutor:1] 2011-04-21 21:44:08,554
> CompactionManager.java (line 405) insufficient space to compact all
> requested files SSTableReader(path='foobar-f-773-Data.db'),
> SSTableReader(path='foobar-f-1452-Data.db'),
> SSTableReader(path='foobar-f-1620-Data.db'),
> SSTableReader(path='foobar-f-1642-Data.db'),
> SSTableReader(path='foobar-f-1643-Data.db'),
> SSTableReader(path='foobar-f-1690-Data.db'),
> SSTableReader(path='foobar-f-1814-Data.db')
>  WARN [CompactionExecutor:1] 2011-04-21 21:44:28,565
> CompactionManager.java (line 405) insufficient space to compact all
> requested files SSTableReader(path='foobar-f-773-Data.db'),
> SSTableReader(path='foobar-f-1452-Data.db'),
> SSTableReader(path='foobar-f-1642-Data.db'),
> SSTableReader(path='foobar-f-1643-Data.db'),
> SSTableReader(path='foobar-f-1690-Data.db'),
> SSTableReader(path='foobar-f-1814-Data.db')
>  WARN [CompactionExecutor:1] 2011-04-21 21:44:48,576
> CompactionManager.java (line 405) insufficient space to compact all
> requested files SSTableReader(path='foobar-f-773-Data.db'),
> SSTableReader(path='foobar-f-1452-Data.db'),
> SSTableReader(path='foobar-f-1642-Data.db'),
> SSTableReader(path='foobar-f-1643-Data.db'),
> SSTableReader(path='foobar-f-1814-Data.db')
>  WARN [CompactionExecutor:1] 2011-04-21 21:45:08,586
> CompactionManager.java (line 405) insufficient space to compact all
> requested files SSTableReader(path='foobar-f-1452-Data.db'),
> SSTableReader(path='foobar-f-1642-Data.db'),
> SSTableReader(path='foobar-f-1643-Data.db'),
> SSTableReader(path='foobar-f-1814-Data.db')
>  WARN [CompactionExecutor:1] 2011-04-21 21:45:28,596
> CompactionManager.java (line 405) insufficient space to compact all
> requested files SSTableReader(path='foobar-f-1642-Data.db'),
> SSTableReader(path='foobar-f-1643-Data.db'),
> SSTableReader(path='foobar-f-1814-Data.db')
>  WARN [CompactionExecutor:1] 2011-04-21 21:45:48,607
> CompactionManager.java (line 405) insufficient space to compact all
> requested files SSTableReader(path='foobar-f-1642-Data.db'),
> SSTableReader(path='foobar-f-1814-Data.db')
> ------
>
>
> On Thu, Apr 21, 2011 at 7:20 PM, aaron morton wrote:
>> Want to check if you are talking about minor compactions or major
>> (nodetool) compactions.
>> What compaction settings do you have for this CF? You can increase
>> the min compaction threshold and reduce the frequency of
>> compactions: http://wiki.apache.org/cassandra/StorageConfiguration
>> It seems like compaction is running continually; are there pending tasks in
>> the o.a.c.db.CompactionManager MBean?
>> How bad is your disk space problem?
>> For the code change, AFAIK it's not possible for Cassandra to know if there
>> are tombstones in the SSTable which can be purged until the rows are read.
>> Perhaps the file could hold the earliest deleted-at time somewhere (same for
>> TTL), but I do not think we do that now.
>> Hope that helps.
>> Aaron
>>
>> On 20 Apr 2011, at 21:25, Shotaro Kamio wrote:
>>
>> Hi,
>>
>> I found that our cluster repeats compacting a single file forever
>> (cassandra 0.7.5). We are wondering if the compaction logic is wrong. I'd
>> like to have comments from you guys.
>>
>> Situation:
>> - After trying to repair a column family, our cluster's disk usage is
>> quite high. Cassandra cannot compact all sstables at once. I think it
>> repeats compacting a single file at the end.
>> (You can check the attached
>> log below.)
>> - Our data doesn't have deletes, so the compaction of a single file
>> doesn't free any disk space.
>>
>> We are approaching a full disk. But I believe the repair
>> operation made a lot of duplicate data on disk, and that requires
>> compaction. However, most of the nodes are stuck compacting a single file.
>> The only thing we can do is restart the nodes.
>>
>> My question is why the compaction doesn't stop.
>>
>> I looked at the logic in CompactionManager.java:
>> -----------------
>>        String compactionFileLocation =
>>            table.getDataFileLocation(cfs.getExpectedCompactedFileSize(sstables));
>>        // If the compaction file path is null that means we have no
>>        // space left for this compaction.
>>        // try again w/o the largest one.
>>        List<SSTableReader> smallerSSTables = new ArrayList<SSTableReader>(sstables);
>>        while (compactionFileLocation == null && smallerSSTables.size() > 1)
>>        {
>>            logger.warn("insufficient space to compact all requested files "
>>                        + StringUtils.join(smallerSSTables, ", "));
>>            smallerSSTables.remove(cfs.getMaxSizeFile(smallerSSTables));
>>            compactionFileLocation =
>>                table.getDataFileLocation(cfs.getExpectedCompactedFileSize(smallerSSTables));
>>        }
>>        if (compactionFileLocation == null)
>>        {
>>            logger.error("insufficient space to compact even the two smallest files, aborting");
>>            return 0;
>>        }
>> -----------------
>>
>> The while condition is: smallerSSTables.size() > 1
>> Should this be "smallerSSTables.size() > 2"?
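The behavior Shotaro describes can be reproduced with a small standalone simulation of the quoted fallback loop. This is hypothetical illustration code, not Cassandra's actual implementation: the class, the `estimateLocation` stand-in, and the free-space figure are made up, while the sstable sizes are the ones reported in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Standalone simulation (NOT Cassandra code) of the fallback loop quoted above.
public class FallbackLoopSketch {
    // Returns how many sstables survive the shrinking loop, or 0 if it aborts.
    // minRemaining = 1 mimics the original "size() > 1"; 2 mimics the proposed "> 2".
    static int survivingCount(List<Long> sizes, long freeSpace, int minRemaining) {
        List<Long> smaller = new ArrayList<>(sizes);
        String location = estimateLocation(smaller, freeSpace);
        while (location == null && smaller.size() > minRemaining) {
            smaller.remove(Collections.max(smaller)); // drop the largest, like getMaxSizeFile
            location = estimateLocation(smaller, freeSpace);
        }
        return location == null ? 0 : smaller.size(); // 0 == "aborting"
    }

    // Stand-in for table.getDataFileLocation(expectedSize): null when it won't fit.
    static String estimateLocation(List<Long> sizes, long freeSpace) {
        long sum = 0;
        for (long s : sizes) sum += s;
        return sum <= freeSpace ? "/data" : null;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(61L, 159L, 191L, 196L, 197L); // GB, from the thread
        // Original condition: the loop shrinks all the way down to one sstable --
        // a no-op compaction when there are no tombstones, hence the endless repeat.
        System.out.println(survivingCount(sizes, 170L, 1)); // prints 1
        // Proposed condition: the same scenario aborts instead, which would also
        // match the "even the two smallest files" wording of the error message.
        System.out.println(survivingCount(sizes, 170L, 2)); // prints 0
    }
}
```

Note that with `> 1` the loop can legitimately end at a single file, yet the abort message below the loop claims it could not compact "even the two smallest files"; the proposed `> 2` would make the code and the message agree.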
>>
>> In my understanding, compaction of a single file frees disk space
>> only when the sstable has a lot of tombstones, and only if the tombstones
>> are removed in the compaction. If Cassandra knows the sstable has
>> tombstones to be removed, it's worth compacting it. Otherwise, it
>> might free space only if you are lucky. In the worst case, it leads to an
>> infinite loop, as in our case.
>>
>> What do you think of the code change?
>>
>> Best regards,
>> Shotaro
>>
>> * Cassandra compaction log
>> -------------------------
>> WARN [CompactionExecutor:1] 2011-04-20 01:03:14,446
>> CompactionManager.java (line 405) insufficient space to compact all
>> requested files SSTableReader(path='foobar-f-3020-Data.db'),
>> SSTableReader(path='foobar-f-3034-Data.db')
>> INFO [CompactionExecutor:1] 2011-04-20 03:47:29,833
>> CompactionManager.java (line 482) Compacted to
>> foobar-tmp-f-3035-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
>> of original) bytes for 6,893,896 keys.  Time: 9,855,385ms.
>>
>> WARN [CompactionExecutor:1] 2011-04-20 03:48:11,308
>> CompactionManager.java (line 405) insufficient space to compact all
>> requested files SSTableReader(path='foobar-f-3020-Data.db'),
>> SSTableReader(path='foobar-f-3035-Data.db')
>> INFO [CompactionExecutor:1] 2011-04-20 06:31:41,193
>> CompactionManager.java (line 482) Compacted to
>> foobar-tmp-f-3036-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
>> of original) bytes for 6,893,896 keys.  Time: 9,809,882ms.
>>
>> WARN [CompactionExecutor:1] 2011-04-20 06:32:22,476
>> CompactionManager.java (line 405) insufficient space to compact all
>> requested files SSTableReader(path='foobar-f-3020-Data.db'),
>> SSTableReader(path='foobar-f-3036-Data.db')
>> INFO [CompactionExecutor:1] 2011-04-20 09:20:29,903
>> CompactionManager.java (line 482) Compacted to
>> foobar-tmp-f-3037-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
>> of original) bytes for 6,893,896 keys.  Time: 10,087,424ms.
>> -------------------------
>> You can see that the compacted size is always the same. It repeats
>> compacting the same single sstable.
>>
>
>
> --
> Shotaro Kamio

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
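Shotaro's file-selection suggestion earlier in the thread (prefer combining the smallest sstables rather than repeatedly dropping the largest) can be sketched as follows. This is a hypothetical illustration under stated assumptions, not Cassandra's code: the class and method names are invented, and the fit test uses the worst-case estimate that the compacted output equals the sum of the inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: pick the smallest pair of sstables whose worst-case
// compacted size fits in the available disk space.
public class SmallestPairPicker {
    // sizes and freeSpace in GB. Returns the two smallest sizes that fit,
    // or an empty list if even the two smallest do not fit.
    static List<Long> pickSmallestPair(List<Long> sstableSizes, long freeSpace) {
        List<Long> sorted = new ArrayList<>(sstableSizes);
        Collections.sort(sorted);
        if (sorted.size() < 2)
            return Collections.emptyList();
        // Worst-case estimate: no overlap, so output size == sum of inputs.
        long estimate = sorted.get(0) + sorted.get(1);
        if (estimate > freeSpace)
            return Collections.emptyList();
        return new ArrayList<>(sorted.subList(0, 2));
    }

    public static void main(String[] args) {
        // The sizes from the thread: 61G, 159G, 191G, 196G, 197G with 235G free.
        List<Long> picked = pickSmallestPair(
                Arrays.asList(61L, 159L, 191L, 196L, 197L), 235L);
        System.out.println(picked); // prints [61, 159] -- the pair Shotaro expected
    }
}
```

With these numbers, 61G + 159G = 220G fits in the 235G of free space, so the pair Shotaro considered ideal is chosen, whereas the drop-the-largest loop in the quoted code can shrink the candidate set down to one file instead.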