From: shiv shivaji <shivajisus@yahoo.com>
To: cassandra-user@incubator.apache.org
Date: Sun, 28 Feb 2010 13:34:54 -0800 (PST)
Subject: Re: Anti-compaction Diskspace issue even when latest patch applied
Seems like the temporary solution was to run a cron job that calls nodetool cleanup every 5 minutes or so. This has kept the disk space from running too low.
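For reference, the cron entry is roughly the following (the nodetool path, host flag, and log location here are just illustrative; adjust for your install):

    */5 * * * * /opt/cassandra/bin/nodetool -h localhost cleanup >> /var/log/cassandra-cleanup.log 2>&1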

The manual solution you mentioned is likely worth considering, since the load balancing is taking a while.

I will track the JIRA issue on anti-compaction and disk space. Thanks for the pointer.

Thanks, Shiv


From: Jonathan Ellis <jbellis@gmail.com>
To: cassandra-user@incubator.apache.org
Sent: Wed, February 24, 2010 11:34:59 AM
Subject: Re: Anti-compaction Diskspace issue even when latest patch applied

As you noticed, "nodeprobe move" first unloads the data, then moves to
the new position, so that won't help you here.

If you are using ReplicationFactor=1, scp the data to the previous
node on the ring, then reduce the original node's token so it isn't
responsible for so much, and run cleanup.  (You can do this with a
higher RF too; you just have to scp the data to more places.)
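
A rough sketch of those steps (the hostname, keyspace name, and data path are only illustrative, and the exact way you change a token depends on your version):

    # on the overloaded node: copy its data files to the previous node on the ring
    scp -r /var/lib/cassandra/data/MyKeyspace prev-node:/var/lib/cassandra/data/
    # reduce the overloaded node's token so it is responsible for a smaller range
    # (mechanism depends on your version; see the caveats about moves above)
    # then, on each node involved, remove data that no longer belongs there:
    nodetool cleanup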

Finally, you could work on
https://issues.apache.org/jira/browse/CASSANDRA-579 so it doesn't need
to anticompact to disk before moving data.

-Jonathan

On Wed, Feb 24, 2010 at 12:06 PM, shiv shivaji <shivajisus@yahoo.com> wrote:
> According to the stack trace in the log, it looks like the patch was for
> anti-compaction, but I have not looked at the source code in detail yet.
>
> java.util.concurrent.ExecutionException:
> java.lang.UnsupportedOperationException: disk full
>         at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>         at
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
>         at
> org.apache.cassandra.db.CompactionManager$CompactionExecutor.afterExecute(CompactionManager.java:570)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.UnsupportedOperationException: disk full
>         at
> org.apache.cassandra.db.CompactionManager.doAntiCompaction(CompactionManager.java:344)
>         at
> org.apache.cassandra.db.CompactionManager.doCleanupCompaction(CompactionManager.java:405)
>         at
> org.apache.cassandra.db.CompactionManager.access$400(CompactionManager.java:49)
>         at
> org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:130)
>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         ... 2 more
>
> I tried "nodetool cleanup" before and it did not really stop the disk from
> filling. Is there a way to force-move the data, or some other way to solve
> the issue?
>
> Thanks, Shiv
>
> ________________________________
> From: Jonathan Ellis <jbellis@gmail.com>
> To: cassandra-user@incubator.apache.org
> Sent: Wed, February 24, 2010 7:16:32 AM
> Subject: Re: Anti-compaction Diskspace issue even when latest patch applied
>
> The patch you refer to was to help *compaction*, not *anticompaction*.
>
> If the space is mostly hints for other machines (is that what you
> meant by "due to past problems with others"?), you should run nodeprobe
> cleanup on it to remove data that doesn't actually belong on that
> node.
>
> -Jonathan
>
> On Wed, Feb 24, 2010 at 3:09 AM, shiv shivaji <shivajisus@yahoo.com> wrote:
>> For about 6TB of total data size with a replication factor of 2 (6TB x 2)
>> on a five-node cluster, I see about 4.6 TB on one machine (due to potential
>> past problems with other machines). The machine has a 6TB disk.
>>
>> The data folder on this machine has 59,289 files totaling 4.6 TB. The files
>> are the data, filter, and index files. I see that anti-compaction is running.
>> I applied a recent patch which does not do anti-compaction when disk space is
>> limited, but I still see it happening. I have also called nodetool loadbalance
>> on this machine. It seems like it will run out of disk space anyway.
>>
>> The disk space consumed per machine is listed below (each machine has a
>> 6TB hard drive on RAID):
>>
>> Machine Space Consumed
>> M1    4.47 TB
>> M2    2.93 TB
>> M3    1.83 GB
>> M4    56.19 GB
>> M5    398.01 GB
>>
>> How can I force M1 to immediately move its load to M3 and M4, for instance
>> (or any other machine)? The nodetool move command moves all of the data; is
>> there a way instead to force-move 50% of the data to M3 and the remaining
>> 50% to M4, and resume anti-compaction after the move?
>>
>> Thanks, Shiv
>>
>>
>>
>