incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Anti-compaction Diskspace issue even when latest patch applied
Date Thu, 04 Mar 2010 16:31:16 GMT
https://issues.apache.org/jira/browse/CASSANDRA-579 should make a big
difference in speed.  If you want to take a stab at it I can point you
in the right direction. :)

On Thu, Mar 4, 2010 at 10:24 AM, shiv shivaji <shivajisus@yahoo.com> wrote:
> Yes.
>
> The IP change trick seems to work. Load balancing seems a little slow, but I
> will open a new thread on that if needed.
>
> Thanks, Shiv
>
>
> ________________________________
> From: Jonathan Ellis <jbellis@gmail.com>
> To: cassandra-user@incubator.apache.org
> Sent: Wed, March 3, 2010 9:21:28 AM
> Subject: Re: Anti-compaction Diskspace issue even when latest patch applied
>
> You are proposing manually moving your data from a 5TB disk to a 12TB
> disk, and that is the only change you want to make?  Then just keep
> the IP the same when you restart it after moving, and you won't have
> to do anything else, it will just look like the node was down
> temporarily and is now back up.
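[Editor's note] The move-and-restart procedure described above can be sketched in Python. This is illustrative only: the mount points are hypothetical, and the node must be stopped before copying and restarted with the same IP and token afterwards.

```python
import shutil
from pathlib import Path

# Hypothetical mount points; substitute the real old and new data volumes.
OLD_DATA = Path("/mnt/disk5tb/cassandra/data")
NEW_DATA = Path("/mnt/disk12tb/cassandra/data")

def move_data_dir(src: Path, dst: Path) -> int:
    """Copy every file under src to dst, preserving the directory layout.
    Run only while Cassandra is stopped; returns the number of files copied."""
    count = 0
    for f in src.rglob("*"):
        if f.is_file():
            target = dst / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)  # preserves timestamps and permissions
            count += 1
    return count

# Example (after stopping the node; keep the same IP and token on restart):
# print(move_data_dir(OLD_DATA, NEW_DATA), "files copied")
```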
>
> On Tue, Mar 2, 2010 at 7:26 PM, shiv shivaji <shivajisus@yahoo.com> wrote:
>> Thanks, just realized this after looking at the source code.
>>
>> Seems like decommission will not work for me due to disk space issues. I
>> am currently moving all the data on the heavy node (5 TB full) to a 12 TB
>> disk drive. I am planning to remove the old token and assign a new token
>> to this node.
>>
>> The docs say to use decommission, but lack of disk space does not allow
>> me to do this. If I manually move all the data files and then do a
>> removetoken and start the node with a new token, would that work (as was
>> implied in a JIRA)?
>>
>> Shiv
>>
>>
>> ________________________________
>> From: Stu Hood <stu.hood@rackspace.com>
>> To: cassandra-user@incubator.apache.org
>> Sent: Sun, February 28, 2010 1:53:29 PM
>> Subject: Re: Anti-compaction Diskspace issue even when latest patch
>> applied
>>
>> `nodetool cleanup` is a very expensive process: it performs a major
>> compaction, and should not be done that frequently.
>>
>> -----Original Message-----
>> From: "shiv shivaji" <shivajisus@yahoo.com>
>> Sent: Sunday, February 28, 2010 3:34pm
>> To: cassandra-user@incubator.apache.org
>> Subject: Re: Anti-compaction Diskspace issue even when latest patch
>> applied
>>
>> Seems like the temporary solution was to run a cron job which calls
>> nodetool cleanup every 5 mins or so. This stopped the disk space from
>> going too low.
>>
>> The manual solution you mentioned is likely worth considering, as the
>> load balancing is taking a while.
>>
>> I will track the JIRA issue on anti-compaction and disk space. Thanks
>> for the pointer.
>>
>>
>> Thanks, Shiv
>>
>> ________________________________
>> From: Jonathan Ellis <jbellis@gmail.com>
>> To: cassandra-user@incubator.apache.org
>> Sent: Wed, February 24, 2010 11:34:59 AM
>> Subject: Re: Anti-compaction Diskspace issue even when latest patch
>> applied
>>
>> As you noticed, "nodeprobe move" first unloads the data, then moves to
>> the new position, so that won't help you here.
>>
>> If you are using ReplicationFactor=1, scp the data to the previous
>> node on the ring, then reduce the original node's token so it isn't
>> responsible for so much, and run cleanup. (You can do this with higher
>> RF too; you just have to scp the data to more places.)
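[Editor's note] Reducing a node's token raises the question of what a balanced assignment would look like. For RandomPartitioner, whose token space is the integer range 0 to 2**127, evenly spaced initial tokens can be computed with a small sketch (illustrative only; assumes RandomPartitioner and one token per node):

```python
# Evenly spaced initial tokens for RandomPartitioner, whose token space
# is the integer range [0, 2**127). Node count below is an assumption.
RING_SIZE = 2 ** 127

def balanced_tokens(n_nodes: int) -> list[int]:
    """Return one evenly spaced token per node, starting at 0."""
    return [i * RING_SIZE // n_nodes for i in range(n_nodes)]

# For a five-node cluster like the one discussed in this thread:
for i, t in enumerate(balanced_tokens(5)):
    print(f"node {i}: {t}")
```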
>>
>> Finally, you could work on
>> https://issues.apache.org/jira/browse/CASSANDRA-579 so it doesn't need
>> to anticompact to disk before moving data.
>>
>> -Jonathan
>>
>> On Wed, Feb 24, 2010 at 12:06 PM, shiv shivaji <shivajisus@yahoo.com>
>> wrote:
>>> The stack trace I get in the log makes it look like the patch was for
>>> anti-compaction, but I have not looked at the source code in detail
>>> yet.
>>>
>>> java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException: disk full
>>>        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>>>        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>>        at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:86)
>>>        at org.apache.cassandra.db.CompactionManager$CompactionExecutor.afterExecute(CompactionManager.java:570)
>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>        at java.lang.Thread.run(Thread.java:619)
>>> Caused by: java.lang.UnsupportedOperationException: disk full
>>>        at org.apache.cassandra.db.CompactionManager.doAntiCompaction(CompactionManager.java:344)
>>>        at org.apache.cassandra.db.CompactionManager.doCleanupCompaction(CompactionManager.java:405)
>>>        at org.apache.cassandra.db.CompactionManager.access$400(CompactionManager.java:49)
>>>        at org.apache.cassandra.db.CompactionManager$2.call(CompactionManager.java:130)
>>>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>        ... 2 more
>>>
>>> I tried "nodetool cleanup" before and it did not really stop the disk
>>> from filling. Is there a way to force-move the data, or some other way
>>> to solve the issue?
>>>
>>> Thanks, Shiv
>>>
>>> ________________________________
>>> From: Jonathan Ellis <jbellis@gmail.com>
>>> To: cassandra-user@incubator.apache.org
>>> Sent: Wed, February 24, 2010 7:16:32 AM
>>> Subject: Re: Anti-compaction Diskspace issue even when latest patch
>>> applied
>>>
>>> The patch you refer to was to help *compaction*, not *anticompaction*.
>>>
>>> If the space is mostly hints for other machines (is that what you
>>> meant by "due to past problems with others?") you should run nodeprobe
>>> cleanup on it to remove data that doesn't actually belong on that
>>> node.
>>>
>>> -Jonathan
>>>
>>> On Wed, Feb 24, 2010 at 3:09 AM, shiv shivaji <shivajisus@yahoo.com>
>>> wrote:
>>>> For about 6 TB of total data size with a replication factor of 2 (6 TB x
>>>> 2) on a five-node cluster, I see about 4.6 TB on one machine (due to
>>>> potential past problems with other machines). The machine has a 6 TB
>>>> disk.
>>>>
>>>> The data folder on this machine has 59,289 files totaling 4.6 TB. The
>>>> files are the data, filter and index files. I see that anti-compaction
>>>> is running. I applied a recent patch which does not do anti-compaction
>>>> if disk space is limited, but I still see it happening. I have also
>>>> called nodetool loadbalance on this machine. It seems like it will run
>>>> out of disk space anyway.
>>>>
>>>> The disk space consumed per machine is as follows (each machine has a
>>>> 6 TB hard drive on RAID):
>>>>
>>>> Machine   Space Consumed
>>>> M1        4.47 TB
>>>> M2        2.93 TB
>>>> M3        1.83 GB
>>>> M4        56.19 GB
>>>> M5        398.01 GB
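[Editor's note] A back-of-the-envelope Python sketch of how skewed this ring is, using the numbers from the table above (GB values converted to TB):

```python
# Per-node loads from the table above, in TB, versus the even share.
loads_tb = {"M1": 4.47, "M2": 2.93, "M3": 0.00183, "M4": 0.05619, "M5": 0.39801}
total = sum(loads_tb.values())   # roughly 7.86 TB of live data on the ring
ideal = total / len(loads_tb)    # roughly 1.57 TB per node if balanced
for node, load in sorted(loads_tb.items()):
    print(f"{node}: {load:8.3f} TB ({load / ideal:5.2f}x the even share)")
```

M1 is carrying close to three times its even share, which is why anti-compaction, which temporarily needs extra space, pushes it over the edge.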
>>>>
>>>> How can I force M1 to immediately move its load to M3 and M4, for
>>>> instance (or any other machines)? The nodetool move command moves all
>>>> the data; is there a way instead to force-move 50% of the data to M3
>>>> and the remaining 50% to M4, and resume anti-compaction after the move?
>>>>
>>>> Thanks, Shiv
