incubator-cassandra-user mailing list archives

From Julien Campan <julien.cam...@gmail.com>
Subject Re: Nodetool cleanup
Date Fri, 29 Nov 2013 09:55:21 GMT
Thanks a lot for your answers.




2013/11/29 John Sanda <john.sanda@gmail.com>

> Couldn't another reason for doing cleanup sequentially be to avoid data
> loss? If data is being streamed from a node during bootstrap and cleanup is
> run too soon, couldn't you wind up in a situation with data loss if the new
> node being bootstrapped goes down (permanently)?
>
>
> On Thu, Nov 28, 2013 at 8:59 PM, Aaron Morton <aaron@thelastpickle.com> wrote:
>
>> > I hope I get this right :)
>>
>> Thanks for contributing :)
>>
>> > A repair will trigger a major compaction on your node, which will take up
>> > a lot of CPU and IO. It needs to do this to build up the data
>> > structure that is used for the repair. After the compaction, this is
>> > streamed to the different nodes in order to repair them.
>>
>> It does not trigger a major compaction; that’s what we call running
>> compaction from the command line, which compacts all SSTables into one
>> big one.
>>
>> It will flush all the data to disk, which will create some additional
>> compaction.
>>
>> The major concern is that it’s a disk-IO-intensive operation: it reads all
>> the data and writes it to new SSTables (a one-to-one mapping). If you
>> have all nodes doing this at the same time, there may be some degraded
>> performance. And as it’s all nodes, it’s not possible for the Dynamic Snitch
>> to avoid nodes that are overloaded.
>>
>> Cleanup is less intensive than repair, but it’s still a good idea to
>> stagger it. If you need to run it on all machines (or you have very
>> powerful machines), it’s probably going to be OK.
>>
>> Hope that helps.
>>
>>  -----------------
>> Aaron Morton
>> New Zealand
>> @aaronmorton
>>
>> Co-Founder & Principal Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> On 26/11/2013, at 5:14 am, Artur Kronenberg <artur.kronenberg@openmarket.com> wrote:
>>
>>  Hi Julien,
>>
>> I hope I get this right :)
>>
>> A repair will trigger a major compaction on your node, which will take up
>> a lot of CPU and IO. It needs to do this to build up the data
>> structure that is used for the repair. After the compaction, this is
>> streamed to the different nodes in order to repair them.
>>
>> If you trigger this on every node simultaneously, you basically take the
>> performance away from your cluster. I would expect Cassandra to still
>> function, just much more slowly than before. Triggering it node after node
>> will leave your cluster with more resources to handle incoming requests.
>>
>>
>> Cheers,
>>
>> Artur
>> On 25/11/13 15:12, Julien Campan wrote:
>>
>>   Hi,
>>
>>  I'm working with Cassandra 1.2.2 and I have a question about nodetool
>> cleanup.
>>  In the documentation, it's written: "Wait for cleanup to complete on
>> one node before doing the next."
>>
>>  I would like to know: why can't we run cleanup on several nodes at the
>> same time?
>>
>>
>>  Thanks
>>
>
>
> --
>
> - John
>
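The advice in this thread, to wait for cleanup to finish on one node before starting the next, can be sketched as a small Python driver. This is a minimal sketch, not a tool from the thread. It assumes that `nodetool` is on the PATH, that each node is reachable with `nodetool -h <host>`, and that `nodetool cleanup` only returns once cleanup has finished on that node. The function name `cleanup_sequentially` and any host names you pass to it are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def cleanup_sequentially(
    hosts: Sequence[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """Run `nodetool cleanup` on one host at a time (hypothetical helper).

    Assumes `nodetool cleanup` blocks until cleanup is done on that node,
    so the next node is only started after the previous one has finished.
    Stops at the first failure so an operator can investigate.
    """
    finished: List[str] = []
    for host in hosts:
        # Assumed invocation: `nodetool -h <host> cleanup`.
        cmd = ["nodetool", "-h", host, "cleanup"]
        result = run(cmd, check=False)
        if result.returncode != 0:
            raise RuntimeError(
                f"cleanup failed on {host} (exit code {result.returncode}); "
                f"completed so far: {finished}"
            )
        finished.append(host)
    return finished
```

Called as `cleanup_sequentially(["node1", "node2", "node3"])`, it returns the hosts it cleaned up, in order. Stopping at the first non-zero exit code, rather than moving on, is a deliberate choice here so a failed cleanup is looked at before more IO load is added. The `run` parameter exists only so the command runner can be swapped out, for example in tests. Per John's point, the sketch does not check whether a bootstrap that streams data from these nodes is still in progress; it assumes that has already completed.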
