cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Cassandra nodes reduce disks per node
Date Thu, 25 Feb 2016 12:47:54 GMT
You're welcome. If you have any feedback, you can comment on the blog post :-).

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-02-25 12:28 GMT+01:00 Anishek Agarwal <anishek@gmail.com>:

> Nice thanks !
>
> On Thu, Feb 25, 2016 at 1:51 PM, Alain RODRIGUEZ <arodrime@gmail.com>
> wrote:
>
>> For what it is worth, I finally wrote a blog post about this -->
>> http://thelastpickle.com/blog/2016/02/25/removing-a-disk-mapping-from-cassandra.html
>>
>> If you are not done yet, every step is detailed in there.
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - alain@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2016-02-19 10:04 GMT+01:00 Alain RODRIGUEZ <arodrime@gmail.com>:
>>
>>> Alain, thanks for sharing!  I'm confused why you do so many repetitive
>>>> rsyncs.  Just being cautious or is there another reason?  Also, why do you
>>>> have --delete-before when you're copying data to a temp (assumed empty)
>>>> directory?
>>>
>>>
>>>  Since they are immutable I do a first sync while everything is up and
>>>> running to the new location which runs really long. Meanwhile new ones are
>>>> created and I sync them again online, much less files to copy now. After
>>>> that I shutdown the node and my last rsync now has to copy only a few files
>>>> which is quite fast and so the downtime for that node is within minutes.
>>>
>>>
>>> Jan's guess is right, except for the "immutable" part: compaction can
>>> make big files go away, replaced by bigger ones you'll have to copy again.
>>>
>>> Here is a detailed explanation of why I did it this way.
>>>
>>> More precisely, let's say we have 10 files of 100 GB on the disk to
>>> remove (let's call it 'old-dir').
>>>
>>> I run a first rsync to an empty folder (let's call it 'tmp-dir') on the
>>> disk that will remain after the operation. Let's say this takes about 10
>>> hours. It can be run on all nodes in parallel, though.
>>>
>>> So I now have 10 files of 100 GB in tmp-dir. But meanwhile one
>>> compaction was triggered and old-dir now holds 6 files of 100 GB and 1 of 350 GB.
>>>
>>> At this point I disable compaction and stop the compactions already running.
>>>
>>> My second rsync has to remove from tmp-dir the 4 files that were
>>> compacted away, which is why I use '--delete-before': since tmp-dir
>>> needs to mirror old-dir, this is fine. This new pass takes 3.5 hours,
>>> also runnable in parallel. (Keep in mind C* won't compact anything for
>>> those 3.5 hours; that's why I did not stop compaction before the first
>>> rsync. In my case the dataset was about 2 TB.)
>>>
>>> At this point I have 950 GB in tmp-dir, but meanwhile clients continued
>>> to write to the disk, let's say 50 GB more.
>>>
>>> The 3rd rsync takes 0.5 hours: no compaction ran, so it just has to add
>>> the diff to tmp-dir. Still runnable in parallel.
>>>
>>> Then the script stops the node, so it should be run sequentially, and
>>> performs 2 more rsyncs. The first one takes the diff between the end of
>>> the 3rd rsync and the moment you stop the node; it should take a few
>>> seconds, minutes maybe, depending on how fast you ran the script after
>>> the 3rd rsync ended. The second rsync in the script is a 'useless' one. I
>>> just like to control things: I run it expecting to see it report no diff.
>>> It is just a way to stop the script if for some reason data is still
>>> being appended to old-dir.
>>>
>>> Then I just move all the files from tmp-dir to new-dir (the proper data
>>> dir remaining after the operation). This is an instant operation: since
>>> both directories are on the same filesystem, the files are not really
>>> moved, just renamed.
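The move is instant because tmp-dir and new-dir sit on the same filesystem, so mv is just a rename and no file data is copied; a sketch with hypothetical paths:

```shell
# Instant regardless of data size when both dirs share one filesystem:
# mv issues rename(2) calls and never copies file contents.
finish_move() {
    tmp_dir=$1; new_dir=$2
    mv "$tmp_dir"/* "$new_dir"/
}
```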
>>>
>>> I finally unmount and rm -rf old-dir.
>>>
>>> So the full op takes 10 h + 3.5 h + 0.5 h + (number of nodes * 0.1 h),
>>> and nodes are down for about 5-10 min.
>>>
>>> VS
>>>
>>> Straightforward way (stop node, move, start node): 10 h * number of
>>> nodes, as this needs to be sequential. Plus each node is down for 10
>>> hours, so you have to repair them all, as that is longer than the hinted
>>> handoff window...
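To make the comparison concrete, here is the arithmetic for a hypothetical 10-node cluster, using the durations above:

```shell
# Parallel strategy: the three online rsyncs run on all nodes at once
# (10 + 3.5 + 0.5 h), and only the short stop/move step (~0.1 h per
# node) is sequential. The naive stop/move/start is 10 h per node, all
# sequential.
awk 'BEGIN {
    n = 10                                   # hypothetical node count
    printf "parallel strategy: %.1f h\n", 10 + 3.5 + 0.5 + n * 0.1
    printf "sequential moves:  %.0f h\n", 10 * n
}'
```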
>>>
>>> Branton, I did not go through your process, but I guess you will be
>>> able to review it yourself after reading the above (typically, repair is
>>> not needed if you use the strategy I describe, as each node is down for
>>> only 5-10 minutes). Also, I am not sure how "rsync -azvuiP
>>> /var/data/cassandra/data2/ /var/data/cassandra/data/" will behave; my
>>> guess is it will do a copy, so it might be very long. My script performs
>>> an instant move, and as the next command is 'rm -Rf
>>> /var/data/cassandra/data2' I see no reason to copy rather than move the files.
>>>
>>> Your solution would probably work, but with big operational constraints
>>> (a very long operation, plus a repair needed).
>>>
>>> Hope this long email is useful; maybe I should blog about this. Let
>>> me know if the process above makes sense or if some things could be
>>> improved.
>>>
>>> C*heers,
>>> -----------------
>>> Alain Rodriguez
>>> France
>>>
>>> The Last Pickle
>>> http://www.thelastpickle.com
>>>
>>> 2016-02-19 7:19 GMT+01:00 Branton Davis <branton.davis@spanning.com>:
>>>
>>>> Jan, thanks!  That makes perfect sense to run a second time before
>>>> stopping cassandra.  I'll add that in when I do the production cluster.
>>>>
>>>> On Fri, Feb 19, 2016 at 12:16 AM, Jan Kesten <j.kesten@enercast.de>
>>>> wrote:
>>>>
>>>>> Hi Branton,
>>>>>
>>>>> two cents from me - I didn't look through the script, but for the
>>>>> rsyncs I do pretty much the same when moving them. Since they are immutable
>>>>> I do a first sync while everything is up and running to the new location
>>>>> which runs really long. Meanwhile new ones are created and I sync them
>>>>> again online, much less files to copy now. After that I shutdown the node
>>>>> and my last rsync now has to copy only a few files which is quite fast and
>>>>> so the downtime for that node is within minutes.
>>>>>
>>>>> Jan
>>>>>
>>>>>
>>>>>
>>>>> Von meinem iPhone gesendet
>>>>>
>>>>> Am 18.02.2016 um 22:12 schrieb Branton Davis <
>>>>> branton.davis@spanning.com>:
>>>>>
>>>>> Alain, thanks for sharing!  I'm confused why you do so many repetitive
>>>>> rsyncs.  Just being cautious or is there another reason?  Also, why do you
>>>>> have --delete-before when you're copying data to a temp (assumed empty)
>>>>> directory?
>>>>>
>>>>> On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ <arodrime@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I did the process a few weeks ago and ended up writing a runbook and
>>>>>> a script. I have anonymised it and share it FWIW:
>>>>>>
>>>>>> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>>>>>>
>>>>>> It is basic bash. I tried to have the shortest downtime possible,
>>>>>> which makes it a bit more complex, but it lets you do a lot in parallel
>>>>>> and run only one fast operation sequentially, reducing overall operation
>>>>>> time.
>>>>>>
>>>>>> This worked fine for me, yet I might have made some errors while
>>>>>> making it configurable through variables. Be sure to be around if you
>>>>>> decide to run this. Also, I automated this further using knife (Chef);
>>>>>> I hate to repeat ops, so this is something you might want to consider.
>>>>>>
>>>>>> Hope this is useful,
>>>>>>
>>>>>> C*heers,
>>>>>> -----------------
>>>>>> Alain Rodriguez
>>>>>> France
>>>>>>
>>>>>> The Last Pickle
>>>>>> http://www.thelastpickle.com
>>>>>>
>>>>>> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal <anishek@gmail.com>:
>>>>>>
>>>>>>> Hey Branton,
>>>>>>>
>>>>>>> Please do let us know if you face any problems  doing this.
>>>>>>>
>>>>>>> Thanks
>>>>>>> anishek
>>>>>>>
>>>>>>> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis <
>>>>>>> branton.davis@spanning.com> wrote:
>>>>>>>
>>>>>>>> We're about to do the same thing.  It shouldn't be necessary to
>>>>>>>> shut down the entire cluster, right?
>>>>>>>>
>>>>>>>> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli <rcoli@eventbrite.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal <
>>>>>>>>> anishek@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> To accomplish this can I just copy the data from disk1 to disk2
>>>>>>>>>> within the relevant cassandra home location folders, change the
>>>>>>>>>> cassandra.yaml configuration and restart the node. Before starting
>>>>>>>>>> I will shut down the cluster.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>> =Rob
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
