From: aaron morton
Subject: Re: Moving to a new cluster
Date: Sun, 25 Sep 2011 16:33:48 +1300
To: user@cassandra.apache.org

It can result in a lot of data on the node you run repair on, where "a lot" means perhaps two or more times more data.

My unscientific approach is to repair one CF at a time so you can watch the disk usage, and to repair the smaller CFs first. After the repair, compact if you need to.
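If it helps, here is a minimal Python sketch of that approach, assuming nodetool is on the PATH; the keyspace, column family names and data directory are made-up placeholders. It repairs one CF at a time, smallest first, and prints free disk space after each so you can watch for runaway growth:

    import shutil
    import subprocess

    HOST = "127.0.0.1"                     # placeholder node address
    KEYSPACE = "MyKeyspace"                # hypothetical keyspace
    CFS_SMALLEST_FIRST = ["Small", "Medium", "Large"]   # hypothetical CFs
    DATA_DIR = "/var/lib/cassandra/data"   # adjust to your data directory

    def free_gb(path):
        return shutil.disk_usage(path).free / (1024 ** 3)

    for cf in CFS_SMALLEST_FIRST:
        print("repairing %s.%s, %.1f GB free" % (KEYSPACE, cf, free_gb(DATA_DIR)))
        # repair a single column family at a time
        subprocess.check_call(["nodetool", "-h", HOST, "repair", KEYSPACE, cf])
        print("finished %s, %.1f GB free" % (cf, free_gb(DATA_DIR)))

    # compact afterwards if you need to reclaim the extra space:
    # subprocess.check_call(["nodetool", "-h", HOST, "compact", KEYSPACE])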

I think the amount of extra data will be related to how out of sync things are, so once you get repair working smoothly it will be less of a problem.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/09/2011, at 3:04 AM, Yan Chunlu wrote:


hi Aaron:

could you explain more about the issue where repair makes space usage go crazy?

I am planning to upgrade my cluster from 0.7.4 to 0.8.6, because repair never works on 0.7.4 for me.
more specifically, CASSANDRA-2280 and CASSANDRA-2156.


From your description, I am really worried that 0.8.6 might make it worse...

thanks!

On Thu, Sep 22, 2011 at 7:25 AM, aaron morton <aaron@thelastpickle.com> wrote:
How much data is on the nodes in cluster 1, and how much disk space is on cluster 2? Be aware that Cassandra 0.8 has an issue where repair can go crazy and use a lot of space.

If you are not regularly running repair, I would also repair before the move.

The repair after the copy is a good idea but should technically not be necessary. If you can practice the move, watch the repair to see if much is transferred (check the logs). There is always a small transfer, but if you see data being transferred for several minutes I would investigate.

When you start a repair it will repair with the other nodes it replicates data with, so you only need to run it on every RFth node. Start it on one node, watch the logs to see which nodes it talks to, and then start it on the first node it does not talk to. And so on.
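As a small illustration only (node names are invented), picking every RFth node around the ring in token order looks like this:

    # pick every RFth node (in ring/token order) to run repair on;
    # together they should touch all replicas
    RF = 3
    ring = ["node%d" % i for i in range(1, 13)]   # hypothetical 12-node ring

    repair_targets = ring[::RF]
    print(repair_targets)   # ['node1', 'node4', 'node7', 'node10']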

Add a snapshot before the cleanup (repair will also snapshot before it runs).

Scrub is not needed unless you are migrating or you have file errors.

If your cluster is online, consider running the cleanup on every RFth node rather than all at once (e.g. 1, 4, 7, 10, then 2, 5, 8, 11). It will have less impact on clients.
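A rough sketch of that staggering, again with invented node names; it also takes a snapshot on each node before the cleanup, as suggested above:

    import subprocess

    RF = 3
    ring = ["node%d" % i for i in range(1, 13)]   # hypothetical 12-node ring

    # RF-spaced batches: [node1,4,7,10], [node2,5,8,11], [node3,6,9,12]
    batches = [ring[offset::RF] for offset in range(RF)]

    for batch in batches:
        for node in batch:
            # snapshot first, then cleanup, one batch at a time
            subprocess.check_call(["nodetool", "-h", node, "snapshot"])
            subprocess.check_call(["nodetool", "-h", node, "cleanup"])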

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/09/2011, at 10:27 AM, Philippe wrote:

Hello,
We're currently running on a 3-node RF=3 cluster. Now that we have a better grip on things, we want to replace it with a 12-node RF=3 cluster of "smaller" servers. So I wonder what the best way to move the data to the new cluster would be. I can afford to stop writing to the current cluster for whatever time is necessary. Has anyone written up something on this subject?

My plan is the following (nodes in cluster 1 are node1.1->1.3, nodes in cluster 2 are node2.1->2.12):
  • stop writing to current cluster & drain it
  • get a snapshot on each node
  • Since it's RF=3, each node should have all the data, so assuming I set the tokens correctly I would move the snapshot from node1.1 to node2.1, 2.2, 2.3 and 2.4, then node1.2->node2.5, 2.6, 2.7, 2.8, etc. This is because the range for node1.1 is now spread across 2.1->2.4 (see the sketch after this list)
  • Run repair & cleanup & scrub on each node (more or less in parallel)
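A quick sketch of the fan-out in the third step, using the node names from the plan; the actual copy of the snapshot SSTables is only indicated in a comment, since paths will differ per setup:

    # each old node's snapshot goes to the 4 new nodes that now cover
    # its old token range (3 old nodes -> 12 new nodes, RF=3)
    old_nodes = ["node1.1", "node1.2", "node1.3"]
    new_nodes = ["node2.%d" % i for i in range(1, 13)]
    fanout = len(new_nodes) // len(old_nodes)   # 4 new nodes per old node

    for i, old in enumerate(old_nodes):
        targets = new_nodes[i * fanout:(i + 1) * fanout]
        print("%s -> %s" % (old, ", ".join(targets)))
        # e.g. node1.1 -> node2.1, node2.2, node2.3, node2.4
        # copy the snapshot SSTables from `old` into the data directory on
        # each target before starting Cassandra, then repair and cleanup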
What do you think?
Thanks