From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: Moving to a new cluster
Date: Sun, 25 Sep 2011 22:21:35 +1300

Sounds like it.

A
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 25/09/2011, at 6:10 PM, Yan Chunlu wrote:

thanks! is that similar problem described in this thread?

http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/nodetool-repair-caused-high-disk-space-usage-td6695542.html

On Sun, Sep 25, 2011 at 11:33 AM, aaron morton <aaron@thelastpickle.com> wrote:
It can result in a lot of data on the node you run repair on, where "a lot" means perhaps 2 or more times more data.

My unscientific approach is to repair one CF at a time so you can watch the disk usage, and to repair the smaller CFs first. After the repair, compact if you need to.
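For example, something along these lines (the host, keyspace and CF names are made up and the default data directory is assumed; adjust for your setup):

    # repair the smaller CFs first, one at a time, watching disk usage between runs
    nodetool -h node1 repair MyKeyspace SmallCF
    du -sh /var/lib/cassandra/data/MyKeyspace
    nodetool -h node1 repair MyKeyspace BigCF
    # compact afterwards if you need to reclaim space
    nodetool -h node1 compact MyKeyspace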

I think the amount of extra data will be related to how out of sync things are, so once you get repair working smoothly it will be less of a problem.

Cheers
    

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton

On 23/09/2011, at 3:04 AM, Yan Chunlu wrote:


hi Aaron:

could you explain more about the issue where repair makes space usage go crazy?

I am planning to upgrade my cluster from 0.7.4 to 0.8.6, because repair never works for me on 0.7.4;
more specifically, CASSANDRA-2280 and CASSANDRA-2156.


from your description, I'm really worried that 0.8.6 might make it worse...

thanks!

On Thu, Sep 22, 2011 at 7:25 AM, aaron morton <aaron@thelastpickle.com> wrote:
How much data is on the nodes in cluster 1, and how much disk space is on cluster 2? Be aware that Cassandra 0.8 has an issue where repair can go crazy and use a lot of space.
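For instance (hypothetical host names, default data directory assumed):

    nodetool -h node1.1 ring        # the Load column shows live data per node in cluster 1
    df -h /var/lib/cassandra        # free disk on each node in cluster 2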

If you are not regularly running repair, I would also repair before the move.

The repair after the copy is a good idea but should technically not be necessary. If you can practice the move, watch the repair to see if much is transferred (check the logs). There is always a small transfer, but if you see data being transferred for several minutes I would investigate.
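One way to keep an eye on that during a practice run, assuming the default log location of the packaged install:

    grep -i streaming /var/log/cassandra/system.log | tail -50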

When you start a repair, it will repair with the other nodes it replicates data with, so you only need to run it on every RF-th node. Start it on one node, watch the logs to see who it talks to, and then start it on the first node it does not talk to. And so on.
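With RF=3 on Philippe's 12-node layout that could look like the sketch below (the keyspace name is made up, and the exact grouping should be confirmed from the logs as described above):

    nodetool -h node2.1 repair MyKeyspace
    nodetool -h node2.4 repair MyKeyspace
    nodetool -h node2.7 repair MyKeyspace
    nodetool -h node2.10 repair MyKeyspace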

Add a snapshot before the cleanup (repair will also snapshot before it runs).

Scrub is not needed unless you are migrating or you have file errors.

If your cluster is online, consider running the cleanup on every RF-th node rather than all at once (e.g. 1, 4, 7, 10, then 2, 5, 8, 11). It will have less impact on clients.
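A rough sketch of that staggered run, including the snapshot suggested above (hypothetical host names, first batch only):

    for n in node2.1 node2.4 node2.7 node2.10; do
        nodetool -h $n snapshot    # safety net before data is removed
        nodetool -h $n cleanup
    done
    # then repeat for node2.2 node2.5 node2.8 node2.11, and node2.3 node2.6 node2.9 node2.12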

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton

On 22/09/2011, at 10:27 AM, Philippe wrote:

Hello,
We're currently running on a 3-node RF=3 cluster. Now that we have a better grip on things, we want to replace it with a 12-node RF=3 cluster of "smaller" servers. So I wonder what the best way to move the data to the new cluster would be. I can afford to stop writing to the current cluster for whatever time is necessary. Has anyone written up something on this subject?

My plan is the following (nodes in cluster 1 are node1.1->1.3, nodes in cluster 2 are node2.1->2.12):
  • stop writing to the current cluster & drain it
  • get a snapshot on each node
  • Since it's RF=3, each node should have all the data, so assuming I set the tokens correctly I would move the snapshot from node1.1 to node2.1, 2.2, 2.3 and 2.4, then node1.2 -> node2.5, 2.6, 2.7, 2.8, etc. This is because the range for node1.1 is now spread across 2.1->2.4 (see the token sketch after this list).
  • Run repair & clean & scrub on each node (more or less in parallel)
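A rough sketch of the token maths behind the token assignment above, assuming the RandomPartitioner and an evenly spaced 12-node ring (this is just the standard i * 2^127 / 12 spacing, not something from this thread):

    # print initial_token for node2.1 .. node2.12
    python -c 'print("\n".join(str(i * (2**127 // 12)) for i in range(12)))'
    # if both rings are evenly spaced and share the same first token, each old node's
    # range splits cleanly across 4 new nodes (node1.1 -> node2.1-2.4, and so on)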
What do you think?
Thanks