From: aaron morton
To: user@cassandra.apache.org
Subject: Re: New node not joining
Date: Mon, 9 May 2011 10:40:13 +1200

Ah, I see the case you are talking about.

A node will auto bootstrap on startup, when it joins the ring, if: it is not already bootstrapped, auto bootstrap is enabled, and the node is not in its own seed list.

The auto bootstrap process then finds the token it wants, but aborts the process if there are no non-system tables defined. That may happen because the bootstrap code finds the node with the highest load and splits its range; if all the nodes have zero load (no user data) then that process is unreliable.
But it's also unreliable if there is a schema and no data.

Created https://issues.apache.org/jira/browse/CASSANDRA-2625 to see if it can be changed.

Thanks

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 7 May 2011, at 05:25, Len Bucchino wrote:

> While I agree that what you suggested is a very good idea, the bootstrapping process _should_ work properly.
>
> Here is some additional detail on the original problem. If the current node that you are trying to bootstrap has itself listed in seeds in its yaml then it will be able to bootstrap on an empty schema. If it does not have itself listed in seeds in its yaml and you have an empty schema then the bootstrap process will not complete and no errors will be reported in the logs, even with debug enabled.
>
> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Thursday, May 05, 2011 6:51 PM
> To: user@cassandra.apache.org
> Subject: Re: New node not joining
>
> When adding nodes it is a *very* good idea to manually set the tokens, see http://wiki.apache.org/cassandra/Operations#Load_balancing
>
> Bootstrap is a process that happens only once on a node, where as well as telling the other nodes it's around it asks them to stream over the data it will now be responsible for.
>
> nodetool loadbalance is an old utility that should have better warnings not to use it. The best way to load balance the cluster is manually creating the tokens and assigning them either using the initial_token config param or using nodetool move.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6 May 2011, at 08:37, Sanjeev Kulkarni wrote:
>
> Here is what I did.
> I booted up the first one. After that I started the second one with bootstrap turned off.
> Then I did a nodetool loadbalance on the second node.
> After which I added the third node, again with bootstrap turned off. Then did the loadbalance again on the third node.
> This seems to have successfully completed and I am now able to read/write into my system.
> Thanks!
>
> On Thu, May 5, 2011 at 1:22 PM, Len Bucchino wrote:
> I just rebuilt the cluster in the same manner as I did originally, except after I set up the first node I added a keyspace and column family before adding any new nodes. This time the 3rd node auto bootstrapped successfully.
>
> From: Len Bucchino [mailto:Len.Bucchino@veritix.com]
> Sent: Thursday, May 05, 2011 1:31 PM
> To: user@cassandra.apache.org
> Subject: RE: New node not joining
>
> Also, setting auto_bootstrap to false and setting token to the one that it said it would use in the logs allows the new node to join the ring.
>
> From: Len Bucchino [mailto:Len.Bucchino@veritix.com]
> Sent: Thursday, May 05, 2011 1:25 PM
> To: user@cassandra.apache.org
> Subject: RE: New node not joining
>
> Adding the fourth node to the cluster with an empty schema using auto_bootstrap was not successful. A nodetool netstats on the new node shows “Mode: Joining: getting bootstrap token” similar to what the third node did before it was manually added. Also, there are no exceptions in the logs but it never joins the ring.
>
> From: Sanjeev Kulkarni [mailto:sanjeev@locomatix.com]
> Sent: Thursday, May 05, 2011 11:47 AM
> To: user@cassandra.apache.org
> Subject: Re: New node not joining
>
> Hi Len,
> This looks like a decent workaround. I would be very interested to see how the addition of the 4th node went. Please post it whenever you get a chance.
> Thanks!
>
> On Thu, May 5, 2011 at 6:47 AM, Len Bucchino wrote:
> I have the same problem on 0.7.5 auto bootstrapping a 3rd node onto an empty 2 node test cluster (the two nodes were manually added), and it currently has an empty schema. My log entries look similar to yours. I took the new token it says it's going to use from the log file, added it to the yaml, and turned off auto bootstrap, and the node added fine. I'm bringing up a 4th node now and will see if it has the same problem auto bootstrapping.
>
> From: Sanjeev Kulkarni [sanjeev@locomatix.com]
> Sent: Thursday, May 05, 2011 2:18 AM
> To: user@cassandra.apache.org
> Subject: New node not joining
>
> Hey guys,
> I'm running into what seems like a very basic problem.
> I have a one node cassandra instance. Version 0.7.5. Freshly installed. Contains no data.
> The cassandra.yaml is the same as the default one that is supplied, except for data/commitlog/saved_caches directories.
> I also changed the addresses to point to an externally visible ip address.
> The cassandra comes up nicely and is ready to accept thrift connections.
> I do a nodetool and this is what I get.
>
> 10.242.217.124 Up Normal 6.54 KB 100.00% 110022862993086789903543147927259579701
>
> Which seems right to me.
>
> Now I start another node. Almost identical configuration to the first one, except that bootstrap is turned on and seeds are set appropriately.
> When I start the second, I notice that the second one contacts the first node to get the new token.
> I see the following lines in the first machine (the seed machine).
>
> INFO [GossipStage:1] 2011-05-05 07:00:20,427 Gossiper.java (line 628) Node /10.83.111.80 has restarted, now UP again
> INFO [HintedHandoff:1] 2011-05-05 07:00:55,162 HintedHandOffManager.java (line 304) Started hinted handoff for endpoint /10.83.111.80
> INFO [HintedHandoff:1] 2011-05-05 07:00:55,164 HintedHandOffManager.java (line 360) Finished hinted handoff of 0 rows to endpoint /10.83.111.80
>
> However when I do a nodetool ring, I still get
>
> 10.242.217.124 Up Normal 6.54 KB 100.00% 110022862993086789903543147927259579701
>
> Even though the second node has come up. On the second machine the logs say
>
> INFO [main] 2011-05-05 07:00:19,124 StorageService.java (line 504) Joining: getting load information
> INFO [main] 2011-05-05 07:00:19,124 StorageLoadBalancer.java (line 351) Sleeping 90000 ms to wait for load information...
> INFO [GossipStage:1] 2011-05-05 07:00:20,828 Gossiper.java (line 628) Node /10.242.217.124 has restarted, now UP again
> INFO [HintedHandoff:1] 2011-05-05 07:00:29,548 HintedHandOffManager.java (line 304) Started hinted handoff for endpoint /10.242.217.124
> INFO [HintedHandoff:1] 2011-05-05 07:00:29,550 HintedHandOffManager.java (line 360) Finished hinted handoff of 0 rows to endpoint /10.242.217.124
> INFO [main] 2011-05-05 07:01:49,137 StorageService.java (line 504) Joining: getting bootstrap token
> INFO [main] 2011-05-05 07:01:49,148 BootStrapper.java (line 148) New token will be 24952271262852174037699496069317526837 to assume load from /10.242.217.124
> INFO [main] 2011-05-05 07:01:49,150 Mx4jTool.java (line 72) Will not load MX4J, mx4j-tools.jar is not in the classpath
> INFO [main] 2011-05-05 07:01:49,259 CassandraDaemon.java (line 112) Binding thrift service to /10.83.111.80:9160
> INFO [main] 2011-05-05 07:01:49,262 CassandraDaemon.java (line 126) Using TFastFramedTransport with a max frame size of 15728640 bytes.
> INFO [Thread-5] 2011-05-05 07:01:49,266 CassandraDaemon.java (line 154) Listening for thrift clients...
>
> This seems to indicate that the second node has joined the ring and has gotten its key range.
> Am I missing anything?
>
> Thanks!
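
For anyone following the advice in this thread to set tokens manually, here is a rough sketch of how evenly spaced initial_token values could be computed. It assumes the RandomPartitioner (token range 0 to 2**127), which matches the size of the tokens shown in the logs above; the function name balanced_tokens is made up for this illustration and is not part of Cassandra or nodetool.

```python
def balanced_tokens(node_count):
    """Return evenly spaced tokens for a ring of node_count nodes.

    Assumption: RandomPartitioner, whose token range is [0, 2**127).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2 ** 127
    # Node i takes the token at i/node_count of the way around the ring.
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Print one initial_token value per node for a 3 node cluster.
    for index, token in enumerate(balanced_tokens(3)):
        print(f"node {index}: initial_token: {token}")
```

Each value would go into the initial_token setting of one node's cassandra.yaml before its first start, or be applied to a running node with nodetool move.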
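
The conditions described at the top of this thread can also be written out as a small decision function. This is only a model of the 0.7.5 behaviour as reported here, not Cassandra's actual code; NodeState, will_auto_bootstrap, bootstrap_outcome and the returned strings are all hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Hypothetical summary of the settings that matter for bootstrap."""
    address: str
    already_bootstrapped: bool
    auto_bootstrap: bool
    seeds: list = field(default_factory=list)


def will_auto_bootstrap(node):
    # All three conditions from the thread must hold.
    return (not node.already_bootstrapped
            and node.auto_bootstrap
            and node.address not in node.seeds)


def bootstrap_outcome(node, has_non_system_tables):
    """Model the outcome reported in this thread for a starting node."""
    if not will_auto_bootstrap(node):
        return "joins without bootstrapping"
    if not has_non_system_tables:
        # The case behind CASSANDRA-2625: the token is chosen, then the
        # process is abandoned and the node never shows up in the ring.
        return "stalls: never joins the ring"
    return "bootstraps and joins"


if __name__ == "__main__":
    new_node = NodeState("10.83.111.80", False, True, ["10.242.217.124"])
    print(bootstrap_outcome(new_node, has_non_system_tables=False))
```

Under this model, the workarounds reported above (listing the node in its own seeds, turning auto bootstrap off, or creating a keyspace first) each move the node out of the stalled branch.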