helix-user mailing list archives

From: Kanak Biscuitwala <kana...@hotmail.com>
Subject: RE: Error on participant while joining cluster
Date: Tue, 19 Aug 2014 21:18:23 GMT

As Jason said, the typical naming convention is host_port, which Helix tools automatically
parse as host and port. In theory it is possible to use arbitrary instance IDs, though, so
this might be worth filing as a bug.
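
To illustrate the host_port convention, here is a minimal sketch in plain Java. It is not
Helix's own parser: the class InstanceIds and its methods are hypothetical names made up for
this example, and splitting on the last underscore is an assumption about how such an ID could
be taken apart.

```java
public final class InstanceIds {
    private InstanceIds() {}

    // Builds an instance ID in the host_port form described above.
    public static String toInstanceId(String host, int port) {
        return host + "_" + port;
    }

    // Rewrites a "host:port" style ID into "host_port".
    public static String sanitize(String rawId) {
        return rawId.replace(':', '_');
    }

    // Splits on the LAST underscore so host names containing '_' still parse.
    // Returns {host, port}; throws IllegalArgumentException on malformed input.
    public static String[] parse(String instanceId) {
        int idx = instanceId.lastIndexOf('_');
        if (idx <= 0 || idx == instanceId.length() - 1) {
            throw new IllegalArgumentException("Not in host_port form: " + instanceId);
        }
        String host = instanceId.substring(0, idx);
        String port = instanceId.substring(idx + 1);
        // NumberFormatException is a subclass of IllegalArgumentException.
        Integer.parseInt(port);
        return new String[] {host, port};
    }

    public static void main(String[] args) {
        // Instance name taken from the log output quoted in this thread.
        String id = sanitize("hdfsterrapin-a-datanode-531b2679:9090");
        String[] parts = parse(id);
        System.out.println(id + " -> host=" + parts[0] + ", port=" + parts[1]);
    }
}
```

Building the ID through a helper like this before connecting would avoid registering a
participant under a name that contains ":".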

As for removing an instance, the typical flow is to shut it down (so that the live instance
is gone), disable it, and then drop it using HelixAdmin.

From: zzhang@linkedin.com
To: user@helix.apache.org
Subject: Re: Error on participant while joining cluster
Date: Tue, 19 Aug 2014 21:05:46 +0000

First make sure the node you want to remove from the cluster is not running, i.e. it has no
entry under /<CLUSTER_NAME>/LIVEINSTANCES/. Then you can simply remove the orphaned znodes
under /<CLUSTER_NAME>/INSTANCES as well as under /<CLUSTER_NAME>/CONFIGS/PARTICIPANT. Normally
":" is not recommended in the instance id, and we internally replace it with "_". We will
check how to get rid of an instance with ":" in its id.
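
As a rough sketch of which znodes are involved, the plain-Java helper below only builds the
path strings named above; it does not connect to ZooKeeper or delete anything. The class
HelixZnodePaths and its method names are hypothetical, and the cluster and instance names are
the ones from the log output in this thread.

```java
import java.util.Arrays;
import java.util.List;

public final class HelixZnodePaths {
    private HelixZnodePaths() {}

    // Path to check first: the instance must NOT be present here before cleanup.
    public static String liveInstancePath(String cluster, String instanceId) {
        return "/" + cluster + "/LIVEINSTANCES/" + instanceId;
    }

    // The two orphaned znodes mentioned above for a stale instance.
    public static List<String> orphanedPaths(String cluster, String instanceId) {
        return Arrays.asList(
            "/" + cluster + "/INSTANCES/" + instanceId,
            "/" + cluster + "/CONFIGS/PARTICIPANT/" + instanceId);
    }

    public static void main(String[] args) {
        String cluster = "terrapin";
        String staleId = "hdfsterrapin-a-datanode-531b2679:9090";
        System.out.println("Verify absent: " + liveInstancePath(cluster, staleId));
        for (String path : orphanedPaths(cluster, staleId)) {
            System.out.println("Candidate for removal: " + path);
        }
    }
}
```

Printing the paths first makes it easy to double-check them in a ZooKeeper client before
removing anything by hand.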



Thanks,
Jason





From: Varun Sharma <varun@pinterest.com>
Reply-To: "user@helix.apache.org" <user@helix.apache.org>
Date: Tuesday, August 19, 2014 1:58 PM
To: "user@helix.apache.org" <user@helix.apache.org>
Subject: Re: Error on participant while joining cluster

Can I simply remove the orphaned znodes under /<CLUSTER_NAME>/INSTANCES?



Varun





On Tue, Aug 19, 2014 at 1:54 PM, Varun Sharma <varun@pinterest.com> wrote:


Another issue I have now is that I ended up registering the participants as <host>:<port>.
This causes exceptions related to MBean (because it does not like colon separators). I
don't know if that is interfering with normal controller operation.

I restarted the instances replacing the : with a , but those old names are still stuck in
the INSTANCES znode. How can I get rid of these? helix-admin seems to be replacing the ":" in
the node name with an underscore "_" and can't delete the node.

This is still causing MBean-related exceptions in the log trace.




Varun







On Tue, Aug 19, 2014 at 12:18 PM, Zhen Zhang <zzhang@linkedin.com> wrote:



Sure. Will add it.





From: kishore g <g.kishore@gmail.com>
Reply-To: "user@helix.apache.org" <user@helix.apache.org>
Date: Tuesday, August 19, 2014 12:14 PM
To: "user@helix.apache.org" <user@helix.apache.org>
Subject: Re: Error on participant while joining cluster

Thanks Jason. We need to add this to the documentation. I could not find how to enable
auto-join in the docs. Should we add this to the admin interface documentation?














On Tue, Aug 19, 2014 at 12:06 PM, Zhen Zhang <zzhang@linkedin.com> wrote:



Hi Varun, you need to either add the participant to the cluster before starting it, or enable
the participant auto-join config.

Add the participant to the cluster:

./helix-admin.sh --zkSvr <ZookeeperServerAddress, e.g. localhost:2181> --addNode <clusterName, e.g. terrapin> <instanceId, e.g. hdfsterrapin-a-datanode-531b2679_9090>

Or, enable the auto-join config:

./helix-admin.sh --zkSvr <ZookeeperServerAddress> --setConfig CLUSTER <clusterName> allowParticipantAutoJoin=true



Thanks,
Jason








From: Varun Sharma <varun@pinterest.com>
Reply-To: "user@helix.apache.org" <user@helix.apache.org>
Date: Tuesday, August 19, 2014 11:47 AM
To: "user@helix.apache.org" <user@helix.apache.org>
Subject: Error on participant while joining cluster

I am getting the following error while trying to join a cluster as a participant. The cluster
is set up and a controller has already connected to it. Can someone help me understand why
this is happening?





2014-08-19 18:41:36,843 [main] (ZKHelixManager.java:727) INFO  Handling new session, session id: 147a7beb2dd63f4, instance: hdfsterrapin-a-datanode-531b2679:9090, instanceTye: PARTICIPANT, cluster: terrapin, zkconnection: State:CONNECTED Timeout:30000 sessionid:0x147a7beb2dd63f4 local:/10.65.145.80:43854 remoteserver:terrapinzk001a/10.115.59.31:2181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2014-08-19 18:41:36,843 [main] (ParticipantHealthReportTask.java:67) WARN  ParticipantHealthReportTimerTask already stopped
2014-08-19 18:41:36,914 [main] (ParticipantManagerHelper.java:101) INFO  instance: hdfsterrapin-a-datanode-531b2679:9090 auto-joining terrapin is false
2014-08-19 18:41:36,917 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/CONFIGS/PARTICIPANT/hdfsterrapin-a-datanode-531b2679:9090
2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/MESSAGES
2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/CURRENTSTATES
2014-08-19 18:41:36,919 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/STATUSUPDATES
2014-08-19 18:41:36,920 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/ERRORS
2014-08-19 18:41:36,920 [main] (ZKHelixManager.java:496) ERROR fail to createClient.
org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT
    at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
    at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
    at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
    at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
    at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
2014-08-19 18:41:36,921 [main] (ZKHelixManager.java:522) ERROR fail to connect hdfsterrapin-a-datanode-531b2679:9090
org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT
    at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
    at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
    at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
    at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
    at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)


































 		 	   		  