helix-user mailing list archives

From Kanak Biscuitwala <kana...@hotmail.com>
Subject RE: Error on participant while joining cluster
Date Tue, 26 Aug 2014 22:36:46 GMT

Hi Varun,

To answer your question on IRC, the resource's znode is deleted immediately on dropResource(),
but Helix will still be able to send dropped messages after this happens because there is
enough persisted information in the current state on each node.

Kanak

Date: Thu, 21 Aug 2014 12:56:21 -0700
Subject: Re: Error on participant while joining cluster
From: g.kishore@gmail.com
To: user@helix.apache.org

I don't see any issue at runtime. However, Helix has support for backing up the ZooKeeper nodes
to a file system. I think | might cause problems while storing or restoring data to ZooKeeper.
I would use something that's compatible with the file system, something like _ or probably -.
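
A minimal sketch of the suggestion above, assuming resource names are normalized on the application side before they are handed to Helix. The helper name `safe_resource_name` and the allowed character set are hypothetical and not part of Helix; the function simply maps "/", "|" and other characters that can be awkward in znode or file names to "_".

```python
import re

# Hypothetical helper, not a Helix API: keep letters, digits, "_", "-" and ".",
# and replace every other character (e.g. "/" or "|") with "_".
_UNSAFE = re.compile(r"[^A-Za-z0-9_.-]")

def safe_resource_name(name: str) -> str:
    """Return the name with characters outside the allowed set replaced by "_"."""
    if not name:
        raise ValueError("resource name must not be empty")
    return _UNSAFE.sub("_", name)

print(safe_resource_name("tables/users|v1"))  # tables_users_v1
```

Note that two different inputs (for example "a/b" and "a|b") map to the same output, so a mapping like this only works if the original names cannot collide after replacement.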

On Thu, Aug 21, 2014 at 12:03 PM, Varun Sharma <varun@pinterest.com> wrote:

Are there any restrictions on choosing resource names? I was initially putting "/" in the
name, but that does not seem to work well since it ends up creating a znode with a slash.
I found that if I replace the "/" with a "|", a znode can be created. Could there be any other
issues inside Helix with using a "|" in the resource name?

Varun

On Tue, Aug 19, 2014 at 2:20 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:

But of course since HelixAdmin seems to be bugging out, what Jason said is right :)

From: kanak.b@hotmail.com
To: user@helix.apache.org
Subject: RE: Error on participant while joining cluster
Date: Tue, 19 Aug 2014 14:18:23 -0700

As Jason said, typically the naming convention is host_port, which helix tools automatically
parse as host and port. It is possible to use arbitrary instance IDs in theory though, so
it might be worth filing as a bug.
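
For illustration, a small sketch of the host_port convention described above: build the instance ID with an underscore and split it back on the last underscore. The function names are hypothetical and this is not how the Helix tools are implemented; it only shows one way an ID of this shape can be produced and parsed.

```python
from typing import Tuple

def make_instance_id(host: str, port: int) -> str:
    """Join host and port with "_" (the host_port convention)."""
    return f"{host}_{port}"

def parse_instance_id(instance_id: str) -> Tuple[str, int]:
    """Split on the last "_" so that hosts containing "_" still parse."""
    host, sep, port = instance_id.rpartition("_")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"not a host_port instance id: {instance_id!r}")
    return host, int(port)

instance_id = make_instance_id("hdfsterrapin-a-datanode-531b2679", 9090)
print(instance_id)                     # hdfsterrapin-a-datanode-531b2679_9090
print(parse_instance_id(instance_id))  # ('hdfsterrapin-a-datanode-531b2679', 9090)
```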

As for removing instances, the typical flow is to shut it down (so that the live instance
is gone), disable it, and then drop it using HelixAdmin.

From: zzhang@linkedin.com
To: user@helix.apache.org
Subject: Re: Error on participant while joining cluster
Date: Tue, 19 Aug 2014 21:05:46 +0000

First make sure the node you want to remove from the cluster is not running, i.e. it has no
entry under /<CLUSTER_NAME>/LIVEINSTANCES/. Then you can simply remove the orphaned znodes under
/<CLUSTER_NAME>/INSTANCES as well as under /<CLUSTER_NAME>/CONFIGS/PARTICIPANT. Normally
":" is not recommended in the instance id, and we internally replace it with "_". We will
check how to get rid of an instance with ":" in its id.
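
The manual cleanup described above can be sketched as a small helper that refuses to proceed while the instance is still live and otherwise returns the two znode paths to delete. The function name and the `live_instances` argument are hypothetical; reading the children of /<CLUSTER_NAME>/LIVEINSTANCES and deleting the returned paths with a ZooKeeper client is deliberately left out.

```python
from typing import Iterable, List

def orphan_cleanup_paths(cluster: str, instance: str,
                         live_instances: Iterable[str]) -> List[str]:
    """Return the znode paths to delete for an instance that is no longer live.

    live_instances stands for the child names currently found under
    /<cluster>/LIVEINSTANCES; how they are fetched is not shown here.
    """
    if instance in set(live_instances):
        raise RuntimeError(f"{instance} is still live; stop it before removing its znodes")
    return [
        f"/{cluster}/INSTANCES/{instance}",
        f"/{cluster}/CONFIGS/PARTICIPANT/{instance}",
    ]

for path in orphan_cleanup_paths("terrapin", "hdfsterrapin-a-datanode-531b2679:9090", []):
    print(path)
```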

Thanks,
Jason

From: Varun Sharma <varun@pinterest.com>
Reply-To: "user@helix.apache.org" <user@helix.apache.org>
Date: Tuesday, August 19, 2014 1:58 PM
To: "user@helix.apache.org" <user@helix.apache.org>
Subject: Re: Error on participant while joining cluster

Can I simply remove the orphaned znodes under /<CLUSTER_NAME>/INSTANCES?

Varun

On Tue, Aug 19, 2014 at 1:54 PM, Varun Sharma <varun@pinterest.com> wrote:

Another issue I have now is that I ended up registering the participants as <host>:<port>.
This causes exceptions related to MBean (because it does not like colon separators). I
don't know if that is interfering with normal controller operation.
I restarted the instances replacing the : with a , but those old names are still stuck in
the INSTANCES znode. How can I get rid of these? helix-admin seems to be replacing the ":" in
the node name with an underscore "_" and can't delete the node.

This is still causing MBean related exceptions in the log trace.

Varun

On Tue, Aug 19, 2014 at 12:18 PM, Zhen Zhang <zzhang@linkedin.com> wrote:

Sure. Will add it.

From: kishore g <g.kishore@gmail.com>
Reply-To: "user@helix.apache.org" <user@helix.apache.org>
Date: Tuesday, August 19, 2014 12:14 PM
To: "user@helix.apache.org" <user@helix.apache.org>
Subject: Re: Error on participant while joining cluster

Thanks Jason. We need to add this to the documentation. I could not find a way to enable
auto-join in the docs. Should we add this to the admin interface documentation?

On Tue, Aug 19, 2014 at 12:06 PM, Zhen Zhang <zzhang@linkedin.com> wrote:

Hi Varun, you need to either add the participant to the cluster before starting it, or enable
the participant auto-join config:

add participant to cluster:
./helix-admin.sh --zkSvr <ZookeeperServerAddress, e.g. localhost:2181> --addNode <clusterName, e.g. terrapin> <instanceId, e.g. hdfsterrapin-a-datanode-531b2679_9090>

or, enable auto-join config:
./helix-admin.sh --zkSvr <ZookeeperServerAddress> --setConfig CLUSTER <clusterName> allowParticipantAutoJoin=true

Thanks,
Jason

From: Varun Sharma <varun@pinterest.com>
Reply-To: "user@helix.apache.org" <user@helix.apache.org>
Date: Tuesday, August 19, 2014 11:47 AM
To: "user@helix.apache.org" <user@helix.apache.org>
Subject: Error on participant while joining cluster

I am getting the following error while trying to join a cluster as a participant. The cluster
is set up and a controller has already connected to it. Can someone help me understand why this
is happening?

2014-08-19 18:41:36,843 [main] (ZKHelixManager.java:727) INFO  Handling new session, session id: 147a7beb2dd63f4, instance: hdfsterrapin-a-datanode-531b2679:9090, instanceTye: PARTICIPANT, cluster: terrapin, zkconnection: State:CONNECTED Timeout:30000 sessionid:0x147a7beb2dd63f4 local:/10.65.145.80:43854 remoteserver:terrapinzk001a/10.115.59.31:2181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2014-08-19 18:41:36,843 [main] (ParticipantHealthReportTask.java:67) WARN  ParticipantHealthReportTimerTask already stopped
2014-08-19 18:41:36,914 [main] (ParticipantManagerHelper.java:101) INFO  instance: hdfsterrapin-a-datanode-531b2679:9090 auto-joining terrapin is false
2014-08-19 18:41:36,917 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/CONFIGS/PARTICIPANT/hdfsterrapin-a-datanode-531b2679:9090
2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/MESSAGES
2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/CURRENTSTATES
2014-08-19 18:41:36,919 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/STATUSUPDATES
2014-08-19 18:41:36,920 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/ERRORS
2014-08-19 18:41:36,920 [main] (ZKHelixManager.java:496) ERROR fail to createClient.
org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT
    at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
    at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
    at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
    at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
    at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
2014-08-19 18:41:36,921 [main] (ZKHelixManager.java:522) ERROR fail to connect hdfsterrapin-a-datanode-531b2679:9090
org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT
    at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
    at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
    at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
    at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
    at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
    at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
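
The "missing znode path" lines in the log above show which znodes the participant looked for before failing to join. As a sketch based only on the paths in this log (the helper name is hypothetical, and other Helix versions may check a different set), the same list can be generated for any cluster and instance ID and compared against what actually exists in ZooKeeper:

```python
from typing import List

# Sub-znodes under /<cluster>/INSTANCES/<instance>, taken from the log above.
INSTANCE_CHILDREN = ("MESSAGES", "CURRENTSTATES", "STATUSUPDATES", "ERRORS")

def expected_participant_paths(cluster: str, instance: str) -> List[str]:
    """List the znode paths that the log reports as required for a participant."""
    paths = [f"/{cluster}/CONFIGS/PARTICIPANT/{instance}"]
    paths += [f"/{cluster}/INSTANCES/{instance}/{child}" for child in INSTANCE_CHILDREN]
    return paths

for path in expected_participant_paths("terrapin", "hdfsterrapin-a-datanode-531b2679_9090"):
    print(path)
```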