helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Error on participant while joining cluster
Date Thu, 21 Aug 2014 19:56:21 GMT
I don't see any issue at runtime. However, Helix has support for backing up
the ZooKeeper nodes to a file system. I think "|" might cause problems while
storing or restoring data to ZooKeeper. I would use something that's
compatible with file systems, such as "_" or "-".
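
A minimal sketch of that suggestion in Java, assuming you normalize names in
your own code before handing them to Helix. ResourceNames and sanitize are
hypothetical names, not Helix APIs, and the set of characters kept (letters,
digits, ".", "_", "-") is an assumption rather than a documented Helix rule:

```java
/** Hypothetical helper for illustration; not part of Helix. */
final class ResourceNames {
    private ResourceNames() {}

    /**
     * Replaces every character outside [A-Za-z0-9._-] with '_', so the
     * result is safe both as a znode name and as a file name.
     * The allowed character set is an assumption, not a Helix requirement.
     */
    static String sanitize(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("resource name must be non-empty");
        }
        return name.replaceAll("[^A-Za-z0-9._-]", "_");
    }
}
```

Distinct inputs such as "a/b" and "a|b" map to the same output, so this only
works if the original names stay unique after replacement.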


On Thu, Aug 21, 2014 at 12:03 PM, Varun Sharma <varun@pinterest.com> wrote:

> Are there any restrictions on choosing resource names? I was initially
> putting "/" in the name, but that does not work well since it ends
> up creating a znode with a slash. I found that if I replace the "/" with a
> "|", a znode can be created. Could there be any other issues inside Helix
> with using a "|" in the resource name?
>
> Varun
>
>
> On Tue, Aug 19, 2014 at 2:20 PM, Kanak Biscuitwala <kanak.b@hotmail.com>
> wrote:
>
>> But of course since HelixAdmin seems to be bugging out, what Jason said
>> is right :)
>>
>> ------------------------------
>> From: kanak.b@hotmail.com
>> To: user@helix.apache.org
>> Subject: RE: Error on participant while joining cluster
>> Date: Tue, 19 Aug 2014 14:18:23 -0700
>>
>>
>> As Jason said, typically the naming convention is host_port, which helix
>> tools automatically parse as host and port. It is possible to use arbitrary
>> instance IDs in theory though, so it might be worth filing as a bug.
>>
>> As for removing instances, the typical flow is to shut it down (so that
>> the live instance is gone), disable it, and then drop it using HelixAdmin.
>>
>> ------------------------------
>> From: zzhang@linkedin.com
>> To: user@helix.apache.org
>> Subject: Re: Error on participant while joining cluster
>> Date: Tue, 19 Aug 2014 21:05:46 +0000
>>
>> First make sure under /<CLUSTER_NAME>/LIVEINSTANCES/, the node you want
>> to remove from the cluster is not running. Then you can simply remove the
>> orphaned znodes under /<CLUSTER_NAME>/INSTANCES as well as under
>> /<CLUSTER_NAME>/CONFIGS/PARTICIPANT. Normally ":" is not recommended in the
>> instance id, and we internally replace it with "_". We will check how to
>> get rid of an instance with ":" in its id.
>>
>>  Thanks,
>> Jason
>>
>>   From: Varun Sharma <varun@pinterest.com>
>> Reply-To: "user@helix.apache.org" <user@helix.apache.org>
>> Date: Tuesday, August 19, 2014 1:58 PM
>> To: "user@helix.apache.org" <user@helix.apache.org>
>> Subject: Re: Error on participant while joining cluster
>>
>>   Can I simply remove the orphaned znodes under the
>> /<CLUSTER_NAME>/INSTANCES tag ?
>>
>>  Varun
>>
>>
>> On Tue, Aug 19, 2014 at 1:54 PM, Varun Sharma <varun@pinterest.com>
>> wrote:
>>
>> Another issue I have now is that I ended up registering the participants
>> as <host>:<port> - this causes exceptions related to MBean (because it
>> does not like colon separators). I don't know if that is interfering with
>> normal controller operation. I restarted the instances replacing the : with
>> a , but those old names are still stuck in the INSTANCES znode. How can I
>> get rid of these? helix-admin seems to be replacing the ":" in the node name
>> with an underscore "_" and can't delete the node.
>>
>>  This is still causing MBean related exceptions in the log trace.
>>
>>  Varun
>>
>>
>> On Tue, Aug 19, 2014 at 12:18 PM, Zhen Zhang <zzhang@linkedin.com> wrote:
>>
>>  sure. Will add it.
>>
>>   From: kishore g <g.kishore@gmail.com>
>> Reply-To: "user@helix.apache.org" <user@helix.apache.org>
>> Date: Tuesday, August 19, 2014 12:14 PM
>> To: "user@helix.apache.org" <user@helix.apache.org>
>> Subject: Re: Error on participant while joining cluster
>>
>>   Thanks Jason. We need to add this to the documentation. I could not
>> find the way to enable auto-join from the docs. Should we add this to admin
>> interface documentation?
>>
>>
>>
>>
>>
>>
>> On Tue, Aug 19, 2014 at 12:06 PM, Zhen Zhang <zzhang@linkedin.com> wrote:
>>
>>  Hi Varun, you need to either add the participant to the cluster before
>> starting it, or enable the participant auto-join config:
>>
>>  add participant to cluster:
>>  ./helix-admin.sh --zkSvr <ZookeeperServerAddress, e.g. localhost:2181>
>> --addNode <clusterName, e.g. terrapin> <instanceId, e.g.
>> hdfsterrapin-a-datanode-531b2679_9090>
>>
>>  or, enable auto-join config:
>> ./helix-admin.sh --zkSvr <ZookeeperServerAddress> --setConfig CLUSTER
>> <clusterName> allowParticipantAutoJoin=true
>>
>>  Thanks,
>> Jason
>>
>>
>>   From: Varun Sharma <varun@pinterest.com>
>> Reply-To: "user@helix.apache.org" <user@helix.apache.org>
>> Date: Tuesday, August 19, 2014 11:47 AM
>> To: "user@helix.apache.org" <user@helix.apache.org>
>> Subject: Error on participant while joining cluster
>>
>>   I am getting the following error while trying to join a cluster as a
>> participant. The cluster is set up and a controller has already connected to
>> it. Can someone help out as to why this is happening?
>>
>>
>> 2014-08-19 18:41:36,843 [main] (ZKHelixManager.java:727) INFO  Handling
>> new session, session id: 147a7beb2dd63f4, instance:
>> hdfsterrapin-a-datanode-531b2679:9090, instanceTye: PARTICIPANT, cluster:
>> terrapin, zkconnection: State:CONNECTED Timeout:30000
>> sessionid:0x147a7beb2dd63f4 local:/10.65.145.80:43854
>> remoteserver:terrapinzk001a/10.115.59.31:2181 lastZxid:0 xid:1 sent:1
>> recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
>> 2014-08-19 18:41:36,843 [main] (ParticipantHealthReportTask.java:67)
>> WARN  ParticipantHealthReportTimerTask already stopped
>> 2014-08-19 18:41:36,914 [main] (ParticipantManagerHelper.java:101) INFO
>> instance: hdfsterrapin-a-datanode-531b2679:9090 auto-joining terrapin is
>> false
>> *2014-08-19 18:41:36,917 [main] (ZKUtil.java:95) INFO  Invalid instance
>> setup, missing znode path:
>> /terrapin/CONFIGS/PARTICIPANT/hdfsterrapin-a-datanode-531b2679:9090*
>> *2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO  Invalid instance
>> setup, missing znode path:
>> /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/MESSAGES*
>> *2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO  Invalid instance
>> setup, missing znode path:
>> /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/CURRENTSTATES*
>> *2014-08-19 18:41:36,919 [main] (ZKUtil.java:95) INFO  Invalid instance
>> setup, missing znode path:
>> /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/STATUSUPDATES*
>> *2014-08-19 18:41:36,920 [main] (ZKUtil.java:95) INFO  Invalid instance
>> setup, missing znode path:
>> /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/ERRORS*
>> *2014-08-19 18:41:36,920 [main] (ZKHelixManager.java:496) ERROR fail to
>> createClient.*
>> *org.apache.helix.HelixException: Initial cluster structure is not set up
>> for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType:
>> PARTICIPANT*
>> at
>> org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
>> at
>> org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
>> at
>> org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
>> at
>> org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
>> at
>> org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
>> at
>> com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
>> at
>> com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
>> 2014-08-19 18:41:36,921 [main] (ZKHelixManager.java:522) ERROR fail to
>> connect hdfsterrapin-a-datanode-531b2679:9090
>> org.apache.helix.HelixException: Initial cluster structure is not set up
>> for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType:
>> PARTICIPANT
>> at
>> org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
>> at
>> org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
>> at
>> org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
>> at
>> org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
>> at
>> org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
>> at
>> com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
>> at
>> com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
>>
>>
>>
>>
>>
>
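
For reference, the znodes named in the "missing znode path" log lines above,
combined with the host_port naming convention mentioned in the thread, can be
written down as a small Java sketch. InstancePaths, toInstanceId and
requiredZnodes are hypothetical helpers for illustration, not Helix APIs; the
path layout is taken only from the log output quoted in this thread and may
differ across Helix versions:

```java
/** Hypothetical helper for illustration; not part of Helix. */
final class InstancePaths {
    private InstancePaths() {}

    /** Builds an instance id using the host_port convention from the thread. */
    static String toInstanceId(String host, int port) {
        return host + "_" + port;
    }

    /**
     * Lists the znode paths that the quoted log reported as missing for a
     * participant that had not been added to the cluster.
     */
    static String[] requiredZnodes(String cluster, String instanceId) {
        String instanceRoot = "/" + cluster + "/INSTANCES/" + instanceId;
        return new String[] {
            "/" + cluster + "/CONFIGS/PARTICIPANT/" + instanceId,
            instanceRoot + "/MESSAGES",
            instanceRoot + "/CURRENTSTATES",
            instanceRoot + "/STATUSUPDATES",
            instanceRoot + "/ERRORS",
        };
    }
}
```

Comparing this list against what actually exists in ZooKeeper is one way to
tell whether a participant was registered before it tried to connect.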
