helix-user mailing list archives

From Kanak Biscuitwala <kana...@hotmail.com>
Subject RE: Error on participant while joining cluster
Date Wed, 27 Aug 2014 01:32:36 GMT
It's the former, and yes, you should use a RoutingTableProvider. You don't need to establish
a new connection to do this; your existing HelixManager is capable.
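
The difference between polling and a listener-maintained view can be modeled in plain Java. This is only a sketch of the pattern, not Helix code: `ExternalViewCache`, `onExternalViewChange`, and the resource and instance names used with it are made up for illustration. In a real application, a RoutingTableProvider registered as an external view listener on the existing HelixManager would play this role.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a listener-maintained routing table (not a Helix class).
final class ExternalViewCache {
    // resource -> partition -> (instance -> state); each snapshot is replaced wholesale.
    private final Map<String, Map<String, Map<String, String>>> views = new ConcurrentHashMap<>();

    // Invoked by the notification source whenever a resource's external view changes.
    void onExternalViewChange(String resource, Map<String, Map<String, String>> partitionStates) {
        views.put(resource, partitionStates);
    }

    // Answered from local memory; the read path makes no remote call.
    List<String> getInstances(String resource, String partition, String state) {
        Map<String, Map<String, String>> view = views.get(resource);
        if (view == null) {
            return Collections.emptyList();
        }
        Map<String, String> instanceStates = view.get(partition);
        if (instanceStates == null) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : instanceStates.entrySet()) {
            if (entry.getValue().equals(state)) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result); // stable ordering for callers
        return result;
    }
}
```

With this shape, readers never poll: the cache is refreshed only when a change notification arrives, and lookups between notifications are purely local.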

Date: Tue, 26 Aug 2014 18:22:25 -0700
Subject: Re: Error on participant while joining cluster
From: varun@pinterest.com
To: user@helix.apache.org

Another quick question: if I read the external view from inside a controller using HelixAdmin.getResourceExternalView,
is that a ZK call, or is the external view cached in local memory? If the former, is it better
to establish a spectator connection so we get notified of changes instead of having to pull every
time? (I am polling the external view for all resources every few minutes, which is why I am
asking.)


On Tue, Aug 26, 2014 at 5:02 PM, kishore g <g.kishore@gmail.com> wrote:

I think they are thread safe because ZKHelixAdmin is stateless. I think the right question
is "are the operations atomic?" Most HelixAdmin operations change znodes in ZooKeeper. By default,
none of the operations are atomic. However, under the hood HelixAdmin uses HelixDataAccessor,
which supports atomic operations.



If you really want these operations to be atomic, you can use HelixDataAccessor and BaseDataAccessor.
These are low-level APIs, and if you really need atomicity, we should probably file a JIRA
and provide the high-level APIs in HelixAdmin.
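
To make the atomicity point concrete, here is a stand-alone Java model that uses a ConcurrentHashMap in place of znodes. It is a sketch under stated assumptions: `ZnodeStore`, `updateUnsafely`, `updateAtomically`, and the `/demo/counter` path are hypothetical names and are not the HelixDataAccessor or BaseDataAccessor API. It only shows why a separate read and write can lose updates while a single atomic update cannot.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical in-memory model of a znode store (not a Helix or ZooKeeper class).
final class ZnodeStore {
    private final ConcurrentHashMap<String, Integer> data = new ConcurrentHashMap<>();

    // Not atomic: another thread can write between the get and the put, and that write is lost.
    void updateUnsafely(String path, UnaryOperator<Integer> updater) {
        Integer current = data.get(path);
        data.put(path, updater.apply(current == null ? 0 : current));
    }

    // Atomic: compute() applies the updater to the current value as one indivisible step per key.
    void updateAtomically(String path, UnaryOperator<Integer> updater) {
        data.compute(path, (key, current) -> updater.apply(current == null ? 0 : current));
    }

    int read(String path) {
        return data.getOrDefault(path, 0);
    }

    // Starts `threads` workers that each add 1 `perThread` times through the atomic path.
    static int runAtomicIncrements(int threads, int perThread) {
        ZnodeStore store = new ZnodeStore();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    store.updateAtomically("/demo/counter", value -> value + 1);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store.read("/demo/counter");
    }
}
```

Through the atomic path the final count always equals threads times perThread. Routing the same workload through `updateUnsafely` may produce a smaller number, which is the kind of lost update the thread is warning about.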








On Tue, Aug 26, 2014 at 4:48 PM, Varun Sharma <varun@pinterest.com> wrote:


I am calling "addResource" and "dropResource" in separate threads. It's highly unlikely
that I would call these operations on the same resource concurrently.


Varun


On Tue, Aug 26, 2014 at 4:45 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:






I would have to say, "it depends." There are operations that are idempotent (e.g. dropResource),
atomic (e.g. setResourceIdealState), both, or neither (e.g. resetResource). Generally speaking,
you should be OK for most operations, but there isn't any synchronization, so depending on
which ZNodes are affected and how, there may be some thread safety issues.
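
As a rough illustration of the terms "idempotent" and "atomic", here is a small plain-Java model. Everything in it is hypothetical: `ResourceRegistry` and its methods only borrow names from this thread and say nothing about how HelixAdmin implements the real operations.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory registry used only to illustrate the vocabulary.
final class ResourceRegistry {
    private final Map<String, String> idealStates = new ConcurrentHashMap<>();
    private final Map<String, Integer> versions = new ConcurrentHashMap<>();

    // One whole-record write: a single step, and repeating it with the same value changes nothing.
    void setIdealState(String resource, String idealState) {
        idealStates.put(resource, idealState);
    }

    // Idempotent: dropping an already-dropped resource leaves the registry unchanged.
    void dropResource(String resource) {
        idealStates.remove(resource);
    }

    // Neither: a separate read and write, and every call changes the stored value.
    void bumpVersion(String resource) {
        int seen = versions.getOrDefault(resource, 0);
        versions.put(resource, seen + 1);
    }

    boolean exists(String resource) {
        return idealStates.containsKey(resource);
    }

    int version(String resource) {
        return versions.getOrDefault(resource, 0);
    }
}
```

Calling `dropResource` twice is harmless, whereas two concurrent `bumpVersion` calls can interleave between the read and the write and leave the version at 1 instead of 2.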



Are there specific operations you need to be thread-safe?


Date: Tue, 26 Aug 2014 16:37:50 -0700
Subject: Re: Error on participant while joining cluster
From: varun@pinterest.com



To: user@helix.apache.org

Thanks Kanak. Another question: is HelixAdmin thread safe?
Varun


On Tue, Aug 26, 2014 at 3:36 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:




Hi Varun,

To answer your question on IRC, the resource's znode is deleted immediately on dropResource(),
but Helix will still be able to send dropped messages after this happens because there is
enough persisted information in the current state on each node.





Kanak

Date: Thu, 21 Aug 2014 12:56:21 -0700
Subject: Re: Error on participant while joining cluster
From: g.kishore@gmail.com




To: user@helix.apache.org

I don't see any issue at runtime. However, Helix has support for backing up the ZooKeeper nodes
to a file system. I think "|" might cause problems while storing or restoring data to and from ZooKeeper.
I would use something that's compatible with file systems, such as "_" or "-".
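
One way to act on this advice is to check or sanitize resource names before creating them. The sketch below is an assumption-laden example, not a Helix rule: the `ResourceNames` class and its allow-list (letters, digits, "_" and "-") are choices made here for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the allow-list is this sketch's conservative choice, not a Helix requirement.
final class ResourceNames {
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9_-]+");

    // True when the name uses only characters from the allow-list.
    static boolean isSafe(String name) {
        return name != null && SAFE.matcher(name).matches();
    }

    // Replaces every character outside the allow-list (such as "/" or "|") with "_".
    static String sanitize(String name) {
        return name.replaceAll("[^A-Za-z0-9_-]", "_");
    }
}
```

Note that sanitizing can map two distinct names to the same result (for example "a/b" and "a|b"), so callers that need uniqueness should check for collisions.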







On Thu, Aug 21, 2014 at 12:03 PM, Varun Sharma <varun@pinterest.com> wrote:

Is there any restriction on choosing resource names? I was initially putting "/" in the
name, but that does not work well since it ends up creating a znode with a slash.
I found that if I replace "/" with "|", the znode can be created. Could there be any other
issues inside Helix with using "|" in the resource name?







Varun

On Tue, Aug 19, 2014 at 2:20 PM, Kanak Biscuitwala <kanak.b@hotmail.com> wrote:









But of course since HelixAdmin seems to be bugging out, what Jason said is right :)

From: kanak.b@hotmail.com

To: user@helix.apache.org
Subject: RE: Error on participant while joining cluster
Date: Tue, 19 Aug 2014 14:18:23 -0700




As Jason said, typically the naming convention is host_port, which Helix tools automatically
parse as host and port. It is possible to use arbitrary instance IDs in theory, though, so
this might be worth filing as a bug.
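
A small helper can keep instance IDs in the host_port form described above. This is a minimal sketch: `InstanceIds` is a hypothetical class, and splitting on the last underscore is this example's own choice, not a claim about how Helix tools parse the ID.

```java
// Hypothetical helper for building and reading host_port instance IDs.
final class InstanceIds {
    static String toInstanceId(String host, int port) {
        return host + "_" + port;
    }

    // Splits on the last '_' so that hostnames containing underscores still parse.
    static String hostOf(String instanceId) {
        return instanceId.substring(0, separatorIndex(instanceId));
    }

    static int portOf(String instanceId) {
        return Integer.parseInt(instanceId.substring(separatorIndex(instanceId) + 1));
    }

    private static int separatorIndex(String instanceId) {
        int index = instanceId.lastIndexOf('_');
        if (index < 0) {
            throw new IllegalArgumentException("Expected host_port but got: " + instanceId);
        }
        return index;
    }
}
```

For example, host "hdfsterrapin-a-datanode-531b2679" and port 9090 yield the ID hdfsterrapin-a-datanode-531b2679_9090, which avoids the ":" separator that caused trouble elsewhere in this thread.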







As for removing instances, the typical flow is to shut it down (so that the live instance
is gone), disable it, and then drop it using HelixAdmin.

From: zzhang@linkedin.com






To: user@helix.apache.org
Subject: Re: Error on participant while joining cluster
Date: Tue, 19 Aug 2014 21:05:46 +0000






First, check under /<CLUSTER_NAME>/LIVEINSTANCES/ to make sure the node you want to remove from
the cluster is not running. Then you can simply remove the orphaned znodes under /<CLUSTER_NAME>/INSTANCES
as well as under /<CLUSTER_NAME>/CONFIGS/PARTICIPANT. Normally,
":" is not recommended in the instance ID, and we internally replace it with "_". We will
check how to get rid of an instance with ":" in its ID.



Thanks,
Jason











From: Varun Sharma <varun@pinterest.com>

Reply-To: "user@helix.apache.org" <user@helix.apache.org>







Date: Tuesday, August 19, 2014 1:58 PM

To: "user@helix.apache.org" <user@helix.apache.org>







Subject: Re: Error on participant while joining cluster







Can I simply remove the orphaned znodes under /<CLUSTER_NAME>/INSTANCES?



Varun





On Tue, Aug 19, 2014 at 1:54 PM, Varun Sharma 
<varun@pinterest.com> wrote:


Another issue I have now is that I ended up registering the participants as <host>:<port>.
This causes exceptions related to MBeans (because it does not like colon separators). I
don't know if that is interfering with normal controller operation.
I restarted the instances, replacing the ":" with a ",", but the old names are still stuck in the
INSTANCES znode. How can I get rid of these? helix-admin seems to be replacing the ":" in
the node name with an underscore "_" and can't delete the node.



This is still causing MBean related exceptions in the log trace.




Varun







On Tue, Aug 19, 2014 at 12:18 PM, Zhen Zhang 
<zzhang@linkedin.com> wrote:



Sure. Will add it.











From: kishore g <g.kishore@gmail.com>

Reply-To: "user@helix.apache.org" <user@helix.apache.org>







Date: Tuesday, August 19, 2014 12:14 PM

To: "user@helix.apache.org" <user@helix.apache.org>







Subject: Re: Error on participant while joining cluster









Thanks Jason. We need to add this to the documentation. I could not find how to enable
auto-join in the docs. Should we add this to the admin interface documentation?














On Tue, Aug 19, 2014 at 12:06 PM, Zhen Zhang 
<zzhang@linkedin.com> wrote:



Hi Varun, you need to either add the participant to the cluster before starting it, or enable
the participant auto-join config:



add participant to cluster:

./helix-admin.sh --zkSvr <ZookeeperServerAddress, e.g. localhost:2181> --addNode <clusterName, e.g. terrapin> <instanceId, e.g. hdfsterrapin-a-datanode-531b2679_9090>




or, enable auto-join config:
./helix-admin.sh --zkSvr <ZookeeperServerAddress> --setConfig CLUSTER <clusterName> allowParticipantAutoJoin=true



Thanks,
Jason














From: Varun Sharma <varun@pinterest.com>

Reply-To: "user@helix.apache.org" <user@helix.apache.org>







Date: Tuesday, August 19, 2014 11:47 AM

To: "user@helix.apache.org" <user@helix.apache.org>







Subject: Error on participant while joining cluster










I am getting the following error while trying to join a cluster as a participant. The cluster
is set up and a controller has already connected to it. Can someone help explain why this
is happening?





2014-08-19 18:41:36,843 [main] (ZKHelixManager.java:727) INFO  Handling new session, session
id: 147a7beb2dd63f4, instance: hdfsterrapin-a-datanode-531b2679:9090, instanceTye: PARTICIPANT,
cluster: terrapin, zkconnection: State:CONNECTED Timeout:30000 sessionid:0x147a7beb2dd63f4
 local:/10.65.145.80:43854 remoteserver:terrapinzk001a/10.115.59.31:2181 lastZxid:0 xid:1
sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0







2014-08-19 18:41:36,843 [main] (ParticipantHealthReportTask.java:67) WARN  ParticipantHealthReportTimerTask
already stopped

2014-08-19 18:41:36,914 [main] (ParticipantManagerHelper.java:101) INFO  instance: hdfsterrapin-a-datanode-531b2679:9090
auto-joining terrapin is false

2014-08-19 18:41:36,917 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode
path: /terrapin/CONFIGS/PARTICIPANT/hdfsterrapin-a-datanode-531b2679:9090

2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode
path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/MESSAGES

2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode
path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/CURRENTSTATES

2014-08-19 18:41:36,919 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode
path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/STATUSUPDATES

2014-08-19 18:41:36,920 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode
path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/ERRORS

2014-08-19 18:41:36,920 [main] (ZKHelixManager.java:496) ERROR fail to createClient.

org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090,
instanceType: PARTICIPANT

at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)

at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)

at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)

at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)

at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)

at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)

at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)

2014-08-19 18:41:36,921 [main] (ZKHelixManager.java:522) ERROR fail to connect hdfsterrapin-a-datanode-531b2679:9090

org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090,
instanceType: PARTICIPANT

at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)

at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)

at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)

at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)

at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)

at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)

at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
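
As a reading aid for the log above, the five "missing znode path" lines follow a simple pattern based on the cluster name and instance ID. The sketch below only reproduces those five paths; `ParticipantPaths` is a hypothetical helper, and the list is not claimed to be the complete set that Helix checks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that rebuilds the paths reported as missing in the log.
final class ParticipantPaths {
    static List<String> expectedPaths(String cluster, String instanceId) {
        List<String> paths = new ArrayList<>();
        // Participant config znode.
        paths.add("/" + cluster + "/CONFIGS/PARTICIPANT/" + instanceId);
        // Per-instance child znodes, in the order they appear in the log.
        for (String child : new String[] {"MESSAGES", "CURRENTSTATES", "STATUSUPDATES", "ERRORS"}) {
            paths.add("/" + cluster + "/INSTANCES/" + instanceId + "/" + child);
        }
        return paths;
    }
}
```

Printing `expectedPaths("terrapin", "hdfsterrapin-a-datanode-531b2679_9090")` gives a checklist of znodes to look for in ZooKeeper after adding the participant to the cluster or enabling auto-join.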


































 		 	   		  
 		 	   		  



 		 	   		  

 		 	   		  





 		 	   		  