helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: Error on participant while joining cluster
Date Wed, 27 Aug 2014 00:02:05 GMT
I think they are thread-safe because ZKHelixAdmin is stateless. I think the
right question is "are the operations atomic?". Most HelixAdmin operations
change znodes in ZooKeeper. By default, none of the operations are atomic.
However, under the hood HelixAdmin uses HelixDataAccessor, which supports
atomic operations.

If you really want these operations to be atomic, you can use
HelixDataAccessor and BaseDataAccessor. These are low-level APIs, and if
you really need atomicity, we should probably file a JIRA and provide
high-level APIs for it in HelixAdmin.
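
To illustrate what "atomic" means here, below is a minimal sketch of a
versioned read-modify-write loop. It is an in-memory analogy only: the
Node and Versioned classes and the update method are hypothetical names
made up for this example. They are not Helix or ZooKeeper APIs, and this
is not a description of how HelixDataAccessor is implemented.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class VersionedUpdateSketch {
    /** Immutable snapshot of a node: payload plus a version bumped on every write. */
    static final class Versioned {
        final String data;
        final int version;
        Versioned(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    /** Hypothetical in-memory stand-in for a single znode (not a Helix class). */
    static final class Node {
        private final AtomicReference<Versioned> ref =
            new AtomicReference<>(new Versioned("", 0));

        Versioned read() {
            return ref.get();
        }

        /** Conditional write: succeeds only if nobody wrote since 'expected' was read. */
        boolean writeIfUnchanged(Versioned expected, String newData) {
            return ref.compareAndSet(expected, new Versioned(newData, expected.version + 1));
        }

        /** Atomic read-modify-write: re-read and retry until the conditional write wins. */
        String update(UnaryOperator<String> updater) {
            while (true) {
                Versioned current = read();
                String next = updater.apply(current.data);
                if (writeIfUnchanged(current, next)) {
                    return next;
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Node node = new Node();
        // Two threads each append 1000 characters; no append is lost.
        Runnable appendX = () -> {
            for (int i = 0; i < 1000; i++) {
                node.update(s -> s + "x");
            }
        };
        Thread a = new Thread(appendX);
        Thread b = new Thread(appendX);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(node.read().data.length() + " " + node.read().version);
    }
}
```

A plain read followed by an unconditional write could lose one thread's
change when both threads read the same old value; the conditional write
plus retry is what prevents that in this sketch.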






On Tue, Aug 26, 2014 at 4:48 PM, Varun Sharma <varun@pinterest.com> wrote:

> I am calling "addResource" and "dropResource" from separate threads. It is
> highly unlikely that I would call these operations on the same resource
> concurrently.
>
> Varun
>
>
> On Tue, Aug 26, 2014 at 4:45 PM, Kanak Biscuitwala <kanak.b@hotmail.com>
> wrote:
>
>> I would have to say, "it depends." There are operations that are
>> idempotent (e.g. dropResource), atomic (e.g. setResourceIdealState), both,
>> or neither (e.g. resetResource). Generally speaking, you should be OK for
>> most operations, but there isn't any synchronization, so depending on which
>> ZNodes are affected and how, there may be some thread safety issues.
>>
>> Are there specific operations you need to be thread-safe?
>>
>>
>> ------------------------------
>> Date: Tue, 26 Aug 2014 16:37:50 -0700
>>
>> Subject: Re: Error on participant while joining cluster
>> From: varun@pinterest.com
>> To: user@helix.apache.org
>>
>>
>> Thanks, Kanak. Another question: is HelixAdmin thread-safe?
>>
>> Varun
>>
>>
>> On Tue, Aug 26, 2014 at 3:36 PM, Kanak Biscuitwala <kanak.b@hotmail.com>
>> wrote:
>>
>> Hi Varun,
>>
>>
>> To answer your question on IRC, the resource's znode is deleted
>> immediately on dropResource(), but Helix will still be able to send dropped
>> messages after this happens because there is enough persisted information
>> in the current state on each node.
>>
>>
>> Kanak
>>
>> ------------------------------
>> Date: Thu, 21 Aug 2014 12:56:21 -0700
>>
>> Subject: Re: Error on participant while joining cluster
>> From: g.kishore@gmail.com
>> To: user@helix.apache.org
>>
>>
>> I don't see any issue at runtime. However, Helix has support for backing up
>> the ZooKeeper nodes to a file system. I think "|" might cause problems
>> while backing up or restoring ZooKeeper data. I would use something that is
>> compatible with file systems, such as "_" or "-".
>>
>>
>> On Thu, Aug 21, 2014 at 12:03 PM, Varun Sharma <varun@pinterest.com>
>> wrote:
>>
>> Are there any restrictions on choosing resource names? I was initially
>> putting "/" in the name, but that does not work well since it ends
>> up creating a znode with a slash. I found that if I replace the "/" with a
>> "|", a znode can be created. Could there be any other issues inside Helix
>> with using a "|" in the resource name?
>>
>> Varun
>>
>>
>> On Tue, Aug 19, 2014 at 2:20 PM, Kanak Biscuitwala <kanak.b@hotmail.com>
>> wrote:
>>
>> But of course since HelixAdmin seems to be bugging out, what Jason said
>> is right :)
>>
>> ------------------------------
>> From: kanak.b@hotmail.com
>> To: user@helix.apache.org
>> Subject: RE: Error on participant while joining cluster
>> Date: Tue, 19 Aug 2014 14:18:23 -0700
>>
>>
>> As Jason said, typically the naming convention is host_port, which Helix
>> tools automatically parse as host and port. In theory it is possible to use
>> arbitrary instance IDs though, so it might be worth filing this as a bug.
>>
>> As for removing instances, the typical flow is to shut it down (so that
>> the live instance is gone), disable it, and then drop it using HelixAdmin.
>>
>> ------------------------------
>> From: zzhang@linkedin.com
>> To: user@helix.apache.org
>> Subject: Re: Error on participant while joining cluster
>> Date: Tue, 19 Aug 2014 21:05:46 +0000
>>
>> First make sure that the node you want to remove from the cluster is not
>> running, i.e. it has no entry under /<CLUSTER_NAME>/LIVEINSTANCES/. Then you
>> can simply remove the orphaned znodes under /<CLUSTER_NAME>/INSTANCES as well
>> as under /<CLUSTER_NAME>/CONFIGS/PARTICIPANT. Normally ":" is not recommended
>> in the instance id, and we internally replace it with "_". We will check how
>> to get rid of an instance with ":" in its id.
>>
>>  Thanks,
>> Jason
>>
>>   From: Varun Sharma <varun@pinterest.com>
>> Reply-To: "user@helix.apache.org" <user@helix.apache.org>
>> Date: Tuesday, August 19, 2014 1:58 PM
>> To: "user@helix.apache.org" <user@helix.apache.org>
>> Subject: Re: Error on participant while joining cluster
>>
>>   Can I simply remove the orphaned znodes under the
>> /<CLUSTER_NAME>/INSTANCES path?
>>
>>  Varun
>>
>>
>> On Tue, Aug 19, 2014 at 1:54 PM, Varun Sharma <varun@pinterest.com>
>> wrote:
>>
>> Another issue I have now is that I ended up registering the participants
>> as <host>:<port>. This causes exceptions related to MBean (because it
>> does not like colon separators). I don't know if that is interfering with
>> normal controller operation. I restarted the instances replacing the : with
>> a , but those old names are still stuck in the INSTANCES znode. How can I get
>> rid of these? helix-admin seems to be replacing the ":" in the node name
>> with an underscore "_" and can't delete the node.
>>
>>  This is still causing MBean related exceptions in the log trace.
>>
>>  Varun
>>
>>
>> On Tue, Aug 19, 2014 at 12:18 PM, Zhen Zhang <zzhang@linkedin.com> wrote:
>>
>>  sure. Will add it.
>>
>>   From: kishore g <g.kishore@gmail.com>
>> Reply-To: "user@helix.apache.org" <user@helix.apache.org>
>> Date: Tuesday, August 19, 2014 12:14 PM
>> To: "user@helix.apache.org" <user@helix.apache.org>
>> Subject: Re: Error on participant while joining cluster
>>
>>   Thanks Jason. We need to add this to the documentation. I could not
>> find the way to enable auto-join from the docs. Should we add this to admin
>> interface documentation?
>>
>>
>>
>>
>>
>>
>> On Tue, Aug 19, 2014 at 12:06 PM, Zhen Zhang <zzhang@linkedin.com> wrote:
>>
>>  Hi Varun, you need to either add the participant to the cluster before
>> starting it, or enable the participant auto-join config:
>>
>>  add participant to cluster:
>>  ./helix-admin.sh --zkSvr <ZookeeperServerAddress, e.g. localhost:2181> --addNode <clusterName, e.g. terrapin> <instanceId, e.g. hdfsterrapin-a-datanode-531b2679_9090>
>>
>>  or, enable auto-join config:
>>  ./helix-admin.sh --zkSvr <ZookeeperServerAddress> --setConfig CLUSTER <clusterName> allowParticipantAutoJoin=true
>>
>>  Thanks,
>> Jason
>>
>>
>>   From: Varun Sharma <varun@pinterest.com>
>> Reply-To: "user@helix.apache.org" <user@helix.apache.org>
>> Date: Tuesday, August 19, 2014 11:47 AM
>> To: "user@helix.apache.org" <user@helix.apache.org>
>> Subject: Error on participant while joining cluster
>>
>>   I am getting the following error while trying to join a cluster as a
>> participant. The cluster is set up and a controller has already connected to
>> it. Can someone help explain why this is happening?
>>
>>
>> 2014-08-19 18:41:36,843 [main] (ZKHelixManager.java:727) INFO  Handling new session, session id: 147a7beb2dd63f4, instance: hdfsterrapin-a-datanode-531b2679:9090, instanceTye: PARTICIPANT, cluster: terrapin, zkconnection: State:CONNECTED Timeout:30000 sessionid:0x147a7beb2dd63f4 local:/10.65.145.80:43854 remoteserver:terrapinzk001a/10.115.59.31:2181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
>> 2014-08-19 18:41:36,843 [main] (ParticipantHealthReportTask.java:67) WARN  ParticipantHealthReportTimerTask already stopped
>> 2014-08-19 18:41:36,914 [main] (ParticipantManagerHelper.java:101) INFO  instance: hdfsterrapin-a-datanode-531b2679:9090 auto-joining terrapin is false
>> *2014-08-19 18:41:36,917 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/CONFIGS/PARTICIPANT/hdfsterrapin-a-datanode-531b2679:9090*
>> *2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/MESSAGES*
>> *2014-08-19 18:41:36,918 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/CURRENTSTATES*
>> *2014-08-19 18:41:36,919 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/STATUSUPDATES*
>> *2014-08-19 18:41:36,920 [main] (ZKUtil.java:95) INFO  Invalid instance setup, missing znode path: /terrapin/INSTANCES/hdfsterrapin-a-datanode-531b2679:9090/ERRORS*
>> *2014-08-19 18:41:36,920 [main] (ZKHelixManager.java:496) ERROR fail to createClient.*
>> *org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT*
>>     at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
>>     at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
>>     at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
>>     at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
>>     at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
>>     at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
>>     at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
>> 2014-08-19 18:41:36,921 [main] (ZKHelixManager.java:522) ERROR fail to connect hdfsterrapin-a-datanode-531b2679:9090
>> org.apache.helix.HelixException: Initial cluster structure is not set up for instance: hdfsterrapin-a-datanode-531b2679:9090, instanceType: PARTICIPANT
>>     at org.apache.helix.manager.zk.ParticipantManagerHelper.joinCluster(ParticipantManagerHelper.java:108)
>>     at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:869)
>>     at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:838)
>>     at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:493)
>>     at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:519)
>>     at com.pinterest.terrapin.server.TerrapinServerHandler.start(TerrapinServerHandler.java:84)
>>     at com.pinterest.terrapin.server.TerrapinServerMain.main(TerrapinServerMain.java:31)
>>
>
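
The naming advice in the quoted thread (avoid "/", ":" and "|" in resource
names and instance IDs, and use the host_port convention for instances) can
be sketched as a small helper. The class and method names below are
hypothetical and chosen for illustration; this is not part of Helix, and the
character set is only the one discussed in this thread.

```java
import java.util.regex.Pattern;

public class HelixNameSketch {
    // Characters the thread reports trouble with: "/" (znode path separator),
    // ":" (MBean-related exceptions), "|" (possible issues with file-system backup).
    private static final Pattern UNSAFE = Pattern.compile("[/:|]");

    /** Replaces each problematic character with an underscore. */
    static String sanitize(String name) {
        return UNSAFE.matcher(name).replaceAll("_");
    }

    /** Builds an instance ID in the host_port convention mentioned in the thread. */
    static String instanceId(String host, int port) {
        return sanitize(host) + "_" + port;
    }

    public static void main(String[] args) {
        // Both lines print hdfsterrapin-a-datanode-531b2679_9090
        System.out.println(sanitize("hdfsterrapin-a-datanode-531b2679:9090"));
        System.out.println(instanceId("hdfsterrapin-a-datanode-531b2679", 9090));
    }
}
```

Note that this mapping is lossy: two different raw names (for example "a/b"
and "a|b") end up with the same sanitized name, so callers that need
uniqueness would have to check for collisions themselves.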
