curator-user mailing list archives

From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: question about curator - retry policy
Date Fri, 20 May 2016 22:23:51 GMT
Retry policy is only used for individual operations. Any client-server system needs
retries to ride out temporary network events. The entire curator-client and curator-framework
modules are written to handle ZooKeeper client connection maintenance, so there isn’t one
thing I can point to.
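
To illustrate "only used for individual operations", here is a minimal plain-Java sketch. It does not use Curator's classes: Policy, retryNTimes and callWithRetry are hypothetical names made up for this example, every exception is treated as retry-able, and there is no sleep between attempts. The only point is that the policy is consulted inside one operation's retry loop and plays no part in connection maintenance.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class RetrySketch {
    /** Hypothetical stand-in for a retry policy: may this operation try again? */
    public interface Policy {
        boolean allowRetry(int retryCount);
    }

    /** Allows at most maxRetries retries after the first attempt. */
    public static Policy retryNTimes(int maxRetries) {
        return retryCount -> retryCount < maxRetries;
    }

    /** Runs a single operation; the policy is consulted only when that operation fails. */
    public static <T> T callWithRetry(Callable<T> operation, Policy policy) throws Exception {
        int retryCount = 0;
        while (true) {
            try {
                return operation.call();
            } catch (Exception e) {
                if (!policy.allowRetry(retryCount)) {
                    throw e; // the policy gave up on this one operation only
                }
                retryCount++;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger attempts = new AtomicInteger();
        // Simulated operation: fails on the first attempt, succeeds on the second.
        String result = callWithRetry(() -> {
            if (attempts.incrementAndGet() < 2) {
                throw new IllegalStateException("simulated transient failure");
            }
            return "ok";
        }, retryNTimes(1));
        System.out.println(result + " after " + attempts.get() + " attempts");
    }
}
```

With a maximum of 1 retry, an operation that keeps failing is attempted twice and then its exception is thrown to the caller. Nothing in this loop decides whether the underlying connection gets re-established.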

Internally, the ServiceDiscovery code uses a PathChildrenCache instance. If all you are using
is Service Discovery there is almost no need for you to monitor the connection state. What
are you trying to accomplish?

-Jordan

> On May 20, 2016, at 5:19 PM, Moshiko Kasirer <moshek@liveperson.com> wrote:
> 
> The thing is, we have many negative tests in which we stop and start the ZK quorum, and the
> issue I raised only happens from time to time, so it's hard to reproduce. But you
> just wrote that when the quorum is up the connection should be reconnected. How? Who does
> that, the ZK client or Curator? Is that not related to the retry policy?
> 
> On 21 May 2016 at 01:12, "Jordan Zimmerman" <jordan@jordanzimmerman.com> wrote:
> If the ZK cluster’s quorum is restored, then the connection state should change to
> RECONNECTED. There are copious tests in Curator itself that show this. If you’re seeing
> that Curator does not restore a broken connection then there is a deeper bug. Can you create
> a test that shows the problem?
> 
> -Jordan
> 
>> On May 20, 2016, at 5:07 PM, Moshiko Kasirer <moshek@liveperson.com> wrote:
>> 
>> I mean that while the ZK cluster is up, the Curator connection state stays LOST,
>> which in our case means the app node where it happens doesn't register itself
>> as available. I just don't understand when Curator gives up on trying to connect
>> to ZK and when it doesn't.
>> Thanks for the help!
>> 
>> On 21 May 2016 at 00:58, "Jordan Zimmerman" <jordan@jordanzimmerman.com> wrote:
>> You must have a retry policy so that you don’t overwhelm your network and ZooKeeper
>> cluster. The example code shows how to create a reasonable one.
>>> sometimes although zk cluster is up the curator service discovery connection isn't
>>> 
>> Service Discovery’s internal instances might be waiting based on the retry policy.
>> But what do you mean by "curator service discovery connection isn't"? There is no such
>> thing as a service discovery connection.
>> 
>> -Jordan
>> 
>>> On May 20, 2016, at 4:53 PM, Moshiko Kasirer <moshek@liveperson.com> wrote:
>>> 
>>> We are using your service discovery. So you are saying I should not care about
>>> the retry policy? Then the only thing left to explain is how come, sometimes, although the
>>> ZK cluster is up, the Curator service discovery connection isn't.
>>> 
>>> On 21 May 2016 at 00:43, "Jordan Zimmerman" <jordan@jordanzimmerman.com> wrote:
>>> If you are using Curator’s Service Discovery code, it will be continuously
>>> re-trying the connections. This is not because of the retry policy; it’s because the
>>> Service Discovery code manages connection interruptions internally.
>>> 
>>> -Jordan
>>> 
>>>> On May 20, 2016, at 4:40 PM, Moshiko Kasirer <moshek@liveperson.com> wrote:
>>>> 
>>>> Thanks for the reply, I will send those logs ASAP.
>>>> It's difficult to understand the connection mechanism of ZK.
>>>> We are using Curator 2.10 as our service discovery, so we have to make sure
>>>> that when ZK is alive we connect and announce that our server is up. We do that by
>>>> listening to the Curator connection listener, which I think also has to do with the retry
>>>> policy. But what I can't understand is why we sometimes see in the log that Curator gave
>>>> up (LOST), yet a second later the Curator connection is restored. How? Is it because the
>>>> ZK session heartbeat restored the connection? Does that cause Curator to change its
>>>> connection state? And on the other hand, we sometimes get to a point where ZK is up but
>>>> the Curator connection stays LOST.
>>>> That is why I thought of using the new always-retry policy you added. Do you
>>>> think it can help? That way, I hope, there will be no case where ZK is up but the Curator
>>>> status is LOST, since once it retries it will reconnect to ZK. Is that correct?
>>>> 
>>>> On 21 May 2016 at 00:10, "Jordan Zimmerman" <jordan@jordanzimmerman.com> wrote:
>>>> Curator’s retry policies are used within each CuratorFramework operation.
>>>> For example, when you call client.setData().forPath(p, b) the retry policy will be invoked
>>>> if there is a retry-able exception during the operation. In addition to the retryPolicy,
>>>> there are connection timeouts. How this is handled changed between Curator 2.x and
>>>> Curator 3.x. In Curator 2.x, for every iteration of the retry, the operation will wait until
>>>> connection timeout when there’s no connection. In Curator 3.x, the connection timeout wait
>>>> only occurs once (if the default ConnectionHandlingPolicy is used).
>>>> 
>>>> In any event, ZooKeeper itself tries to maintain the connection. Also, Curator
>>>> will re-create the internally managed connection depending on various network interruptions,
>>>> etc. I’d need to see the logs to give you more input.
>>>> 
>>>> -Jordan
>>>> 
>>>>> On May 19, 2016, at 10:12 AM, Moshiko Kasirer <moshek@liveperson.com> wrote:
>>>>> 
>>>>> First, I would like to thank you for Curator. We are using it as part of our service
>>>>> discovery solution and it helps a lot!
>>>>> 
>>>>> I have a question I hope you will be able to help me with.
>>>>> 
>>>>> It's regarding the Curator retry policy. It seems we don't really understand when this
>>>>> policy is invoked: although I configured it with max retries 1, in the logs I see
>>>>> many ZK reconnection attempts (and many messages saying Curator gave up, but later I see
>>>>> a reconnected status). Is it possible that the policy is only relevant to manually invoked
>>>>> operations against the ZK cluster done via Curator, and that the reconnections I see in
>>>>> the logs are caused by the fact that ZK was available during startup, so sessions were
>>>>> created, and then when ZK went down the ZK clients (not Curator) kept sending heartbeats
>>>>> as part of the ZK architecture? That is the part I am failing to understand, and I hope
>>>>> you can help me with it.
>>>>> 
>>>>> You have recently added a retry-always policy, and I wanted to know if it is safe to use.
>>>>> The thing is, we always want to retry reconnecting to ZK when it is available, but that is
>>>>> something the ZK client does as long as it has open sessions, right? I am not sure that
>>>>> it has to do with the retry policy.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Moshiko
>>>>> 
>>>>> -- 
>>>>> 
>>>>> Moshiko Kasirer
>>>>> Software Engineer
>>>>> T: +972-74-700-4357
>>>>> 
>>>>> This message may contain confidential and/or privileged information.
>>>>> If you are not the addressee or authorized to receive this on behalf
>>>>> of the addressee you must not use, copy, disclose or take action based on this message or
>>>>> any information herein.
>>>>> If you have received this message in error, please advise the sender
>>>>> immediately by reply email and delete this message. Thank you.