Mailing-List: contact user-help@curator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@curator.apache.org
MIME-Version: 1.0
References: <52F7A726-784A-4F2C-BF1D-8EA7389EE520@jordanzimmerman.com> <8E59C573-37EE-4DC5-91D1-9A6D12878FE3@jordanzimmerman.com> <3D74D729-CCE5-416D-AFDE-35F588D00586@jordanzimmerman.com>
Date: Sat, 21 May 2016 01:23:29 +0300
Subject: Re: question about curator - retry policy
From: Moshiko Kasirer
To: user@curator.apache.org
Content-Type: multipart/alternative; boundary=001a113e589653a70605334d8a62
archived-at: Fri, 20 May 2016 22:23:36 -0000

--001a113e589653a70605334d8a62

Let's say I run the Curator service discovery client and then stop the network between the client and the ZK servers for 30 minutes. Then I restore the network. Do you expect Curator to reconnect regardless of the fact that the retry policy has given up?

On May 21, 2016, at 01:19, "Moshiko Kasirer" <moshek@liveperson.com> wrote:

The thing is, we have many negative tests in which we stop and start the ZK quorum, and the issue I raised only happens from time to time, so it's hard to reproduce. But you just wrote that when the quorum is up the connection should be restored... how? Who does that, the ZK client or Curator? Is that not related to the retry policy?

On May 21, 2016, at 01:12, "Jordan Zimmerman" <jordan@jordanzimmerman.com> wrote:
If the ZK cluster’s quorum is restored, then the connection state should change to RECONNECTED. There are copious tests in Curator itself that show this. If you’re seeing that Curator does not restore a broken connection then there is a deeper bug. Can you create a test that shows the problem?
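
As a rough illustration of the state changes mentioned above, here is a dependency-free sketch of how an application might track its own availability across an outage. It is a toy model and not Curator code: the class `AvailabilityTracker`, its `available` flag, and the choice to keep the flag unchanged on SUSPENDED are assumptions made for this example. The enum only mirrors the names of Curator's connection states so the sketch compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Curator code: how an application might react to the
// connection states discussed in this thread.
final class AvailabilityTracker {
    // Redeclared locally; the names mirror Curator's connection states.
    enum State { CONNECTED, SUSPENDED, LOST, RECONNECTED }

    private boolean available;                        // hypothetical "node is registered" flag
    private final List<String> history = new ArrayList<>();

    // In a real application this would be driven by a connection state listener.
    void stateChanged(State newState) {
        switch (newState) {
            case CONNECTED:
            case RECONNECTED:
                available = true;                     // (re-)announce this node
                break;
            case LOST:
                available = false;                    // stop advertising this node
                break;
            case SUSPENDED:
                break;                                // outcome not known yet: keep the current flag
        }
        history.add(newState + ":" + available);
    }

    boolean isAvailable() { return available; }

    List<String> history() { return history; }

    public static void main(String[] args) {
        AvailabilityTracker tracker = new AvailabilityTracker();
        State[] outage = { State.CONNECTED, State.SUSPENDED, State.LOST, State.RECONNECTED };
        for (State s : outage) {
            tracker.stateChanged(s);
        }
        System.out.println(tracker.history());
    }
}
```

In this model a node that never sees RECONNECTED after LOST stays unavailable, which is the symptom described in the thread. A reproducing test would need to show that sequence against a running cluster.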

-Jordan

On May 20, 2016, at 5:07 PM, Moshiko Kasirer <moshek@liveperson.com> wrote:

I mean that while the ZK cluster is up, the Curator connection state stays LOST,
which in our case means the app node on which it happens doesn't register itself as available. I just don't seem to understand when Curator gives up on trying to connect to ZK and when it doesn't.
Thanks for the help!

On May 21, 2016, at 00:58, "Jordan Zimmerman" <jordan@jordanzimmerman.com> wrote:
You must have a retry policy so that you don’t overwhelm your network and ZooKeeper cluster. The example code shows how to create a reasonable one.
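
To make the point about not overwhelming the cluster concrete, here is a generic capped exponential backoff in plain Java. It is only a sketch of the general idea and is not Curator's implementation: the class `BackoffSketch`, the method `delayMs`, and the sample values (1 s base, 10 s cap) are assumptions for illustration.

```java
// Generic capped exponential backoff, shown only to illustrate why retries
// are spaced out. Not Curator's implementation; all names are hypothetical.
final class BackoffSketch {
    // Delay before retry number retryCount (0-based): base * 2^retryCount, capped at maxMs.
    static long delayMs(long baseMs, long maxMs, int retryCount) {
        int shift = Math.min(retryCount, 30);           // keep the shift small to avoid overflow
        return Math.min(maxMs, baseMs * (1L << shift));
    }

    public static void main(String[] args) {
        for (int retry = 0; retry < 6; retry++) {
            System.out.println("retry " + retry + " waits " + delayMs(1_000, 10_000, retry) + " ms");
        }
    }
}
```

Each failed attempt doubles the wait until the cap is reached, so a client that cannot connect backs off instead of hammering the servers.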

> sometimes although zk cluster is up the curator service discovery connection isn't

Service Discovery’s internal instances might be waiting based on the retry policy. But, what do you mean by "curator service discovery connection isn’t"? There isn’t such a thing as a service discovery connection.

-Jordan

On May 20, 2016, at 4:53 PM, Moshiko Kasirer <moshek@liveperson.com> wrote:

We are using your service discovery. So you are saying I should not care about the retry policy? Then the only thing left to explain is how come, sometimes, although the ZK cluster is up, the Curator service discovery connection isn't.

On May 21, 2016, at 00:43, "Jordan Zimmerman" <jordan@jordanzimmerman.com> wrote:
If you are using Curator’s Service Discovery code, it will be continuously re-trying the connections. This is not because of the retry policy; it’s because the Service Discovery code manages connection interruptions internally.

-Jordan

On May 20, 2016, at 4:40 PM, Moshiko Kasirer <moshek@liveperson.com> wrote:

Thanks for the reply, I will send those logs ASAP.
It's difficult to understand the connection mechanism of ZK.
We are using Curator 2.10 for our service discovery, so we have to make sure that when ZK is alive we connect and announce that our server is up. We do that by listening to the Curator connection listener, which I think also has to do with the retry policy. But what I can't understand is why we can sometimes see in the log that Curator gave up (LOST), yet a second later the Curator connection is restored. How? Is it because the ZK session heartbeat restored the connection? Does that cause Curator to change its connection state? And on the other hand, we sometimes get to a point where ZK is up but the Curator connection stays LOST.
That is why I thought of using the new retry-always policy you added. Do you think it can help? That way, I hope, there will be no case where ZK is up but the Curator status is LOST, since once it retries it will reconnect to ZK. Is that correct?

On May 21, 2016, at 00:10, "Jordan Zimmerman" <jordan@jordanzimmerman.com> wrote:
Curator’s retry policies are used within each CuratorFramework operation. For example, when you call client.setData().forPath(p, b) the retry policy will be invoked if there is a retry-able exception during the operation. In addition to the retryPolicy, there are connection timeouts. The behavior of how this is handled changed between Curator 2.x and Curator 3.x. In Curator 2.x, for every iteration of the retry, the operation will wait until connection timeout when there’s no connection. In Curator 3.x, the connection timeout wait only occurs once (if the default ConnectionHandlingPolicy is used).
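
To put rough numbers on the paragraph above, here is a back-of-the-envelope sketch in plain Java. It only models the description and is not Curator code: the class `BlockingEstimate`, its method names, and the sample values (15 s connection timeout, 1 s sleep between retries, 3 retries) are assumptions for illustration. It also assumes that every attempt fails because there is no connection, and that the sleeps between retries still apply in the 3.x case.

```java
// Toy model, not Curator code: estimates how long one operation may block
// while there is no connection, taking the 2.x / 3.x description literally.
final class BlockingEstimate {
    // 2.x as described: every attempt waits for the connection timeout.
    static long worstCase2x(long connectionTimeoutMs, long sleepBetweenRetriesMs, int maxRetries) {
        long attempts = 1L + maxRetries;                  // first try plus the retries
        return attempts * connectionTimeoutMs + maxRetries * sleepBetweenRetriesMs;
    }

    // 3.x as described (default ConnectionHandlingPolicy): the timeout wait happens once.
    static long worstCase3x(long connectionTimeoutMs, long sleepBetweenRetriesMs, int maxRetries) {
        return connectionTimeoutMs + maxRetries * sleepBetweenRetriesMs;
    }

    public static void main(String[] args) {
        System.out.println("2.x model: " + worstCase2x(15_000, 1_000, 3) + " ms");
        System.out.println("3.x model: " + worstCase3x(15_000, 1_000, 3) + " ms");
    }
}
```

With these sample values the 2.x model blocks for four timeouts plus three sleeps, while the 3.x model blocks for one timeout plus three sleeps.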

In any event, ZooKeeper itself tries to maintain the connection. Also, Curator will re-create the internally managed connection depending on various network interruptions, etc. I’d need to see the logs to give you more input.
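
The distinction drawn in this reply, a retry policy that belongs to individual operations and a connection that is maintained separately, can be sketched without any dependency. This is a toy model and not Curator code: the class `PerOperationRetry`, the `Policy` interface, `nTimes`, and the `AtomicBoolean` standing in for the connection are all hypothetical names chosen for this example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model, not Curator code: the retry policy bounds the attempts of ONE
// operation, while the connection comes and goes independently of it.
final class PerOperationRetry {
    // Hypothetical, simplified policy: may failure number retryCount (0-based) be retried?
    interface Policy { boolean allowRetry(int retryCount); }

    static Policy nTimes(int maxRetries) { return retryCount -> retryCount < maxRetries; }

    // Outcome of one operation: did it succeed, and after how many attempts?
    static final class Outcome {
        final boolean succeeded;
        final int attempts;
        Outcome(boolean succeeded, int attempts) {
            this.succeeded = succeeded;
            this.attempts = attempts;
        }
    }

    // Runs op, retrying after a failure for as long as the policy allows it.
    static Outcome run(Callable<?> op, Policy policy) {
        int failures = 0;
        while (true) {
            try {
                op.call();
                return new Outcome(true, failures + 1);
            } catch (Exception e) {
                if (!policy.allowRetry(failures)) {
                    return new Outcome(false, failures + 1);   // this operation gives up
                }
                failures++;
            }
        }
    }

    public static void main(String[] args) {
        AtomicBoolean connected = new AtomicBoolean(false);    // stands in for the ZooKeeper connection
        Callable<String> setData = () -> {
            if (!connected.get()) throw new IllegalStateException("no connection");
            return "ok";
        };
        Outcome duringOutage = run(setData, nTimes(1));        // fails, then one retry, then gives up
        connected.set(true);                                   // connection restored by something else
        Outcome afterwards = run(setData, nTimes(1));          // a new operation starts from scratch
        System.out.println(duringOutage.succeeded + "/" + duringOutage.attempts
                + " then " + afterwards.succeeded + "/" + afterwards.attempts);
    }
}
```

In this model "gave up" describes a single operation. It does not prevent a later operation from succeeding once the connection is back, which is consistent with seeing a give-up message followed by a reconnect in the logs.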

-Jordan

On May 19, 2016, at 10:12 AM, Moshiko Kasirer <moshek@liveperson.com> wrote:

First, I would like to thank you for Curator. We are using it as part of our service discovery solution and it helps a lot!

I have a question I hope you will be able to help me with.

It's regarding the Curator retry policy. It seems to me we don't really understand when this policy is invoked: although I configured it with max retries 1, in our logs I see many ZK reconnection attempts (and many "Curator gave up" messages, but later I see a reconnected status...). Is it possible that the policy is only relevant to manually invoked operations against the ZK cluster done via Curator? And that the reconnections I see in the logs are caused by the fact that ZK was available during startup, so sessions were created, and then when ZK was down the ZK clients (not Curator) kept sending heartbeats as part of the ZK architecture? That is the part I am failing to understand, and I hope you can help me with it.

You have recently added the RetryAlways policy and I wanted to know if it is safe to use.

The thing is, we always want to retry reconnecting to ZK when it is available, but that is something the ZK client does as long as it has open sessions, right? I am not sure that it has to do with the retry policy...

thanks,

moshiko
--
Moshiko Kasirer
Software Engineer
T: +972-74-700-4357
We Create Meaningful Connections

This message may contain confidential and/or privileged information.
If you are not the addressee or authorized to receive this on behalf of the addressee you must not use, copy, disclose or take action based on this message or any information herein.
If you have received this message in error, please advise the sender immediately by reply email and delete this message. Thank you.

--001a113e589653a70605334d8a62--