Return-Path: X-Original-To: apmail-helix-user-archive@minotaur.apache.org Delivered-To: apmail-helix-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5D1AB175E5 for ; Sun, 8 Mar 2015 04:14:46 +0000 (UTC) Received: (qmail 19512 invoked by uid 500); 8 Mar 2015 04:14:45 -0000 Delivered-To: apmail-helix-user-archive@helix.apache.org Received: (qmail 19453 invoked by uid 500); 8 Mar 2015 04:14:45 -0000 Mailing-List: contact user-help@helix.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@helix.apache.org Delivered-To: mailing list user@helix.apache.org Received: (qmail 19440 invoked by uid 99); 8 Mar 2015 04:14:45 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 08 Mar 2015 04:14:45 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,LOTS_OF_MONEY,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of g.kishore@gmail.com designates 209.85.212.181 as permitted sender) Received: from [209.85.212.181] (HELO mail-wi0-f181.google.com) (209.85.212.181) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 08 Mar 2015 04:14:20 +0000 Received: by widfb4 with SMTP id fb4so1838835wid.0 for ; Sat, 07 Mar 2015 20:12:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=Z4QnHxLGEXhiHz0oqCnqKndqUZqnoAzlns6pchcojyg=; b=W3JjsTz9yiHaMG3cLJPB7/hSUGprRfp2oQn+5cAeIncwIyeite93NkmwQhlJNMxxxD DStebVhWWPpyq8JXRT/wRvM2/mEXZiozPr15DSJTig0LokfsQdymwtse8KM8DbF5KXSd rh8rpw0y2UOz4xwA3mHYfdOCH47vsIqWhuvqY8TVbzt2HsqfsPM+rJOpR1ngEQrgsKvy DbnrTJw8msX0S7hFPY2LTjINsQbeki4q3g3qfxxOfvkj45iTjAZsHg/JNiWlMCiNjH7p LqQiYp1s/CRQdE/kH9sKB+q1ie+8jPWM8UfhROj4KyZCi5hUpzoiKwDJ7hEKY9ru2PzC 09YQ== MIME-Version: 1.0 X-Received: by 10.180.108.13 with SMTP id hg13mr47230865wib.7.1425787968999; Sat, 07 Mar 2015 20:12:48 -0800 (PST) Received: by 10.194.81.131 with HTTP; Sat, 7 Mar 2015 20:12:48 -0800 (PST) In-Reply-To: References: <23CA11DC8830BA44A37C6B44B14D013CB9D1EBB0@ESV4-MB02.linkedin.biz> <23CA11DC8830BA44A37C6B44B14D013CB9D1EC07@ESV4-MB02.linkedin.biz> Date: Sat, 7 Mar 2015 20:12:48 -0800 Message-ID: Subject: Re: RoutingTableProvider dropping callbacks From: kishore g To: "user@helix.apache.org" Content-Type: multipart/alternative; boundary=e89a8f3b9be56ad7530510bf2120 X-Virus-Checked: Checked by ClamAV on apache.org --e89a8f3b9be56ad7530510bf2120 Content-Type: text/plain; charset=UTF-8 Another thing is that the RoutingTable is logging this line "Resetting the routing table.". Looks like this happens when we fail to set the watcher. thanks, Kishore G On Sat, Mar 7, 2015 at 8:05 PM, kishore g wrote: > Your explanation makes sense. > > > https://github.com/apache/helix/blob/helix-0.6.4/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java. > For bucketized resource we see that path is deleted and set again. Jason, > any idea why we are removing the path? > > case EXTERNALVIEW: if (value.getBucketSize() == 0) { records.add(value. > getRecord()); } else { _baseDataAccessor.remove(path, options); > > On Sat, Mar 7, 2015 at 4:03 PM, Varun Sharma wrote: > >> How does the writing of externalview work for bucketized resources -is it >> possible that the top level znode for the resource is first deleted and >> then rewritten with the latest external view ? >> >> On Sat, Mar 7, 2015 at 3:56 PM, Varun Sharma wrote: >> >>> Here is the stack trace - there is a zookeeper race and the detailed >>> stack trace appears for bucketized resources. I saw that the ideal state >>> for the resource was created on 26th Feb and was modified on 7th March. >>> However, the external view for the resource is showing up as created on 7th >>> march as well as modified on 7th march. The external view is created at >>> 10:36:04 on 7th march which is 20 seconds after this log message stack >>> trace is spit out. After this the routing table provider no longer receives >>> any more zk callbacks. >>> >>> 2015-03-07 10:35:43,735 [main-EventThread] (ZkAsyncCallbacks.java:127) >>> WARN >>> org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@3c8589f0, >>> rc:NONODE, path: >>> /main_a/EXTERNALVIEW/$terrapin$data$visual_seo_joins_staging$1422384697040 >>> >>> 2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) >>> WARN >>> org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@63230a9a, >>> rc:NONODE, path: >>> /main_a/EXTERNALVIEW/$terrapin$data$recommendation_p2p_exp_candset_1$1425671237739 >>> >>> 2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) >>> WARN >>> org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@118d374f, >>> rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250 >>> >>> 2015-03-07 10:35:43,736 [ZkClient-EventThread-17-terrapinzk001a:2181] >>> (CallbackHandler.java:304) WARN fail to subscribe child/data change. path: >>> /main_a/EXTERNALVIEW, listener: >>> com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@2c6691da >>> >>> *org.I0Itec.zkclient.exception.ZkNoNodeException: >>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = >>> NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250* >>> >>> at >>> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) >>> >>> at >>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) >>> >>> at >>> org.apache.helix.manager.zk.ZkClient.getChildren(ZkClient.java:210) >>> >>> at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409) >>> >>> at >>> org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:279) >>> >>> at >>> org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202) >>> >>> at >>> org.apache.helix.manager.zk.CallbackHandler.handleChildChange(CallbackHandler.java:391) >>> >>> at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:570) >>> >>> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) >>> >>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: >>> KeeperErrorCode = NoNode for >>> /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250 >>> >>> at >>> org.apache.zookeeper.KeeperException.create(KeeperException.java:102) >>> at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) >>> >>> at >>> org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249) >>> >>> 2015-03-07 10:35:43,848 [ZkClient-EventThread-17-terrapinzk001a:2181] >>> (RoutingTableProvider.java:99) INFO *Resetting* the routing table. >>> >>> On Thu, Mar 5, 2015 at 11:33 AM, Varun Sharma >>> wrote: >>> >>>> I suspect the callbacks are not coming in, for a long time now. >>>> >>>> On Thu, Mar 5, 2015 at 11:30 AM, Varun Sharma >>>> wrote: >>>> >>>>> I grepped this and found nothing: >>>>> >>>>> sudo grep START:INVOKE.*EXTERNALVIEW /var/log/terrapin/controller.log* >>>>> >>>>> I found a bunch of START:INVOKE for the IDEALSTATES znode though. >>>>> >>>>> On Thu, Mar 5, 2015 at 11:15 AM, Zhen Zhang >>>>> wrote: >>>>> >>>>>> Yes. you should see a pair of "START:INVOKE..." and >>>>>> "END:INVOKE:..." for each callback in your log. >>>>>> ------------------------------ >>>>>> *From:* Varun Sharma [varun@pinterest.com] >>>>>> *Sent:* Thursday, March 05, 2015 11:11 AM >>>>>> *To:* user@helix.apache.org >>>>>> *Subject:* Re: RoutingTableProvider dropping callbacks >>>>>> >>>>>> Ohk - is there a way to confirm that the callbacks are being >>>>>> processed (from the logs etc.) ? >>>>>> >>>>>> On Thu, Mar 5, 2015 at 10:50 AM, Zhen Zhang >>>>>> wrote: >>>>>> >>>>>>> Hi Varun, >>>>>>> >>>>>>> This should not be a problem. When we register a callback, we are >>>>>>> expecting a call back type of INIT first, followed by a sequence of >>>>>>> CALLBACK types, and when you unregister the callback, you will received a >>>>>>> FINALIZED type. Since unregister is an async operation, when you receive a >>>>>>> FINALIZED type, you might still see a couple of CALLBACK type callbacks, >>>>>>> which are simply ignored. The log is basically telling you that. >>>>>>> >>>>>>> Thanks, >>>>>>> Jason >>>>>>> ------------------------------ >>>>>>> *From:* Varun Sharma [varun@pinterest.com] >>>>>>> *Sent:* Thursday, March 05, 2015 10:44 AM >>>>>>> *To:* user@helix.apache.org >>>>>>> *Subject:* RoutingTableProvider dropping callbacks >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> It seems that the RoutingTableProvider is dropping callbacks in >>>>>>> our case. Here is a log: >>>>>>> >>>>>>> [ZkClient-EventThread-17-terrapinzk001a:2181] >>>>>>> (CallbackHandler.java:130) WARN Skip processing callbacks for listener: >>>>>>> com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@7e7f8062, >>>>>>> path: /main_a/EXTERNALVIEW, expected types: [INIT] but was CALLBACK >>>>>>> >>>>>>> >>>>>>> We have a custom RoutingTableProvider to catch callbacks and do >>>>>>> some processing - this is causing a lot of issues for us. What could be >>>>>>> causing this ? >>>>>>> >>>>>>> Thanks >>>>>>> Varun >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >> > --e89a8f3b9be56ad7530510bf2120 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Another thing is that the RoutingTable is logging this lin= e "Resetting the routing = table.". Looks like this happens when we fail to set the watcher.

<= span style=3D"font-size:12.8000001907349px">thanks,
Kishore G

On Sat, Mar 7, 2015 at 8:= 05 PM, kishore g <g.kishore@gmail.com> wrote:
Your explanation makes sense.=C2=A0<= div>
https://github.com/apache/helix/blob/helix-0.6.4/= helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.ja= va. For bucketized resource we see that path is deleted and set again. = Jason, any idea why we are removing the path?=C2=A0

<= /td>
case EXTERNALVIEW:
if (value= .getBucketSize() =3D=3D = 0) {
records.add(value.= getRecord());
= } else {
_baseDataAccessor.remove(path, options);

On Sat, Mar 7,= 2015 at 4:03 PM, Varun Sharma <varun@pinterest.com> wrote= :
How does the writing o= f externalview work for bucketized resources -is it possible that the top l= evel znode for the resource is first deleted and then rewritten with the la= test external view ?

On Sat, Mar 7, 2015 at 3:56 PM, Varun Sharma <varun= @pinterest.com> wrote:
Here is the stack trace - there is a zookeeper race and the de= tailed stack trace appears for bucketized resources. I saw that the ideal s= tate for the resource was created on 26th Feb and was modified on 7th March= . However, the external view for the resource is showing up as created on 7= th march as well as modified on 7th march. The external view is created at = 10:36:04 on 7th march which is 20 seconds after this log message stack trac= e is spit out. After this the routing table provider no longer receives any= more zk callbacks.

2015-03-07 10:35:43,735 [main-EventThread] (ZkAsyncCallbacks.java:127) W= ARN=C2=A0 org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandl= er@3c8589f0, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$visual_se= o_joins_staging$1422384697040

2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) W= ARN=C2=A0 org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandl= er@63230a9a, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$recommend= ation_p2p_exp_candset_1$1425671237739

2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) W= ARN=C2=A0 org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandl= er@118d374f, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$None$1422= 308641250

2015-03-07 10:35:43,736 [ZkClient-EventThread-17-terrapinzk001a:2181] (C= allbackHandler.java:304) WARN=C2=A0 fail to subscribe child/data change. pa= th: /main_a/EXTERNALVIEW, listener: com.pinterest.terrapin.controller.Terra= pinRoutingTableProvider@2c6691da

org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper= .KeeperException$NoNodeException: KeeperErrorCode =3D NoNode for /main_a/EX= TERNALVIEW/$terrapin$data$None$1422308641250

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.I0Itec.zkclient.exception.ZkException= .create(ZkException.java:47)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.I0Itec.zkclient.ZkClient.retryUntilCo= nnected(ZkClient.java:685)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.helix.manager.zk.ZkClient.getC= hildren(ZkClient.java:210)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.I0Itec.zkclient.ZkClient.getChildren(= ZkClient.java:409)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.helix.manager.zk.CallbackHandl= er.subscribeForChanges(CallbackHandler.java:279)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.helix.manager.zk.CallbackHandl= er.invoke(CallbackHandler.java:202)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.helix.manager.zk.CallbackHandl= er.handleChildChange(CallbackHandler.java:391)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.I0Itec.zkclient.ZkClient$7.run(ZkClie= nt.java:570)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.I0Itec.zkclient.ZkEventThread.run(ZkE= ventThread.java:71)

Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperE= rrorCode =3D NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641= 250

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.zookeeper.KeeperException.crea= te(KeeperException.java:102)=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.zooke= eper.KeeperException.create(KeeperException.java:42)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.zookeeper.ZooKeeper.getChildre= n(ZooKeeper.java:1249)

2015-03-07 10:35:43,848 [ZkClient-EventThread-17-terrapinzk001a:2181= ] (RoutingTableProvider.java:99) INFO=C2=A0 Resetting the routing ta= ble.


On Thu, Mar 5, 2015 at 11:33 AM, Varun Sharma <varun@pinter= est.com> wrote:
I suspect the callbacks are not coming in, for a long time now.
=

On Thu, Mar= 5, 2015 at 11:30 AM, Varun Sharma <varun@pinterest.com> w= rote:
I= grepped this and found nothing:

sudo grep START:INVOKE.*EXTERNALVIEW /var/log/terrapin/controller.log*

I found a bunch of START:INVOKE for the IDEALSTATES znode though.

<= /div>

On Thu, Mar 5, 2015 at 11:15 AM, Zhen Zhang <zzhang@linkedin.com= > wrote:
Yes. you should see a pair of "START:= INVOKE..." and "END:INVOKE:..." for each callback in your lo= g.

Fro= m: Varun Sharma [varun@pinterest.com]
Sent: Thursday, March 05, 2015 11:11 AM
To: user@= helix.apache.org
Subject: Re: RoutingTableProvider dropping callbacks

Ohk - is there a way to confirm that the callbacks are bei= ng processed (from the logs etc.) ?

On Thu, Mar 5, 2015 at 10:50 AM, Zhen Zhang <zzhang@linkedi= n.com> wrote:
Hi Varun,=C2=A0

This should not be a problem. When we register a callback, we are expe= cting a call back type of INIT first, followed by a sequence of CALLBACK ty= pes, and when you unregister the callback, you will received a FINALIZED ty= pe. Since unregister is an async operation, when you receive a FINALIZED type, you might still see a couple= of CALLBACK type callbacks, which are simply ignored. The log is basically= telling you that.

Thanks,
Jason

Fro= m: Varun Sharma [varun@pinterest.com]
Sent: Thursday, March 05, 2015 10:44 AM
To: user@= helix.apache.org
Subject: RoutingTableProvider dropping callbacks

Hi,

It seems that the RoutingTableProvider is dropping callbacks in our ca= se. Here is a log:

[ZkClient-EventThread-17-terrapinzk001a:2181] (CallbackHandler.java:130)= WARN=C2=A0 Skip processing callbacks for listener: com.pinterest.terrapin.= controller.TerrapinRoutingTableProvider@7e7f8062, path: /main_a/EXTERNALVIE= W, expected types: [INIT] but was CALLBACK


We have a custom RoutingTableProvider to catch callbacks and do some pro= cessing - this is causing a lot of issues for us. What =C2=A0could be causi= ng this ?


Thanks
Varun







--e89a8f3b9be56ad7530510bf2120--