Return-Path: X-Original-To: apmail-helix-user-archive@minotaur.apache.org Delivered-To: apmail-helix-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 8CBBF1732B for ; Sat, 7 Mar 2015 23:57:18 +0000 (UTC) Received: (qmail 8418 invoked by uid 500); 7 Mar 2015 23:57:18 -0000 Delivered-To: apmail-helix-user-archive@helix.apache.org Received: (qmail 8365 invoked by uid 500); 7 Mar 2015 23:57:18 -0000 Mailing-List: contact user-help@helix.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@helix.apache.org Delivered-To: mailing list user@helix.apache.org Received: (qmail 8344 invoked by uid 99); 7 Mar 2015 23:57:18 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 07 Mar 2015 23:57:18 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,LOTS_OF_MONEY,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of varun@pinterest.com designates 209.85.223.179 as permitted sender) Received: from [209.85.223.179] (HELO mail-ie0-f179.google.com) (209.85.223.179) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 07 Mar 2015 23:57:14 +0000 Received: by iecrd18 with SMTP id rd18so3755509iec.12 for ; Sat, 07 Mar 2015 15:56:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pinterest.com; s=google; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=pa1RbYoSVnCDElH3k7tskNZev+6fif616mmG5YV5nJQ=; b=NDT5eGyfIoIigCaXsE1GnxSR6jzjAsDZRO4blZKRqt0gqbxMtCR6fSHWrk9ppF27mH lzgBAp3GtGovCoGY66y3bRa2aPoMDNwY3R215OOVFfW8NxlPeoIi33o9vEgzu4ELyXSO 6gfIqa0iDz7uTsOKEGp9+5l/Uh4Wz0IiF04Hw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=pa1RbYoSVnCDElH3k7tskNZev+6fif616mmG5YV5nJQ=; b=ZpiKtgPfRt3HHIcMSKjByUfVSUBXZoQvGWr/mPCWM/JXY0Dx2aJCKA4zmsPBV1YRiI Leb0VW5dGrAKjxPXeumADOOy05A0j9aeEfljWrPCeRufDKjHtdr44tLNwggk/oRhdEQN uY3L8q7eRP0oz6NA6Z8ZVItRG+laOzm4c2ciObmKvCA8crrZvWkIab62BVJrZj8Aw98c 9EmGbGVHhYMfJJf+9QNofDH0vfrABEWis1M94DHsPhx66q7UEwqOQqlRDR0vBJQ3IYp6 o7S+dHJu0jtB1EtyOho+Ok5ZYiSdNu+QRw2YuDmcWXPZs0Vnj14J8AEkxSwlAwCo9Tzs Cg+w== X-Gm-Message-State: ALoCoQlMmNJU9zPZLXcB21oeieq3NCKyXKCCaIkYl/CqNuDfrA/205ypGhxCKvSN4NRZj6RLOxXW MIME-Version: 1.0 X-Received: by 10.50.1.48 with SMTP id 16mr37901916igj.45.1425772613483; Sat, 07 Mar 2015 15:56:53 -0800 (PST) Received: by 10.107.179.196 with HTTP; Sat, 7 Mar 2015 15:56:53 -0800 (PST) In-Reply-To: References: <23CA11DC8830BA44A37C6B44B14D013CB9D1EBB0@ESV4-MB02.linkedin.biz> <23CA11DC8830BA44A37C6B44B14D013CB9D1EC07@ESV4-MB02.linkedin.biz> Date: Sat, 7 Mar 2015 15:56:53 -0800 Message-ID: Subject: Re: RoutingTableProvider dropping callbacks From: Varun Sharma To: user@helix.apache.org Content-Type: multipart/alternative; boundary=047d7bdc12b0284c950510bb8ea9 X-Virus-Checked: Checked by ClamAV on apache.org --047d7bdc12b0284c950510bb8ea9 Content-Type: text/plain; charset=UTF-8 Here is the stack trace - there is a zookeeper race and the detailed stack trace appears for bucketized resources. I saw that the ideal state for the resource was created on 26th Feb and was modified on 7th March. However, the external view for the resource is showing up as created on 7th march as well as modified on 7th march. The external view is created at 10:36:04 on 7th march which is 20 seconds after this log message stack trace is spit out. After this the routing table provider no longer receives any more zk callbacks. 2015-03-07 10:35:43,735 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@3c8589f0, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$visual_seo_joins_staging$1422384697040 2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@63230a9a, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$recommendation_p2p_exp_candset_1$1425671237739 2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@118d374f, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250 2015-03-07 10:35:43,736 [ZkClient-EventThread-17-terrapinzk001a:2181] (CallbackHandler.java:304) WARN fail to subscribe child/data change. path: /main_a/EXTERNALVIEW, listener: com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@2c6691da *org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250* at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47) at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685) at org.apache.helix.manager.zk.ZkClient.getChildren(ZkClient.java:210) at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409) at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:279) at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202) at org.apache.helix.manager.zk.CallbackHandler.handleChildChange(CallbackHandler.java:391) at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:570) at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250 at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249) 2015-03-07 10:35:43,848 [ZkClient-EventThread-17-terrapinzk001a:2181] (RoutingTableProvider.java:99) INFO *Resetting* the routing table. On Thu, Mar 5, 2015 at 11:33 AM, Varun Sharma wrote: > I suspect the callbacks are not coming in, for a long time now. > > On Thu, Mar 5, 2015 at 11:30 AM, Varun Sharma wrote: > >> I grepped this and found nothing: >> >> sudo grep START:INVOKE.*EXTERNALVIEW /var/log/terrapin/controller.log* >> >> I found a bunch of START:INVOKE for the IDEALSTATES znode though. >> >> On Thu, Mar 5, 2015 at 11:15 AM, Zhen Zhang wrote: >> >>> Yes. you should see a pair of "START:INVOKE..." and "END:INVOKE:..." >>> for each callback in your log. >>> ------------------------------ >>> *From:* Varun Sharma [varun@pinterest.com] >>> *Sent:* Thursday, March 05, 2015 11:11 AM >>> *To:* user@helix.apache.org >>> *Subject:* Re: RoutingTableProvider dropping callbacks >>> >>> Ohk - is there a way to confirm that the callbacks are being >>> processed (from the logs etc.) ? >>> >>> On Thu, Mar 5, 2015 at 10:50 AM, Zhen Zhang wrote: >>> >>>> Hi Varun, >>>> >>>> This should not be a problem. When we register a callback, we are >>>> expecting a call back type of INIT first, followed by a sequence of >>>> CALLBACK types, and when you unregister the callback, you will received a >>>> FINALIZED type. Since unregister is an async operation, when you receive a >>>> FINALIZED type, you might still see a couple of CALLBACK type callbacks, >>>> which are simply ignored. The log is basically telling you that. >>>> >>>> Thanks, >>>> Jason >>>> ------------------------------ >>>> *From:* Varun Sharma [varun@pinterest.com] >>>> *Sent:* Thursday, March 05, 2015 10:44 AM >>>> *To:* user@helix.apache.org >>>> *Subject:* RoutingTableProvider dropping callbacks >>>> >>>> Hi, >>>> >>>> It seems that the RoutingTableProvider is dropping callbacks in our >>>> case. Here is a log: >>>> >>>> [ZkClient-EventThread-17-terrapinzk001a:2181] >>>> (CallbackHandler.java:130) WARN Skip processing callbacks for listener: >>>> com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@7e7f8062, >>>> path: /main_a/EXTERNALVIEW, expected types: [INIT] but was CALLBACK >>>> >>>> >>>> We have a custom RoutingTableProvider to catch callbacks and do some >>>> processing - this is causing a lot of issues for us. What could be causing >>>> this ? >>>> >>>> Thanks >>>> Varun >>>> >>> >>> >> > --047d7bdc12b0284c950510bb8ea9 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Here is the stack trace - there is a zookeeper race and th= e detailed stack trace appears for bucketized resources. I saw that the ide= al state for the resource was created on 26th Feb and was modified on 7th M= arch. However, the external view for the resource is showing up as created = on 7th march as well as modified on 7th march. The external view is created= at 10:36:04 on 7th march which is 20 seconds after this log message stack = trace is spit out. After this the routing table provider no longer receives= any more zk callbacks.

2015-03-07 10:35:43,735 [main-EventThread] (ZkAsyncCallbacks.= java:127) WARN=C2=A0 org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCa= llbackHandler@3c8589f0, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$dat= a$visual_seo_joins_staging$1422384697040

2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.= java:127) WARN=C2=A0 org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCa= llbackHandler@63230a9a, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$dat= a$recommendation_p2p_exp_candset_1$1425671237739

2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.= java:127) WARN=C2=A0 org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCa= llbackHandler@118d374f, rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$dat= a$None$1422308641250

2015-03-07 10:35:43,736 [ZkClient-EventThread-17-terrapinzk00= 1a:2181] (CallbackHandler.java:304) WARN=C2=A0 fail to subscribe child/data= change. path: /main_a/EXTERNALVIEW, listener: com.pinterest.terrapin.contr= oller.TerrapinRoutingTableProvider@2c6691da

org.I0Itec.zkclient.exception.ZkNoNodeException: org.apach= e.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =3D NoNode for= /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.I0Itec.zkclient.exception.= ZkException.create(ZkException.java:47)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.I0Itec.zkclient.ZkClient.r= etryUntilConnected(ZkClient.java:685)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.helix.manager.zk.Zk= Client.getChildren(ZkClient.java:210)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.I0Itec.zkclient.ZkClient.g= etChildren(ZkClient.java:409)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.helix.manager.zk.Ca= llbackHandler.subscribeForChanges(CallbackHandler.java:279)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.helix.manager.zk.Ca= llbackHandler.invoke(CallbackHandler.java:202)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.helix.manager.zk.Ca= llbackHandler.handleChildChange(CallbackHandler.java:391)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.I0Itec.zkclient.ZkClient$7= .run(ZkClient.java:570)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.I0Itec.zkclient.ZkEventThr= ead.run(ZkEventThread.java:71)

Caused by: org.apache.zookeeper.KeeperException$NoNodeExcepti= on: KeeperErrorCode =3D NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None= $1422308641250

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.zookeeper.KeeperExc= eption.create(KeeperException.java:102)=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.a= pache.zookeeper.KeeperException.create(KeeperException.java:42)

=C2=A0 =C2=A0 =C2=A0 =C2=A0 at org.apache.zookeeper.ZooKeeper= .getChildren(ZooKeeper.java:1249)

2015-03-07 10:35:43,848 [ZkClient-EventThread-17-terrapin= zk001a:2181] (RoutingTableProvider.java:99) INFO=C2=A0 Resetting the= routing table.


On Thu, Mar 5, 2015 at 11:33 AM, Varun Sharma <varun@pinte= rest.com> wrote:
I suspect the callbacks are not coming in, for a long time now.

On Thu, Mar 5, 2015 at 11:30 AM, Varun Sharma <varun@pinterest.com> wrote:

On Thu, Mar 5, 2015 at 11:15 AM, Zhen Zhang <zzhang@linkedin.com= > wrote:
Yes. you should see a pair of "START:= INVOKE..." and "END:INVOKE:..." for each callback in your lo= g.

Fro= m: Varun Sharma [varun@pinterest.com]
Sent: Thursday, March 05, 2015 11:11 AM
To: user@= helix.apache.org
Subject: Re: RoutingTableProvider dropping callbacks

Ohk - is there a way to confirm that the callbacks are bei= ng processed (from the logs etc.) ?

On Thu, Mar 5, 2015 at 10:50 AM, Zhen Zhang <zzhang@linkedi= n.com> wrote:
Hi Varun,=C2=A0

This should not be a problem. When we register a callback, we are expe= cting a call back type of INIT first, followed by a sequence of CALLBACK ty= pes, and when you unregister the callback, you will received a FINALIZED ty= pe. Since unregister is an async operation, when you receive a FINALIZED type, you might still see a couple= of CALLBACK type callbacks, which are simply ignored. The log is basically= telling you that.

Thanks,
Jason

Fro= m: Varun Sharma [varun@pinterest.com]
Sent: Thursday, March 05, 2015 10:44 AM
To: user@= helix.apache.org
Subject: RoutingTableProvider dropping callbacks

Hi,

It seems that the RoutingTableProvider is dropping callbacks in our ca= se. Here is a log:

[ZkClient-EventThread-17-terrapinzk001a:2181] (CallbackHandler.java:130)= WARN=C2=A0 Skip processing callbacks for listener: com.pinterest.terrapin.= controller.TerrapinRoutingTableProvider@7e7f8062, path: /main_a/EXTERNALVIE= W, expected types: [INIT] but was CALLBACK


We have a custom RoutingTableProvider to catch callbacks and do some pro= cessing - this is causing a lot of issues for us. What =C2=A0could be causi= ng this ?


Thanks
Varun




--047d7bdc12b0284c950510bb8ea9--