helix-user mailing list archives

From Zhen Zhang <zzh...@linkedin.com>
Subject RE: RoutingTableProvider dropping callbacks
Date Mon, 09 Mar 2015 06:43:23 GMT
@Kishore, I think the remove is there in case the bucket size has changed, so that we can
clear all the buckets for the old size and then set them using the new size.

The issue looks like a race condition between setting the bucketized external view and adding
watches on child paths. Will investigate more.
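
The suspected interleaving can be replayed deterministically. This is a toy sketch, not Helix
or ZooKeeper code: FakeStore, replay, and the paths are hypothetical stand-ins, and the
ordering shown is only the hypothesis above, not a confirmed diagnosis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BucketizedRaceSketch {
    /** Toy in-memory stand-in for a znode tree (path -> data). Not ZooKeeper. */
    static class FakeStore {
        private final Map<String, String> nodes = new TreeMap<>();

        void set(String path, String data) {
            nodes.put(path, data);
        }

        /** Removes a node and everything below it. */
        void removeRecursive(String path) {
            nodes.keySet().removeIf(p -> p.equals(path) || p.startsWith(path + "/"));
        }

        /** Lists direct children; fails if the node itself is missing. */
        List<String> getChildren(String path) {
            if (!nodes.containsKey(path)) {
                throw new IllegalStateException("NoNode for " + path);
            }
            String prefix = path + "/";
            List<String> children = new ArrayList<>();
            for (String p : nodes.keySet()) {
                if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                    children.add(p);
                }
            }
            return children;
        }
    }

    /** Replays the interleaving; returns the failure message, or null if none occurred. */
    static String replay() {
        FakeStore store = new FakeStore();
        store.set("/EXTERNALVIEW", "");
        store.set("/EXTERNALVIEW/resourceA", "view");
        store.set("/EXTERNALVIEW/resourceA/bucket0", "b0");

        // Listener, step 1: snapshot the children of the parent path.
        List<String> snapshot = store.getChildren("/EXTERNALVIEW");

        // Writer, step 1: remove the bucketized resource before setting it again.
        store.removeRecursive("/EXTERNALVIEW/resourceA");

        // Listener, step 2: subscribe to each child from the now-stale snapshot.
        try {
            for (String child : snapshot) {
                store.getChildren(child);
            }
            return null;
        } catch (IllegalStateException e) {
            // The writer's second step (set) has not happened yet, so the child is gone.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

If the listener gives up on the first missing child, as in this sketch, it never re-reads the
parent after the writer sets the path again.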

Thanks,
Jason
________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Saturday, March 07, 2015 11:07 PM
To: user@helix.apache.org
Subject: Re: RoutingTableProvider dropping callbacks

Please find the attached log file with the above trace.

On Sat, Mar 7, 2015 at 8:12 PM, kishore g <g.kishore@gmail.com> wrote:
Another thing is that the RoutingTable is logging this line "Resetting the routing table.".
Looks like this happens when we fail to set the watcher.

thanks,
Kishore G

On Sat, Mar 7, 2015 at 8:05 PM, kishore g <g.kishore@gmail.com> wrote:
Your explanation makes sense.

https://github.com/apache/helix/blob/helix-0.6.4/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java
For a bucketized resource we see that the path is deleted and set again. Jason, any idea why
we are removing the path?

case EXTERNALVIEW:
    if (value.getBucketSize() == 0) {
        records.add(value.getRecord());
    } else {
        _baseDataAccessor.remove(path, options);

On Sat, Mar 7, 2015 at 4:03 PM, Varun Sharma <varun@pinterest.com> wrote:
How does writing the external view work for bucketized resources? Is it possible that the
top-level znode for the resource is first deleted and then rewritten with the latest external
view?

On Sat, Mar 7, 2015 at 3:56 PM, Varun Sharma <varun@pinterest.com> wrote:
Here is the stack trace. There is a ZooKeeper race, and the detailed stack trace appears for
bucketized resources. I saw that the ideal state for the resource was created on 26th Feb
and was modified on 7th March. However, the external view for the resource shows up as both
created and modified on 7th March. The external view was created at 10:36:04 on 7th March,
which is about 20 seconds after the stack trace below was logged. After this, the routing
table provider no longer receives any zk callbacks.


2015-03-07 10:35:43,735 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN  org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@3c8589f0,
rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$visual_seo_joins_staging$1422384697040

2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN  org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@63230a9a,
rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$recommendation_p2p_exp_candset_1$1425671237739

2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127) WARN  org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@118d374f,
rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250

2015-03-07 10:35:43,736 [ZkClient-EventThread-17-terrapinzk001a:2181] (CallbackHandler.java:304)
WARN  fail to subscribe child/data change. path: /main_a/EXTERNALVIEW, listener: com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@2c6691da

org.I0Itec.zkclient.exception.ZkNoNodeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
        at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
        at org.apache.helix.manager.zk.ZkClient.getChildren(ZkClient.java:210)
        at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
        at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:279)
        at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
        at org.apache.helix.manager.zk.CallbackHandler.handleChildChange(CallbackHandler.java:391)
        at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:570)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)

2015-03-07 10:35:43,848 [ZkClient-EventThread-17-terrapinzk001a:2181] (RoutingTableProvider.java:99)
INFO  Resetting the routing table.

On Thu, Mar 5, 2015 at 11:33 AM, Varun Sharma <varun@pinterest.com> wrote:
I suspect the callbacks have not been coming in for a long time now.

On Thu, Mar 5, 2015 at 11:30 AM, Varun Sharma <varun@pinterest.com> wrote:
I grepped this and found nothing:


sudo grep START:INVOKE.*EXTERNALVIEW /var/log/terrapin/controller.log*

I found a bunch of START:INVOKE for the IDEALSTATES znode though.

On Thu, Mar 5, 2015 at 11:15 AM, Zhen Zhang <zzhang@linkedin.com> wrote:
Yes. You should see a pair of "START:INVOKE..." and "END:INVOKE:..." for each callback in
your log.
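
One way to check this mechanically is to count the two markers for a given path. This is a
minimal sketch under assumptions: it assumes each callback writes one line containing
START:INVOKE and one containing END:INVOKE together with the path, and the class and method
names are hypothetical. The exact layout of the log lines is not shown in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class InvokePairCheck {
    /** Counts lines that contain both the marker and the path fragment. */
    static long count(List<String> lines, String marker, String pathFragment) {
        return lines.stream()
                .filter(line -> line.contains(marker) && line.contains(pathFragment))
                .count();
    }

    /** START:INVOKE lines minus END:INVOKE lines for the path; 0 means every start has an end. */
    static long unmatched(List<String> lines, String pathFragment) {
        return count(lines, "START:INVOKE", pathFragment)
                - count(lines, "END:INVOKE", pathFragment);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: log file to scan, args[1]: path fragment such as EXTERNALVIEW
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("START:INVOKE lines: " + count(lines, "START:INVOKE", args[1]));
        System.out.println("END:INVOKE lines:   " + count(lines, "END:INVOKE", args[1]));
        System.out.println("unmatched starts:   " + unmatched(lines, args[1]));
    }
}
```

A count of zero START:INVOKE lines for a path would suggest that no callbacks are being
invoked for it at all.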
________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Thursday, March 05, 2015 11:11 AM
To: user@helix.apache.org
Subject: Re: RoutingTableProvider dropping callbacks

OK, is there a way to confirm that the callbacks are being processed (from the logs, etc.)?

On Thu, Mar 5, 2015 at 10:50 AM, Zhen Zhang <zzhang@linkedin.com> wrote:
Hi Varun,

This should not be a problem. When we register a callback, we expect a callback of type
INIT first, followed by a sequence of CALLBACK types, and when you unregister the callback,
you will receive a FINALIZED type. Since unregister is an async operation, after you receive
a FINALIZED type you might still see a couple of CALLBACK type callbacks, which are simply
ignored. The log is basically telling you that.
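
The ordering described above can be pictured as a small gate. This is a minimal sketch, not
Helix's actual CallbackHandler code: the class, the enum, and the transition rules are
hypothetical and only mirror the INIT, CALLBACK, FINALIZED order described here.

```java
import java.util.EnumSet;
import java.util.Set;

public class CallbackGate {
    /** Hypothetical notification types, named after the ones mentioned in this thread. */
    public enum Type { INIT, CALLBACK, FINALIZED }

    // A fresh (or finalized) listener only accepts INIT.
    private Set<Type> expected = EnumSet.of(Type.INIT);

    /** Returns true if the notification is processed, false if it is skipped. */
    public synchronized boolean offer(Type type) {
        if (!expected.contains(type)) {
            // Corresponds to a "Skip processing callbacks ... expected types" warning.
            return false;
        }
        switch (type) {
            case INIT:
            case CALLBACK:
                // Once initialized, further callbacks or a finalize are acceptable.
                expected = EnumSet.of(Type.CALLBACK, Type.FINALIZED);
                break;
            case FINALIZED:
                // After finalize, late CALLBACKs are ignored until the next INIT.
                expected = EnumSet.of(Type.INIT);
                break;
        }
        return true;
    }
}
```

With this gate, a CALLBACK that arrives while only INIT is expected is skipped, which matches
the shape of the warning "expected types: [INIT] but was CALLBACK".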

Thanks,
Jason
________________________________
From: Varun Sharma [varun@pinterest.com]
Sent: Thursday, March 05, 2015 10:44 AM
To: user@helix.apache.org
Subject: RoutingTableProvider dropping callbacks

Hi,

It seems that the RoutingTableProvider is dropping callbacks in our case. Here is a log:


[ZkClient-EventThread-17-terrapinzk001a:2181] (CallbackHandler.java:130) WARN  Skip processing
callbacks for listener: com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@7e7f8062,
path: /main_a/EXTERNALVIEW, expected types: [INIT] but was CALLBACK


We have a custom RoutingTableProvider to catch callbacks and do some processing, and this is
causing a lot of issues for us. What could be causing this?

Thanks
Varun








