helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject Re: RoutingTableProvider dropping callbacks
Date Sun, 08 Mar 2015 04:12:48 GMT
Another thing is that the RoutingTable is logging this line "Resetting the
routing table.". Looks like this happens when we fail to set the watcher.

thanks,
Kishore G

On Sat, Mar 7, 2015 at 8:05 PM, kishore g <g.kishore@gmail.com> wrote:

> Your explanation makes sense.
>
>
> See
> https://github.com/apache/helix/blob/helix-0.6.4/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java
> For a bucketized resource, we see that the path is deleted and then set
> again. Jason, any idea why we are removing the path?
>
> case EXTERNALVIEW:
>   if (value.getBucketSize() == 0) {
>     records.add(value.getRecord());
>   } else {
>     _baseDataAccessor.remove(path, options);
>
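
The delete-then-set sequence quoted above can be illustrated with a small, deterministic simulation. This is only a sketch under stated assumptions: `FakeZk`, `subscribe`, and the `/EXTERNALVIEW/res` path are hypothetical stand-ins invented for illustration, not Helix or ZooKeeper APIs, and the real interleaving happens across threads rather than through an explicit hook. It shows how a reader that first lists the children of a parent and then reads each child can hit a missing node if a writer removes that child in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class BucketizedViewRace {
    /** Hypothetical in-memory stand-in for a ZooKeeper tree (not a real API). */
    static class FakeZk {
        private final TreeSet<String> paths = new TreeSet<>();

        void create(String path) {
            paths.add(path);
        }

        /** Removes the node and everything below it. */
        void delete(String path) {
            paths.removeIf(p -> p.equals(path) || p.startsWith(path + "/"));
        }

        /** Returns the full paths of direct children; fails if the parent is missing. */
        List<String> getChildren(String parent) {
            if (!paths.contains(parent)) {
                throw new IllegalStateException("NoNode for " + parent);
            }
            List<String> children = new ArrayList<>();
            String prefix = parent + "/";
            for (String p : paths) {
                if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                    children.add(p);
                }
            }
            return children;
        }
    }

    /**
     * Lists the children of root, lets the writer run, then reads each child.
     * Returns true only if every child was still present.
     */
    static boolean subscribe(FakeZk zk, String root, Runnable writerStep) {
        List<String> children = zk.getChildren(root);
        writerStep.run(); // the writer acts between the listing and the per-child reads
        try {
            for (String child : children) {
                zk.getChildren(child);
            }
            return true;
        } catch (IllegalStateException e) {
            System.out.println("fail to subscribe: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.create("/EXTERNALVIEW");
        zk.create("/EXTERNALVIEW/res");

        // No concurrent writer: the subscription succeeds.
        System.out.println(subscribe(zk, "/EXTERNALVIEW", () -> { }));

        // Writer removes the child before the reader gets to it: the subscription fails.
        System.out.println(subscribe(zk, "/EXTERNALVIEW", () -> zk.delete("/EXTERNALVIEW/res")));
    }
}
```

In this model the failure disappears if the writer overwrites the node in place instead of removing it first, which is why the `remove` call is the interesting part of the snippet.
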
> On Sat, Mar 7, 2015 at 4:03 PM, Varun Sharma <varun@pinterest.com> wrote:
>
>> How does writing the external view work for bucketized resources? Is it
>> possible that the top-level znode for the resource is first deleted and
>> then rewritten with the latest external view?
>>
>> On Sat, Mar 7, 2015 at 3:56 PM, Varun Sharma <varun@pinterest.com> wrote:
>>
>>> Here is the stack trace. There is a ZooKeeper race, and the detailed
>>> stack trace appears for bucketized resources. I saw that the ideal state
>>> for the resource was created on 26th Feb and was modified on 7th March.
>>> However, the external view for the resource shows up as both created and
>>> modified on 7th March. The external view was created at 10:36:04 on 7th
>>> March, about 20 seconds after the log messages and stack trace below were
>>> emitted. After this, the routing table provider no longer receives any zk
>>> callbacks.
>>>
>>> 2015-03-07 10:35:43,735 [main-EventThread] (ZkAsyncCallbacks.java:127)
>>> WARN
>>> org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@3c8589f0,
>>> rc:NONODE, path:
>>> /main_a/EXTERNALVIEW/$terrapin$data$visual_seo_joins_staging$1422384697040
>>>
>>> 2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127)
>>> WARN
>>> org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@63230a9a,
>>> rc:NONODE, path:
>>> /main_a/EXTERNALVIEW/$terrapin$data$recommendation_p2p_exp_candset_1$1425671237739
>>>
>>> 2015-03-07 10:35:43,736 [main-EventThread] (ZkAsyncCallbacks.java:127)
>>> WARN
>>> org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@118d374f,
>>> rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
>>>
>>> 2015-03-07 10:35:43,736 [ZkClient-EventThread-17-terrapinzk001a:2181]
>>> (CallbackHandler.java:304) WARN  fail to subscribe child/data change. path:
>>> /main_a/EXTERNALVIEW, listener:
>>> com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@2c6691da
>>>
>>> *org.I0Itec.zkclient.exception.ZkNoNodeException:
>>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
>>> NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250*
>>>
>>>         at
>>> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>>>
>>>         at
>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
>>>
>>>         at
>>> org.apache.helix.manager.zk.ZkClient.getChildren(ZkClient.java:210)
>>>
>>>         at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
>>>
>>>         at
>>> org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:279)
>>>
>>>         at
>>> org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
>>>
>>>         at
>>> org.apache.helix.manager.zk.CallbackHandler.handleChildChange(CallbackHandler.java:391)
>>>
>>>         at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:570)
>>>
>>>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>>>
>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>>> KeeperErrorCode = NoNode for
>>> /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
>>>
>>>         at
>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>
>>>         at
>>> org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
>>>
>>>  2015-03-07 10:35:43,848 [ZkClient-EventThread-17-terrapinzk001a:2181]
>>> (RoutingTableProvider.java:99) INFO  *Resetting* the routing table.
>>>
>>> On Thu, Mar 5, 2015 at 11:33 AM, Varun Sharma <varun@pinterest.com>
>>> wrote:
>>>
>>>> I suspect the callbacks have not been coming in for a long time now.
>>>>
>>>> On Thu, Mar 5, 2015 at 11:30 AM, Varun Sharma <varun@pinterest.com>
>>>> wrote:
>>>>
>>>>> I grepped this and found nothing:
>>>>>
>>>>> sudo grep START:INVOKE.*EXTERNALVIEW /var/log/terrapin/controller.log*
>>>>>
>>>>> I found a bunch of START:INVOKE for the IDEALSTATES znode though.
>>>>>
>>>>> On Thu, Mar 5, 2015 at 11:15 AM, Zhen Zhang <zzhang@linkedin.com>
>>>>> wrote:
>>>>>
>>>>>>  Yes. You should see a pair of "START:INVOKE..." and
>>>>>> "END:INVOKE:..." for each callback in your log.
>>>>>> ------------------------------
>>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>>> *Sent:* Thursday, March 05, 2015 11:11 AM
>>>>>> *To:* user@helix.apache.org
>>>>>> *Subject:* Re: RoutingTableProvider dropping callbacks
>>>>>>
>>>>>>   OK - is there a way to confirm that the callbacks are being
>>>>>> processed (from the logs, etc.)?
>>>>>>
>>>>>> On Thu, Mar 5, 2015 at 10:50 AM, Zhen Zhang <zzhang@linkedin.com>
>>>>>> wrote:
>>>>>>
>>>>>>>  Hi Varun,
>>>>>>>
>>>>>>>  This should not be a problem. When we register a callback, we are
>>>>>>> expecting a callback type of INIT first, followed by a sequence of
>>>>>>> CALLBACK types, and when you unregister the callback, you will receive a
>>>>>>> FINALIZED type. Since unregister is an async operation, after you receive
>>>>>>> a FINALIZED type, you might still see a couple of CALLBACK type callbacks,
>>>>>>> which are simply ignored. The log is basically telling you that.
>>>>>>>
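
The ordering described above (INIT, then a run of CALLBACKs, then FINALIZED, with late CALLBACKs ignored) can be sketched as a tiny state machine. This is an illustrative model only: the class, the `Type` enum, and the `onEvent` method are hypothetical names, and resetting the expected set to INIT after FINALIZED is an assumption inferred from the "expected types: [INIT] but was CALLBACK" log line, not a copy of Helix's `CallbackHandler`.

```java
import java.util.EnumSet;
import java.util.Set;

public class CallbackLifecycle {
    /** Hypothetical event types, named after the ones mentioned in this thread. */
    enum Type { INIT, CALLBACK, FINALIZED }

    // A freshly registered listener only accepts INIT.
    private Set<Type> expected = EnumSet.of(Type.INIT);

    /** Returns true if the event was processed, false if it was skipped. */
    boolean onEvent(Type type) {
        if (!expected.contains(type)) {
            System.out.println("Skip processing callbacks, expected types: "
                    + expected + " but was " + type);
            return false;
        }
        // After FINALIZED only a new INIT is accepted (assumption);
        // otherwise further CALLBACKs or a FINALIZED may follow.
        expected = (type == Type.FINALIZED)
                ? EnumSet.of(Type.INIT)
                : EnumSet.of(Type.CALLBACK, Type.FINALIZED);
        return true;
    }

    public static void main(String[] args) {
        CallbackLifecycle handler = new CallbackLifecycle();
        handler.onEvent(Type.INIT);      // processed
        handler.onEvent(Type.CALLBACK);  // processed
        handler.onEvent(Type.FINALIZED); // processed, listener is now unregistered
        handler.onEvent(Type.CALLBACK);  // skipped: a late callback after FINALIZED
    }
}
```

Under this model, a skipped CALLBACK is harmless only if an INIT follows later; a listener that stays in the "expects INIT" state would drop every subsequent CALLBACK.
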
>>>>>>>  Thanks,
>>>>>>> Jason
>>>>>>>  ------------------------------
>>>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>>>> *Sent:* Thursday, March 05, 2015 10:44 AM
>>>>>>> *To:* user@helix.apache.org
>>>>>>> *Subject:* RoutingTableProvider dropping callbacks
>>>>>>>
>>>>>>>    Hi,
>>>>>>>
>>>>>>>  It seems that the RoutingTableProvider is dropping callbacks in
>>>>>>> our case. Here is a log:
>>>>>>>
>>>>>>>  [ZkClient-EventThread-17-terrapinzk001a:2181]
>>>>>>> (CallbackHandler.java:130) WARN  Skip processing callbacks for listener:
>>>>>>> com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@7e7f8062,
>>>>>>> path: /main_a/EXTERNALVIEW, expected types: [INIT] but was CALLBACK
>>>>>>>
>>>>>>>
>>>>>>>  We have a custom RoutingTableProvider to catch callbacks and do
>>>>>>> some processing - this is causing a lot of issues for us. What could be
>>>>>>> causing this?
>>>>>>>
>>>>>>>  Thanks
>>>>>>> Varun
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
