helix-user mailing list archives

From kishore g <g.kish...@gmail.com>
Subject RE: RoutingTableProvider dropping callbacks
Date Mon, 09 Mar 2015 21:38:50 GMT
Hi Varun,

The change is already made. We will start working on the release.

Any volunteer to make the release?

Thanks
Kishore G
 On Mar 9, 2015 2:17 PM, "Zhen Zhang" <zzhang@linkedin.com> wrote:

>  Hi Varun,
>
>  Kishore already checked in a fix for that:
>
> https://git-wip-us.apache.org/repos/asf?p=helix.git;a=commit;h=99baacf7f19a09d972754902c50f1618fc8b804c
>
>  It's in 0.6.x branch.
>
>  Thanks,
> Jason
>
> ------------------------------
> *From:* Varun Sharma [varun@pinterest.com]
> *Sent:* Monday, March 09, 2015 2:11 PM
> *To:* user@helix.apache.org
> *Subject:* Re: RoutingTableProvider dropping callbacks
>
>   Just pinging this thread to check on the hot fix to not remove
> externalview znode and release for the same. Is there a JIRA tracking that ?
>
> On Sun, Mar 8, 2015 at 11:46 PM, Varun Sharma <varun@pinterest.com> wrote:
>
>> If I recall correctly from a previous thread, it seems like we don't even
>> support changing of bucket sizes for the same resource - so it seems we
>> should probably not be deleting the znode in this case ?
>>
>> On Sun, Mar 8, 2015 at 11:43 PM, Zhen Zhang <zzhang@linkedin.com> wrote:
>>
>>>  @Kishore, I think the remove is used in case bucket size is changed,
>>> so we can clean all the buckets for old size and set it using new size.
>>>
>>>  The issue seems like a race condition in setting bucketized external
>>> view and add watches on child paths. Will investigate more.
>>>
>>>  Thanks,
>>> Jason
>>>  ------------------------------
>>> *From:* Varun Sharma [varun@pinterest.com]
>>> *Sent:* Saturday, March 07, 2015 11:07 PM
>>>
>>> *To:* user@helix.apache.org
>>> *Subject:* Re: RoutingTableProvider dropping callbacks
>>>
>>>    Please find the attached log file with the above trace.
>>>
>>> On Sat, Mar 7, 2015 at 8:12 PM, kishore g <g.kishore@gmail.com> wrote:
>>>
>>>> Another thing is that the RoutingTable is logging this line "Resetting
>>>> the routing table.". Looks like this happens when we fail to set the
>>>> watcher.
>>>>
>>>>  thanks,
>>>> Kishore G
>>>>
>>>> On Sat, Mar 7, 2015 at 8:05 PM, kishore g <g.kishore@gmail.com> wrote:
>>>>
>>>>> Your explanation makes sense.
>>>>>
>>>>>
>>>>> https://github.com/apache/helix/blob/helix-0.6.4/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java.
>>>>> For bucketized resource we see that path is deleted and set again. Jason,
>>>>> any idea why we are removing the path?
>>>>>
>>>>>    case EXTERNALVIEW:
>>>>>      if (value.getBucketSize() == 0) {
>>>>>        records.add(value.getRecord());
>>>>>      } else {
>>>>>        _baseDataAccessor.remove(path, options);
>>>>>
>>>>> On Sat, Mar 7, 2015 at 4:03 PM, Varun Sharma <varun@pinterest.com>
>>>>> wrote:
>>>>>
>>>>>> How does the writing of externalview work for bucketized resources
>>>>>> - is it possible that the top level znode for the resource is first
>>>>>> deleted and then rewritten with the latest external view ?
>>>>>>
>>>>>> On Sat, Mar 7, 2015 at 3:56 PM, Varun Sharma <varun@pinterest.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Here is the stack trace - there is a zookeeper race and the detailed
>>>>>>> stack trace appears for bucketized resources. I saw that the ideal
>>>>>>> state for the resource was created on 26th Feb and was modified on
>>>>>>> 7th March. However, the external view for the resource is showing up
>>>>>>> as created on 7th March as well as modified on 7th March. The
>>>>>>> external view is created at 10:36:04 on 7th March, which is 20
>>>>>>> seconds after this log message stack trace is spit out. After this
>>>>>>> the routing table provider no longer receives any more zk callbacks.
>>>>>>>
>>>>>>>  2015-03-07 10:35:43,735 [main-EventThread]
>>>>>>> (ZkAsyncCallbacks.java:127) WARN
>>>>>>> org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@3c8589f0,
>>>>>>> rc:NONODE, path:
>>>>>>> /main_a/EXTERNALVIEW/$terrapin$data$visual_seo_joins_staging$1422384697040
>>>>>>>
>>>>>>> 2015-03-07 10:35:43,736 [main-EventThread]
>>>>>>> (ZkAsyncCallbacks.java:127) WARN
>>>>>>> org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@63230a9a,
>>>>>>> rc:NONODE, path:
>>>>>>> /main_a/EXTERNALVIEW/$terrapin$data$recommendation_p2p_exp_candset_1$1425671237739
>>>>>>>
>>>>>>> 2015-03-07 10:35:43,736 [main-EventThread]
>>>>>>> (ZkAsyncCallbacks.java:127) WARN
>>>>>>> org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@118d374f,
>>>>>>> rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
>>>>>>>
>>>>>>> 2015-03-07 10:35:43,736
>>>>>>> [ZkClient-EventThread-17-terrapinzk001a:2181] (CallbackHandler.java:304)
>>>>>>> WARN  fail to subscribe child/data change. path: /main_a/EXTERNALVIEW,
>>>>>>> listener:
>>>>>>> com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@2c6691da
>>>>>>>
>>>>>>> *org.I0Itec.zkclient.exception.ZkNoNodeException:
>>>>>>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
>>>>>>> NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250*
>>>>>>>
>>>>>>>         at
>>>>>>> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>>>>>>>
>>>>>>>         at
>>>>>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
>>>>>>>
>>>>>>>         at
>>>>>>> org.apache.helix.manager.zk.ZkClient.getChildren(ZkClient.java:210)
>>>>>>>
>>>>>>>         at
>>>>>>> org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
>>>>>>>
>>>>>>>         at
>>>>>>> org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:279)
>>>>>>>
>>>>>>>         at
>>>>>>> org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
>>>>>>>
>>>>>>>         at
>>>>>>> org.apache.helix.manager.zk.CallbackHandler.handleChildChange(CallbackHandler.java:391)
>>>>>>>
>>>>>>>         at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:570)
>>>>>>>
>>>>>>>         at
>>>>>>> org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>>>>>>>
>>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>>>>>>> KeeperErrorCode = NoNode for
>>>>>>> /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
>>>>>>>
>>>>>>>         at
>>>>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>>>>>
>>>>>>>         at
>>>>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>>>>
>>>>>>>         at
>>>>>>> org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
>>>>>>>
>>>>>>>  2015-03-07 10:35:43,848
>>>>>>> [ZkClient-EventThread-17-terrapinzk001a:2181]
>>>>>>> (RoutingTableProvider.java:99) INFO  *Resetting* the routing table.
>>>>>>>
>>>>>>> On Thu, Mar 5, 2015 at 11:33 AM, Varun Sharma <varun@pinterest.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I suspect the callbacks are not coming in, for a long time now.
>>>>>>>>
>>>>>>>> On Thu, Mar 5, 2015 at 11:30 AM, Varun Sharma <varun@pinterest.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I grepped this and found nothing:
>>>>>>>>>
>>>>>>>>>  sudo grep START:INVOKE.*EXTERNALVIEW
>>>>>>>>> /var/log/terrapin/controller.log*
>>>>>>>>>
>>>>>>>>> I found a bunch of START:INVOKE for the IDEALSTATES znode though.
>>>>>>>>>
>>>>>>>>> On Thu, Mar 5, 2015 at 11:15 AM, Zhen Zhang <zzhang@linkedin.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>  Yes. you should see a pair of "START:INVOKE..." and
>>>>>>>>>> "END:INVOKE:..." for each callback in your log.
>>>>>>>>>> ------------------------------
>>>>>>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>>>>>>> *Sent:* Thursday, March 05, 2015 11:11 AM
>>>>>>>>>> *To:* user@helix.apache.org
>>>>>>>>>> *Subject:* Re: RoutingTableProvider dropping callbacks
>>>>>>>>>>
>>>>>>>>>>    Ohk - is there a way to confirm that the callbacks are being
>>>>>>>>>> processed (from the logs etc.) ?
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 5, 2015 at 10:50 AM, Zhen Zhang <zzhang@linkedin.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>  Hi Varun,
>>>>>>>>>>>
>>>>>>>>>>>  This should not be a problem. When we register a callback, we
>>>>>>>>>>> are expecting a callback type of INIT first, followed by a sequence
>>>>>>>>>>> of CALLBACK types, and when you unregister the callback, you will
>>>>>>>>>>> receive a FINALIZED type. Since unregister is an async operation,
>>>>>>>>>>> when you receive a FINALIZED type, you might still see a couple of
>>>>>>>>>>> CALLBACK type callbacks, which are simply ignored. The log is
>>>>>>>>>>> basically telling you that.
>>>>>>>>>>>
>>>>>>>>>>>  Thanks,
>>>>>>>>>>> Jason
>>>>>>>>>>>  ------------------------------
>>>>>>>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>>>>>>>> *Sent:* Thursday, March 05, 2015 10:44 AM
>>>>>>>>>>> *To:* user@helix.apache.org
>>>>>>>>>>> *Subject:* RoutingTableProvider dropping callbacks
>>>>>>>>>>>
>>>>>>>>>>>    Hi,
>>>>>>>>>>>
>>>>>>>>>>>  It seems that the RoutingTableProvider is dropping callbacks
>>>>>>>>>>> in our case. Here is a log:
>>>>>>>>>>>
>>>>>>>>>>>  [ZkClient-EventThread-17-terrapinzk001a:2181]
>>>>>>>>>>> (CallbackHandler.java:130) WARN  Skip processing callbacks for listener:
>>>>>>>>>>> com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@7e7f8062,
>>>>>>>>>>> path: /main_a/EXTERNALVIEW, expected types: [INIT] but was CALLBACK
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  We have a custom RoutingTableProvider to catch callbacks and
>>>>>>>>>>> do some processing - this is causing a lot of issues for us. What
>>>>>>>>>>> could be causing this ?
>>>>>>>>>>>
>>>>>>>>>>>  Thanks
>>>>>>>>>>> Varun
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
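
Editorial note on the callback ordering described in the thread (INIT first, then a sequence of CALLBACK events, then FINALIZED on unregister, with late CALLBACK events ignored). The following is only a toy Java model of that rule, not Helix's CallbackHandler: the class name CallbackGateSketch, the handle method, and the exact transition table are assumptions, chosen so that a CALLBACK arriving while only INIT is expected is skipped, as in the "expected types: [INIT] but was CALLBACK" warning.

```java
import java.util.EnumSet;
import java.util.Set;

// Toy model of the callback-type gate described in the thread.
// Hypothetical names throughout; this is not Helix code.
class CallbackGateSketch {
    enum Type { INIT, CALLBACK, FINALIZED }

    // Types the handler is currently willing to process.
    // A freshly registered (or re-registered) handler expects INIT only.
    private Set<Type> expected = EnumSet.of(Type.INIT);

    // Returns true if the event was processed, false if it was skipped.
    boolean handle(Type type) {
        if (!expected.contains(type)) {
            // Corresponds to the "Skip processing callbacks" case in the log.
            return false;
        }
        switch (type) {
            case INIT:
            case CALLBACK:
                // Once initialized, further callbacks or a finalize are allowed.
                expected = EnumSet.of(Type.CALLBACK, Type.FINALIZED);
                break;
            case FINALIZED:
                // After unregistering, only a fresh INIT is accepted again.
                expected = EnumSet.of(Type.INIT);
                break;
        }
        return true;
    }
}
```

In this model a CALLBACK that arrives before INIT, or after FINALIZED, returns false and is dropped, which is harmless on its own. It does not explain a listener that stops receiving callbacks altogether.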

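Editorial note on the remove-then-set race discussed in the thread: a listener lists the children of EXTERNALVIEW and then visits each child, while a writer removes one child in between. The sketch below is a toy in-memory model, not ZooKeeper or Helix code. Every name in it (RemoveThenSetSketch, subscribe, the betweenSteps hook) is hypothetical, and the interleaving is forced deterministically through a Runnable rather than real threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory stand-in for a flat znode tree; not ZooKeeper or Helix code.
class RemoveThenSetSketch {
    static class NoNodeException extends RuntimeException {
        NoNodeException(String path) { super("NoNode for " + path); }
    }

    private final Map<String, String> nodes = new TreeMap<>();

    void set(String path, String data) { nodes.put(path, data); }

    void remove(String path) { nodes.remove(path); }

    String read(String path) {
        String data = nodes.get(path);
        if (data == null) {
            throw new NoNodeException(path);
        }
        return data;
    }

    // Lists every stored path under the given parent prefix.
    List<String> children(String parent) {
        List<String> result = new ArrayList<>();
        for (String path : nodes.keySet()) {
            if (path.startsWith(parent + "/")) {
                result.add(path);
            }
        }
        return result;
    }

    // Listener side: list the children, then visit each one.
    // betweenSteps lets a writer act between the two steps.
    // Returns false if any listed child has vanished by the time it is visited.
    static boolean subscribe(RemoveThenSetSketch zk, String parent, Runnable betweenSteps) {
        List<String> kids = zk.children(parent);
        betweenSteps.run();
        try {
            for (String kid : kids) {
                zk.read(kid);
            }
            return true;
        } catch (NoNodeException e) {
            return false;
        }
    }
}
```

In this model, a writer that overwrites the child in place leaves subscribe returning true, while a writer that removes the child between the listing and the visit makes it return false. That mirrors the NoNode failure in the stack trace, and is why not deleting the znode avoids the window.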