helix-user mailing list archives

From Varun Sharma <va...@pinterest.com>
Subject Re: RoutingTableProvider dropping callbacks
Date Mon, 09 Mar 2015 21:11:09 GMT
Just pinging this thread to check on the hot fix that stops removing the
externalview znode, and on a release containing it. Is there a JIRA tracking
that?

On Sun, Mar 8, 2015 at 11:46 PM, Varun Sharma <varun@pinterest.com> wrote:

> If I recall correctly from a previous thread, it seems like we don't even
> support changing of bucket sizes for the same resource - so it seems we
> should probably not be deleting the znode in this case ?
>
> On Sun, Mar 8, 2015 at 11:43 PM, Zhen Zhang <zzhang@linkedin.com> wrote:
>
>>  @Kishore, I think the remove is used in case bucket size is changed, so
>> we can clean all the buckets for old size and set it using new size.
>>
>>  The issue looks like a race condition between setting the bucketized
>> external view and adding watches on child paths. Will investigate more.
>>
>>  Thanks,
>> Jason
>>  ------------------------------
>> *From:* Varun Sharma [varun@pinterest.com]
>> *Sent:* Saturday, March 07, 2015 11:07 PM
>>
>> *To:* user@helix.apache.org
>> *Subject:* Re: RoutingTableProvider dropping callbacks
>>
>>   Please find the attached log file with the above trace.
>>
>> On Sat, Mar 7, 2015 at 8:12 PM, kishore g <g.kishore@gmail.com> wrote:
>>
>>> Another thing is that the RoutingTable is logging this line "Resetting
>>> the routing table.". Looks like this happens when we fail to set the
>>> watcher.
>>>
>>>  thanks,
>>> Kishore G
>>>
>>> On Sat, Mar 7, 2015 at 8:05 PM, kishore g <g.kishore@gmail.com> wrote:
>>>
>>>> Your explanation makes sense.
>>>>
>>>>
>>>> https://github.com/apache/helix/blob/helix-0.6.4/helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixDataAccessor.java.
>>>> For bucketized resource we see that path is deleted and set again. Jason,
>>>> any idea why we are removing the path?
>>>>
>>>>    case EXTERNALVIEW:
>>>>      if (value.getBucketSize() == 0) {
>>>>        records.add(value.getRecord());
>>>>      } else {
>>>>        _baseDataAccessor.remove(path, options);
>>>>
>>>> On Sat, Mar 7, 2015 at 4:03 PM, Varun Sharma <varun@pinterest.com>
>>>> wrote:
>>>>
>>>>> How does the writing of externalview work for bucketized resources - is
>>>>> it possible that the top level znode for the resource is first deleted
>>>>> and then rewritten with the latest external view ?
>>>>>
>>>>> On Sat, Mar 7, 2015 at 3:56 PM, Varun Sharma <varun@pinterest.com>
>>>>> wrote:
>>>>>
>>>>>> Here is the stack trace - there is a ZooKeeper race and the detailed
>>>>>> stack trace appears for bucketized resources. I saw that the ideal
>>>>>> state for the resource was created on 26th Feb and was modified on 7th
>>>>>> March. However, the external view for the resource is showing up as
>>>>>> created on 7th March as well as modified on 7th March. The external
>>>>>> view is created at 10:36:04 on 7th March, which is 20 seconds after
>>>>>> this stack trace is logged. After this the routing table provider no
>>>>>> longer receives any more zk callbacks.
>>>>>>
>>>>>>  2015-03-07 10:35:43,735 [main-EventThread]
>>>>>> (ZkAsyncCallbacks.java:127) WARN
>>>>>> org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@3c8589f0,
>>>>>> rc:NONODE, path:
>>>>>> /main_a/EXTERNALVIEW/$terrapin$data$visual_seo_joins_staging$1422384697040
>>>>>>
>>>>>> 2015-03-07 10:35:43,736 [main-EventThread]
>>>>>> (ZkAsyncCallbacks.java:127) WARN
>>>>>> org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@63230a9a,
>>>>>> rc:NONODE, path:
>>>>>> /main_a/EXTERNALVIEW/$terrapin$data$recommendation_p2p_exp_candset_1$1425671237739
>>>>>>
>>>>>> 2015-03-07 10:35:43,736 [main-EventThread]
>>>>>> (ZkAsyncCallbacks.java:127) WARN
>>>>>> org.apache.helix.manager.zk.ZkAsyncCallbacks$SetDataCallbackHandler@118d374f,
>>>>>> rc:NONODE, path: /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
>>>>>>
>>>>>> 2015-03-07 10:35:43,736 [ZkClient-EventThread-17-terrapinzk001a:2181]
>>>>>> (CallbackHandler.java:304) WARN  fail to subscribe child/data change.
>>>>>> path: /main_a/EXTERNALVIEW, listener:
>>>>>> com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@2c6691da
>>>>>>
>>>>>> *org.I0Itec.zkclient.exception.ZkNoNodeException:
>>>>>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
>>>>>> NoNode for /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250*
>>>>>>         at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>>>>>>         at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
>>>>>>         at org.apache.helix.manager.zk.ZkClient.getChildren(ZkClient.java:210)
>>>>>>         at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
>>>>>>         at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:279)
>>>>>>         at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:202)
>>>>>>         at org.apache.helix.manager.zk.CallbackHandler.handleChildChange(CallbackHandler.java:391)
>>>>>>         at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:570)
>>>>>>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>>>>>> Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>>>>>> KeeperErrorCode = NoNode for
>>>>>> /main_a/EXTERNALVIEW/$terrapin$data$None$1422308641250
>>>>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>>>>>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>>>         at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
>>>>>>
>>>>>>  2015-03-07 10:35:43,848
>>>>>> [ZkClient-EventThread-17-terrapinzk001a:2181]
>>>>>> (RoutingTableProvider.java:99) INFO  *Resetting* the routing table.
>>>>>>
>>>>>> On Thu, Mar 5, 2015 at 11:33 AM, Varun Sharma <varun@pinterest.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I suspect the callbacks are not coming in, for a long time now.
>>>>>>>
>>>>>>> On Thu, Mar 5, 2015 at 11:30 AM, Varun Sharma <varun@pinterest.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I grepped this and found nothing:
>>>>>>>>
>>>>>>>>  sudo grep START:INVOKE.*EXTERNALVIEW /var/log/terrapin/controller.log*
>>>>>>>>
>>>>>>>> I found a bunch of START:INVOKE for the IDEALSTATES znode though.
>>>>>>>>
>>>>>>>> On Thu, Mar 5, 2015 at 11:15 AM, Zhen Zhang <zzhang@linkedin.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>  Yes. you should see a pair of "START:INVOKE..." and
>>>>>>>>> "END:INVOKE:..." for each callback in your log.
>>>>>>>>> ------------------------------
>>>>>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>>>>>> *Sent:* Thursday, March 05, 2015 11:11 AM
>>>>>>>>> *To:* user@helix.apache.org
>>>>>>>>> *Subject:* Re: RoutingTableProvider dropping callbacks
>>>>>>>>>
>>>>>>>>>    Ohk - is there a way to confirm that the callbacks are being
>>>>>>>>> processed (from the logs etc.) ?
>>>>>>>>>
>>>>>>>>> On Thu, Mar 5, 2015 at 10:50 AM, Zhen Zhang <zzhang@linkedin.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>  Hi Varun,
>>>>>>>>>>
>>>>>>>>>>  This should not be a problem. When we register a callback, we
>>>>>>>>>> expect a callback of type INIT first, followed by a sequence of
>>>>>>>>>> CALLBACK types, and when you unregister the callback, you will
>>>>>>>>>> receive a FINALIZED type. Since unregister is an async operation,
>>>>>>>>>> when you receive a FINALIZED type, you might still see a couple of
>>>>>>>>>> CALLBACK type callbacks, which are simply ignored. The log is
>>>>>>>>>> basically telling you that.
>>>>>>>>>>
>>>>>>>>>>  Thanks,
>>>>>>>>>> Jason
>>>>>>>>>>  ------------------------------
>>>>>>>>>> *From:* Varun Sharma [varun@pinterest.com]
>>>>>>>>>> *Sent:* Thursday, March 05, 2015 10:44 AM
>>>>>>>>>> *To:* user@helix.apache.org
>>>>>>>>>> *Subject:* RoutingTableProvider dropping callbacks
>>>>>>>>>>
>>>>>>>>>>    Hi,
>>>>>>>>>>
>>>>>>>>>>  It seems that the RoutingTableProvider is dropping callbacks in
>>>>>>>>>> our case. Here is a log:
>>>>>>>>>>
>>>>>>>>>>  [ZkClient-EventThread-17-terrapinzk001a:2181]
>>>>>>>>>> (CallbackHandler.java:130) WARN  Skip processing callbacks for
>>>>>>>>>> listener:
>>>>>>>>>> com.pinterest.terrapin.controller.TerrapinRoutingTableProvider@7e7f8062,
>>>>>>>>>> path: /main_a/EXTERNALVIEW, expected types: [INIT] but was CALLBACK
>>>>>>>>>>
>>>>>>>>>>  We have a custom RoutingTableProvider to catch callbacks and do
>>>>>>>>>> some processing - this is causing a lot of issues for us. What could
>>>>>>>>>> be causing this ?
>>>>>>>>>>
>>>>>>>>>>  Thanks
>>>>>>>>>> Varun
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
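
The delete-then-rewrite window discussed in this thread can be illustrated
with a small Java sketch. This is only a toy model under stated assumptions:
the in-memory map stands in for the znode tree, and every name in it
(ExternalViewRaceSketch, getChildrenAndWatch, removeRecursive, writeBuckets,
trySubscribe, the /cluster/... path) is hypothetical and is not part of the
Helix or ZooKeeper API. It only shows why a listener that tries to list
children between the remove and the rewrite sees a missing node.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Toy in-memory stand-in for a znode tree; NOT the Helix or ZooKeeper API. */
public class ExternalViewRaceSketch {
    // path -> data; a missing key models a missing znode
    static final Map<String, String> store = new ConcurrentHashMap<>();

    /** Models "list children and set a watch": fails if the node is absent. */
    static Set<String> getChildrenAndWatch(String path) {
        if (!store.containsKey(path)) {
            throw new IllegalStateException("NoNode for " + path);
        }
        Set<String> children = new TreeSet<>();
        String prefix = path + "/";
        for (String p : store.keySet()) {
            if (p.startsWith(prefix)) {
                children.add(p.substring(prefix.length()));
            }
        }
        return children;
    }

    /** Writer step 1 (assumed): remove the resource node and all its buckets. */
    static void removeRecursive(String path) {
        store.keySet().removeIf(p -> p.equals(path) || p.startsWith(path + "/"));
    }

    /** Writer step 2 (assumed): recreate the node and write the buckets. */
    static void writeBuckets(String path, int buckets) {
        store.put(path, "");
        for (int i = 0; i < buckets; i++) {
            store.put(path + "/bucket_" + i, "data");
        }
    }

    /** Returns true if the subscription succeeded, false on a missing node. */
    static boolean trySubscribe(String path) {
        try {
            getChildrenAndWatch(path);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String ev = "/cluster/EXTERNALVIEW/resource"; // hypothetical path
        writeBuckets(ev, 2);
        System.out.println("before remove: " + trySubscribe(ev));
        removeRecursive(ev);
        // A listener that re-subscribes here lands in the gap.
        System.out.println("in the gap:    " + trySubscribe(ev));
        writeBuckets(ev, 2);
        System.out.println("after rewrite: " + trySubscribe(ev));
    }
}
```

In this model the subscription only fails in the gap between the two writer
steps. Whether the real listener recovers afterwards depends on how the
failed subscription is handled, which the sketch does not attempt to model.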
