helix-user mailing list archives

From Vinoth Chandar <vin...@uber.com>
Subject Re: NPE trying to reconnect, upon ZK Timeout
Date Thu, 30 Apr 2015 20:18:24 GMT
Yep, seeing this:

$ grep -i flap /var/log/streamio/streamio.log
2015-04-30 16:08:50,823 ERROR - ZKHelixManager             - instanceName:
??--checkpointer is flapping. disconnect it.  maxDisconnectThreshold: 5
disconnects in 300000ms.
2015-04-30 16:09:30,140 ERROR - ZKHelixManager             - instanceName:
??-controller- is flapping. disconnect it.  maxDisconnectThreshold: 5
disconnects in 300000ms.
2015-04-30 16:11:05,679 ERROR - ZKHelixManager             - instanceName:
??-controller- is flapping. disconnect it.  maxDisconnectThreshold: 5
disconnects in 300000ms.

And I confirmed from the logs that it's GCing. (Sorry, I originally had a bad
dashboard that did not catch this.)
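
For anyone reading this in the archive: the "maxDisconnectThreshold: 5
disconnects in 300000ms" message suggests a sliding-window check on
disconnect events. Below is a minimal sketch of that idea, not the Helix
implementation. The class name FlapDetector, the method recordDisconnect,
and the strict "more than the threshold" comparison are all assumptions made
for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sliding-window flap detector (illustration only, not Helix code).
 * Reports flapping once more than maxDisconnects disconnects fall inside windowMs.
 */
final class FlapDetector {
    private final int maxDisconnects;   // e.g. 5, as in the log message
    private final long windowMs;        // e.g. 300000, as in the log message
    private final Deque<Long> disconnectTimes = new ArrayDeque<>();

    FlapDetector(int maxDisconnects, long windowMs) {
        this.maxDisconnects = maxDisconnects;
        this.windowMs = windowMs;
    }

    /**
     * Records a disconnect observed at nowMs (timestamps must be non-decreasing).
     * Returns true if the number of disconnects within the window now exceeds
     * the threshold. The strict ">" is an assumption of this sketch.
     */
    synchronized boolean recordDisconnect(long nowMs) {
        disconnectTimes.addLast(nowMs);
        // Evict events that are older than the window relative to nowMs.
        while (!disconnectTimes.isEmpty()
                && nowMs - disconnectTimes.peekFirst() > windowMs) {
            disconnectTimes.removeFirst();
        }
        return disconnectTimes.size() > maxDisconnects;
    }
}
```

With new FlapDetector(5, 300000L), six disconnects spaced 10 seconds apart
trip the detector on the sixth call, while disconnects spaced more than five
minutes apart never do. Long GC pauses that repeatedly expire the ZK session
would produce the first pattern.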

Thanks
Vinoth

On Thu, Apr 30, 2015 at 12:12 PM, Zhen Zhang <nehzgnahz@gmail.com> wrote:

> Hi Vinoth,
>
> The NPE indicates that the ZooKeeper connection in ZkClient is NULL. The
> connection becomes NULL only when HelixManager#disconnect() is called. This
> may happen if you call HelixManager#disconnect() directly, or if there are
> frequent GCs and HelixManager disconnects itself. You can grep for
> "KeeperState" to trace the connection state changes.
>
> Thanks,
> Jason
>
>
> On Thu, Apr 30, 2015 at 11:53 AM, Vinoth Chandar <vinoth@uber.com> wrote:
>
>> Hi guys,
>>
>> I am hitting the following with 0.6.5 upon a ZK connection timeout. We
>> make this call to the PropertyStore to figure out an offset to resume from.
>> This error eventually puts every partition into an error state, and
>> everything comes to a grinding halt. Any pointers for troubleshooting this?
>> Regardless, there shouldn't be an NPE, right?
>>
>> NullPointerException
>>
>>    - org.apache.helix.manager.zk.ZkClient$4 in call at line 241
>>    - org.apache.helix.manager.zk.ZkClient$4 in call at line 237
>>    - org.I0Itec.zkclient.ZkClient in retryUntilConnected at line 675
>>    - org.apache.helix.manager.zk.ZkClient in readData at line 237
>>    - org.I0Itec.zkclient.ZkClient in readData at line 761
>>    - org.apache.helix.manager.zk.ZkBaseDataAccessor in get at line 308
>>    - org.apache.helix.manager.zk.ZkCacheBaseDataAccessor in get at line 377
>>    - org.apache.helix.store.zk.AutoFallbackPropertyStore in get at line 100
>>
>>
>>
>> Thanks
>> Vinoth
>>
>
>
