helix-user mailing list archives

From Zhen Zhang <nehzgn...@gmail.com>
Subject Re: NPE trying to reconnect, upon ZK Timeout
Date Thu, 30 Apr 2015 19:12:48 GMT
Hi Vinoth,

The NPE indicates that the ZooKeeper connection in ZkClient is null. The
connection becomes null only when HelixManager#disconnect() is called. This
can happen if you call HelixManager#disconnect() directly, or if frequent
GCs cause HelixManager to disconnect itself. You can grep the logs for
"KeeperState" to trace the connection state changes.
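A minimal Java sketch of the "KeeperState" grep, assuming each relevant log line contains `KeeperState` followed by the state name (e.g. `KeeperState: Expired`). The class name, regex, and sample lines are illustrative assumptions, not actual Helix log output. From the shell, `grep KeeperState <logfile>` does the same filtering; the sketch just pulls out the state names in order so the transitions before the NPE are easy to read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeeperStateScan {
    // Assumed log format: "KeeperState" followed by separators and the state
    // name, e.g. "KeeperState: Expired" or "KeeperState.SyncConnected".
    private static final Pattern STATE = Pattern.compile("KeeperState\\W+(\\w+)");

    /** Returns the KeeperState names found in the given lines, in log order. */
    public static List<String> transitions(List<String> logLines) {
        List<String> states = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = STATE.matcher(line);
            if (m.find()) {
                states.add(m.group(1)); // captured state name, e.g. "Expired"
            }
        }
        return states;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; substitute the lines of your own log file.
        List<String> lines = Arrays.asList(
            "11:40:01 INFO helix - KeeperState: SyncConnected",
            "11:45:12 INFO helix - KeeperState: Disconnected",
            "11:45:20 INFO helix - KeeperState: Expired",
            "11:45:21 INFO worker - reading offset from property store");
        // Prints: SyncConnected -> Disconnected -> Expired
        System.out.println(String.join(" -> ", transitions(lines)));
    }
}
```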

Thanks,
Jason


On Thu, Apr 30, 2015 at 11:53 AM, Vinoth Chandar <vinoth@uber.com> wrote:

> Hi guys,
>
> I am hitting the following with 0.6.5 upon a ZK connection timeout. We
> make this call to the PropertyStore to figure out an offset to resume from.
> This error eventually puts every partition into an error state and brings
> everything to a grinding halt. Any pointers on troubleshooting this?
> Either way, there shouldn't be an NPE, right?
>
> NullPointerException
>    org.apache.helix.manager.zk.ZkClient$4 in call at line 241
>    org.apache.helix.manager.zk.ZkClient$4 in call at line 237
>    org.I0Itec.zkclient.ZkClient in retryUntilConnected at line 675
>    org.apache.helix.manager.zk.ZkClient in readData at line 237
>    org.I0Itec.zkclient.ZkClient in readData at line 761
>    org.apache.helix.manager.zk.ZkBaseDataAccessor in get at line 308
>    org.apache.helix.manager.zk.ZkCacheBaseDataAccessor in get at line 377
>    org.apache.helix.store.zk.AutoFallbackPropertyStore in get at line 100
>
> Thanks
> Vinoth
>
