Mailing-List: contact commits-help@helix.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@helix.apache.org
Received-SPF: pass (athena.apache.org: local policy)
Date: Sun, 3 May 2015 17:25:06 +0000 (UTC)
From: "Vinoth Chandar (JIRA)" <jira@apache.org>
To: commits@helix.incubator.apache.org
Message-ID: <JIRA.12826806.1430673858000.65395.1430673906103@Atlassian.JIRA>
In-Reply-To: <JIRA.12826806.1430673858000@Atlassian.JIRA>
References: <JIRA.12826806.1430673858000@Atlassian.JIRA>
 <JIRA.12826806.1430673858487@arcas>
Subject: [jira] [Updated] (HELIX-594) Misleading NPE trying to reconnect,
 upon ZK Timeout
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HELIX-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HELIX-594:
---------------------------------
    Description: 
 This is a safety feature where Helix detects GC and automatically disconnects from the cluster. Unfortunately, in some cases it surfaces as an NPE.

We should probably record the reason for disabling in the instance config. Currently we just disable the node; we should add an attribute such as DISABLE_CAUSE:"TOO MANY DISCONNECTS FROM ZK. CHECK JAVA GC LOG" or something like that.
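
To illustrate the proposal, here is a minimal sketch of writing the cause next to the disable flag. It is not Helix code: the instance config is modeled as a plain map of simple fields, and the class name, method names, and the HELIX_ENABLED field name are assumptions made for the example. Only DISABLE_CAUSE and its message come from this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: keep a human-readable cause alongside the disable flag. */
public class DisableCauseSketch {
    // Assumed name for the enabled/disabled flag in the instance config.
    public static final String ENABLED_FIELD = "HELIX_ENABLED";
    // Attribute name proposed in this issue.
    public static final String DISABLE_CAUSE_FIELD = "DISABLE_CAUSE";

    /** Disables the instance and stores the cause in the same config map. */
    public static void disableWithCause(Map<String, String> instanceConfig, String cause) {
        instanceConfig.put(ENABLED_FIELD, "false");
        instanceConfig.put(DISABLE_CAUSE_FIELD, cause);
    }

    /** Re-enables the instance and clears any stale cause. */
    public static void enable(Map<String, String> instanceConfig) {
        instanceConfig.put(ENABLED_FIELD, "true");
        instanceConfig.remove(DISABLE_CAUSE_FIELD);
    }

    public static void main(String[] args) {
        // The map stands in for the simple fields of an instance config.
        Map<String, String> config = new LinkedHashMap<>();
        disableWithCause(config, "TOO MANY DISCONNECTS FROM ZK. CHECK JAVA GC LOG");
        System.out.println(config);
    }
}
```

Clearing the cause on re-enable is a design choice in this sketch, so that an operator never sees a stale reason on a healthy node.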


  was:
We always get the following errors on startup (#1 looks like the leader elector for controller...). Ours is a FULL_AUTO, embedded-controller Helix configuration.

1. org.apache.helix.manager.zk.ZkBaseDataAccessor.doCreate(ZkBaseDataAccessor.java:138)
Node already exists. path: /streamio/STATEMODELDEFS/STORAGE_DEFAULT_SM_SCHEMATA


2. org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:130) 
Skip processing callbacks for listener: org.apache.helix.messaging.handling.HelixTaskExecutor@1a9f9f09, path: /streamio/INSTANCES/datapipe11-sjc1-controller-/MESSAGES, expected types: [CALLBACK, FINALIZE] but was INIT


3. org.apache.helix.healthcheck.ParticipantHealthReportTask.stop(ParticipantHealthReportTask.java:67)
ParticipantHealthReportTimerTask already stopped
org.apache.helix.healthcheck.ParticipantHealthReportTask in stop at line 67


> Misleading NPE trying to reconnect, upon ZK Timeout
> ---------------------------------------------------
>
>                 Key: HELIX-594
>                 URL: https://issues.apache.org/jira/browse/HELIX-594
>             Project: Apache Helix
>          Issue Type: Improvement
>          Components: helix-core
>    Affects Versions: 0.6.5
>            Reporter: Vinoth Chandar
>            Priority: Minor
>             Fix For: master
>
>
>  This is a safety feature where Helix detects GC and automatically disconnects from the cluster. Unfortunately, in some cases it surfaces as an NPE.
> We should probably record the reason for disabling in the instance config. Currently we just disable the node; we should add an attribute such as DISABLE_CAUSE:"TOO MANY DISCONNECTS FROM ZK. CHECK JAVA GC LOG" or something like that.

