Date: Sun, 3 May 2015 17:25:06 +0000 (UTC)
From: "Vinoth Chandar (JIRA)"
To: commits@helix.incubator.apache.org
Reply-To: dev@helix.apache.org
Subject: [jira] [Updated] (HELIX-594) Misleading NPE trying to reconnect, upon ZK Timeout

     [ https://issues.apache.org/jira/browse/HELIX-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HELIX-594:
---------------------------------
    Description: 
This is a safety feature where Helix automatically detects GC and disconnects from the cluster. Unfortunately, in some cases it surfaces as an NPE.

We should probably describe the reason for disabling in the instance config. Currently we just disable the node; we should probably add an attribute DISABLE_CAUSE:"TOO MANY DISCONNECTS FROM ZK. CHECK JAVA GC LOG" or something like that.

  was:
We always get the following errors on startup. (#1 looks like the leader elector for the controller.) Ours is a FULL_AUTO embedded controller Helix configuration.

1. org.apache.helix.manager.zk.ZkBaseDataAccessor.doCreate(ZkBaseDataAccessor.java:138) Node already exists. path: /streamio/STATEMODELDEFS/STORAGE_DEFAULT_SM_SCHEMATA
2. org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:130) Skip processing callbacks for listener: org.apache.helix.messaging.handling.HelixTaskExecutor@1a9f9f09, path: /streamio/INSTANCES/datapipe11-sjc1-controller-/MESSAGES, expected types: [CALLBACK, FINALIZE] but was INIT
3. org.apache.helix.healthcheck.ParticipantHealthReportTask.stop(ParticipantHealthReportTask.java:67) ParticipantHealthReportTimerTask already stopped org.apache.helix.healthcheck.ParticipantHealthReportTask in stop at line 67


> Misleading NPE trying to reconnect, upon ZK Timeout
> ---------------------------------------------------
>
>                 Key: HELIX-594
>                 URL: https://issues.apache.org/jira/browse/HELIX-594
>             Project: Apache Helix
>          Issue Type: Improvement
>          Components: helix-core
>    Affects Versions: 0.6.5
>            Reporter: Vinoth Chandar
>            Priority: Minor
>             Fix For: master
>
>
> This is a safety feature where Helix automatically detects GC and disconnects from the cluster.
> Unfortunately, in some cases it surfaces as an NPE.
> We should probably describe the reason for disabling in the instance config. Currently we just disable the node; we should probably add an attribute DISABLE_CAUSE:"TOO MANY DISCONNECTS FROM ZK. CHECK JAVA GC LOG" or something like that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)