Date: Thu, 17 Nov 2016 23:21:59 +0000 (UTC)
From: "Jared Stewart (JIRA)"
To: issues@geode.incubator.apache.org
Reply-To: dev@geode.apache.org
Subject: [jira] [Updated] (GEODE-2125) GFSH should provide information about Locators that go into reconnect mode

     [ https://issues.apache.org/jira/browse/GEODE-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jared Stewart updated GEODE-2125:
---------------------------------
    Description:
If the Locator is started from GFSH and the cluster's only server is killed, network partition detection will initiate forceDisconnect in the Locator and leave it in reconnect mode. To the User it will appear that the Locator crashed and GFSH lost connection:
{noformat}
gfsh>
No longer connected to 192.168.1.72[1099].
{noformat}
During the time in which the Locator is in reconnect mode, the User cannot connect via GFSH, nor can they issue status or stop commands against it:
{noformat}
$ cd locator1
$ cat vf.gf.locator.pid
33959
$ ps 33959
  PID   TT  STAT      TIME COMMAND
33959 s001  S      0:19.97 /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Co
$ gfsh
    _________________________     __
   / _____/ ______/ ______/ /____/ /
  / /  __/ /___  /_____  / _____  /
 / /__/ / ____/  _____/ / /    / /
/______/_/      /______/_/    /_/    1.1.0-incubating-SNAPSHOT
Monitor and Manage Apache Geode (incubating)
gfsh>connect --locator=localhost[10334]
Connecting to Locator at [host=localhost, port=10334] ..
Connection refused
gfsh>status locator --pid=33959
null
gfsh>status locator --dir=locator1
null
gfsh>stop locator --dir=locator1
Locator in /Users/klund/dev/geode/locator1 on null is currently not responding.
gfsh>stop locator --pid=33959
Locator in /Users/klund/dev/geode on null is currently not responding.
{noformat}
If a Locator has GFSH connected then it should notify GFSH that it is going to forceDisconnect and go into reconnect mode. Then GFSH can notify the User so the User is not surprised.
In addition, GFSH status and stop commands should be modified to be able to talk to a Locator in reconnect mode. GFSH start could also be modified to report that the Locator is running in reconnect mode instead of reporting a hung process in the Locator's directory.
Attachments:
* The Locator log file is attached as locator_failure-logs.txt
* The Locator thread dump (via jstack) AFTER it has shut down due to forceDisconnect is attached as thread_dump.txt

  was:
If the Locator is started from GFSH and the cluster has one server which is killed, network partition detection will initiate forceDisconnect in the Locator and leave it in reconnect mode. To the User it will appear that the Locator crashed and GFSH lost connection:
{noformat}
gfsh>
No longer connected to 192.168.1.72[1099].
{noformat}
During the time in which the Locator is in reconnect mode, the User cannot connect via GFSH, nor can they issue status or stop commands against it:
{noformat}
$ cd locator1
$ cat vf.gf.locator.pid
33959
$ ps 33959
  PID   TT  STAT      TIME COMMAND
33959 s001  S      0:19.97 /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Co
$ gfsh
    _________________________     __
   / _____/ ______/ ______/ /____/ /
  / /  __/ /___  /_____  / _____  /
 / /__/ / ____/  _____/ / /    / /
/______/_/      /______/_/    /_/    1.1.0-incubating-SNAPSHOT
Monitor and Manage Apache Geode (incubating)
gfsh>connect --locator=localhost[10334]
Connecting to Locator at [host=localhost, port=10334] ..
Connection refused
gfsh>status locator --pid=33959
null
gfsh>status locator --dir=locator1
null
gfsh>stop locator --dir=locator1
Locator in /Users/klund/dev/geode/locator1 on null is currently not responding.
gfsh>stop locator --pid=33959
Locator in /Users/klund/dev/geode on null is currently not responding.
{noformat}
If a Locator has GFSH connected then it should notify GFSH that it is going to forceDisconnect and go into reconnect mode. Then GFSH can notify the User so the User is not surprised.
In addition, GFSH status and stop commands should be modified to be able to talk to a Locator in reconnect mode. GFSH start could also be modified to report that the Locator is running in reconnect mode instead of reporting a hung process in the Locator's directory.
Attachments:
* The Locator log file is attached as locator_failure-logs.txt
* The Locator thread dump (via jstack) AFTER it has shut down due to forceDisconnect is attached as thread_dump.txt


> GFSH should provide information about Locators that go into reconnect mode
> --------------------------------------------------------------------------
>
>                 Key: GEODE-2125
>                 URL: https://issues.apache.org/jira/browse/GEODE-2125
>             Project: Geode
>          Issue Type: Improvement
>          Components: management
>    Affects Versions: 1.0.0-incubating
>            Reporter: Kirk Lund
>            Assignee: Kirk Lund
>         Attachments: locator_failure-logs.txt, thread_dump.txt
>
>
> If the Locator is started from GFSH and the cluster's only server is killed, network partition detection will initiate forceDisconnect in the Locator and leave it in reconnect mode. To the User it will appear that the Locator crashed and GFSH lost connection:
> {noformat}
> gfsh>
> No longer connected to 192.168.1.72[1099].
> {noformat}
> During the time in which the Locator is in reconnect mode, the User cannot connect via GFSH, nor can they issue status or stop commands against it:
> {noformat}
> $ cd locator1
> $ cat vf.gf.locator.pid
> 33959
> $ ps 33959
>   PID   TT  STAT      TIME COMMAND
> 33959 s001  S      0:19.97 /Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Co
> $ gfsh
>     _________________________     __
>    / _____/ ______/ ______/ /____/ /
>   / /  __/ /___  /_____  / _____  /
>  / /__/ / ____/  _____/ / /    / /
> /______/_/      /______/_/    /_/    1.1.0-incubating-SNAPSHOT
> Monitor and Manage Apache Geode (incubating)
> gfsh>connect --locator=localhost[10334]
> Connecting to Locator at [host=localhost, port=10334] ..
> Connection refused
> gfsh>status locator --pid=33959
> null
> gfsh>status locator --dir=locator1
> null
> gfsh>stop locator --dir=locator1
> Locator in /Users/klund/dev/geode/locator1 on null is currently not responding.
> gfsh>stop locator --pid=33959
> Locator in /Users/klund/dev/geode on null is currently not responding.
> {noformat}
> If a Locator has GFSH connected then it should notify GFSH that it is going to forceDisconnect and go into reconnect mode. Then GFSH can notify the User so the User is not surprised.
> In addition, GFSH status and stop commands should be modified to be able to talk to a Locator in reconnect mode. GFSH start could also be modified to report that the Locator is running in reconnect mode instead of reporting a hung process in the Locator's directory.
> Attachments:
> * The Locator log file is attached as locator_failure-logs.txt
> * The Locator thread dump (via jstack) AFTER it has shut down due to forceDisconnect is attached as thread_dump.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)