Date: Fri, 19 Feb 2016 16:54:18 +0000 (UTC)
From: "Sam Tunnicliffe (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-11093) Process restarts are failing because native port and JMX ports are in use

[ https://issues.apache.org/jira/browse/CASSANDRA-11093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-11093:
----------------------------------------
    Assignee:     (was: Sam Tunnicliffe)

> Process restarts are failing because native port and JMX ports are in use
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11093
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11093
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Configuration
>         Environment: PROD
>            Reporter: varun
>            Priority: Minor
>              Labels: lhf
>
> A process restart should take care of this automatically, but it does not, and that is a problem.
> The ports are considered in use even after the process has quit, died, or been killed, as long as sockets on those ports are still in the TIME_WAIT state of the TCP FSM (http://tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateMachineF-2.htm).
> tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN 30099/java
> tcp 0 0 192.168.1.2:9160 0.0.0.0:* LISTEN 30099/java
> tcp 0 0 10.130.128.131:58263 10.130.128.131:9042 TIME_WAIT -
> tcp 0 0 10.130.128.131:58262 10.130.128.131:9042 TIME_WAIT -
> tcp 0 0 ::ffff:10.130.128.131:9042 :::* LISTEN 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57191 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57190 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.176.70.226:37105 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:127.0.0.1:42562 ::ffff:127.0.0.1:7199 TIME_WAIT -
> tcp 0 0 ::ffff:10.130.128.131:57190 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:57198 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.176.70.226:37106 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:57197 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:57191 ::ffff:10.130.128.131:9042 ESTABLISHED 30138/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57198 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:10.130.128.131:9042 ::ffff:10.130.128.131:57197 ESTABLISHED 30099/java
> tcp 0 0 ::ffff:127.0.0.1:42567 ::ffff:127.0.0.1:7199 TIME_WAIT -
> I had to write a restart handler that calls netstat and waits until all TIME_WAIT sockets have expired before starting the node back up. This happened on 26 of the 56 nodes when a rolling restart was performed. The issue mostly involved JMX port 7199.
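The restart handler described above can be sketched roughly as follows. This is a minimal sketch and not the reporter's actual script. The class `TimeWaitCheck` and its methods are hypothetical. The sketch assumes netstat's usual Linux column layout: proto, recv-q, send-q, local address, foreign address, state. It only parses output the caller has already collected. A real handler would poll netstat and start the node only once `hasTimeWait(...)` returns false for both 7199 and 9042.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the reporter's restart-handler check: given lines of
 * `netstat -tanp`-style output, report whether any TCP socket whose local or
 * foreign port matches the given port is still in TIME_WAIT.
 */
public final class TimeWaitCheck {

    private TimeWaitCheck() {}

    /** Returns the port after the last ':' of an address such as "::ffff:127.0.0.1:7199", or -1. */
    public static int portOf(String address) {
        int colon = address.lastIndexOf(':');
        if (colon < 0) {
            return -1;
        }
        try {
            return Integer.parseInt(address.substring(colon + 1));
        } catch (NumberFormatException e) {
            return -1; // wildcard foreign address such as "0.0.0.0:*"
        }
    }

    /** True if any tcp/tcp6 line references {@code port} (local or foreign) and is in TIME_WAIT. */
    public static boolean hasTimeWait(List<String> netstatLines, int port) {
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Assumed columns: proto recv-q send-q local-address foreign-address state [pid/program]
            if (cols.length < 6 || !cols[0].startsWith("tcp")) {
                continue;
            }
            boolean matchesPort = portOf(cols[3]) == port || portOf(cols[4]) == port;
            if (matchesPort && "TIME_WAIT".equals(cols[5])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two lines taken from the netstat output in this issue.
        List<String> sample = Arrays.asList(
            "tcp 0 0 127.0.0.1:7199 0.0.0.0:* LISTEN 30099/java",
            "tcp 0 0 ::ffff:127.0.0.1:42562 ::ffff:127.0.0.1:7199 TIME_WAIT -");
        System.out.println(hasTimeWait(sample, 7199)); // prints "true"
        System.out.println(hasTimeWait(sample, 9042)); // prints "false"
    }
}
```

The check matches either the local or the foreign address. In the output above, the lingering TIME_WAIT entries are the client side of connections whose foreign port is 7199 or 9042.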
> A second rolling restart was done on those 26 nodes to remediate the JMX port issue. During that restart, one node found port 9042 still considered in use, and the process died after a short while.
>
> What needs to be done for the native port (9042) and the JMX port (7199) is to create the underlying TCP socket with SO_REUSEADDR. This relaxes the restriction and allows a process to bind the port even if sockets to that port still exist in the TCP FSM (for example, in TIME_WAIT), as long as no other process is listening on that port. Java exposes this option in java.net.Socket: https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setReuseAddress%28boolean%29
> native port 9042: https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L38
> JMX port 7199: https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L40
>
> Looking at the code itself, this option is already set on the Thrift port (9160 by default) and on the internode communication ports, both unencrypted (7000 by default) and SSL-encrypted (7001 by default):
> https://github.com/apache/cassandra/search?utf8=%E2%9C%93&q=setReuseAddress
> It needs to be set on the native and JMX ports as well.
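Here is a minimal sketch of enabling SO_REUSEADDR on listening sockets, using only standard JDK APIs. For a listening port, the relevant setter is `java.net.ServerSocket#setReuseAddress`, the server-side counterpart of the `java.net.Socket` method linked above. It must be called before the socket is bound. `java.nio`'s `ServerSocketChannel` exposes the same option as `StandardSocketOptions.SO_REUSEADDR`. The names `ReuseAddressSockets` and `ReuseAddressRmiServerSocketFactory` are hypothetical. This is not how Cassandra actually wires its native or JMX listeners.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;
import java.rmi.server.RMIServerSocketFactory;

/** Hypothetical helpers that bind listening sockets with SO_REUSEADDR enabled. */
public final class ReuseAddressSockets {

    private ReuseAddressSockets() {}

    /** Classic java.net variant: create unbound, enable SO_REUSEADDR, then bind. */
    public static ServerSocket bindWithReuse(int port) throws IOException {
        ServerSocket socket = new ServerSocket(); // deliberately unbound so the option applies
        socket.setReuseAddress(true);             // must happen before bind()
        socket.bind(new InetSocketAddress(port));
        return socket;
    }

    /** NIO variant of the same idea, e.g. for a native-protocol listener (assumed wiring). */
    public static ServerSocketChannel bindChannelWithReuse(int port) throws IOException {
        ServerSocketChannel channel = ServerSocketChannel.open();
        channel.setOption(StandardSocketOptions.SO_REUSEADDR, true);
        channel.bind(new InetSocketAddress(port));
        return channel;
    }

    /** RMI server socket factory so an RMI/JMX listener (e.g. port 7199) binds with SO_REUSEADDR. */
    public static final class ReuseAddressRmiServerSocketFactory implements RMIServerSocketFactory {
        @Override
        public ServerSocket createServerSocket(int port) throws IOException {
            return bindWithReuse(port);
        }

        // The RMIServerSocketFactory contract asks for equals/hashCode so the RMI
        // runtime can recognise functionally equivalent factories and share sockets.
        @Override
        public boolean equals(Object other) {
            return other != null && other.getClass() == getClass();
        }

        @Override
        public int hashCode() {
            return getClass().hashCode();
        }
    }
}
```

For JMX, such a factory could hypothetically be passed as the server socket factory argument of `LocateRegistry.createRegistry(int, RMIClientSocketFactory, RMIServerSocketFactory)`. It could also be placed in the connector server's environment map under `RMIConnectorServer.RMI_SERVER_SOCKET_FACTORY_ATTRIBUTE`. Cassandra's native transport is built on Netty, where the corresponding setting would be `ChannelOption.SO_REUSEADDR` on the server bootstrap. The NIO method above only illustrates the socket option itself.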
>
> References:
> https://unix.stackexchange.com/questions/258379/when-is-a-port-considered-being-used/258380?noredirect=1
> https://stackoverflow.com/questions/23531558/allow-restarting-java-application-with-jmx-monitoring-enabled-immediately
> https://docs.oracle.com/javase/8/docs/technotes/guides/rmi/socketfactory/
> https://github.com/apache/cassandra/search?utf8=%E2%9C%93&q=setReuseAddress
> https://docs.oracle.com/javase/7/docs/api/java/net/Socket.html#setReuseAddress%28boolean%29
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L38
> https://github.com/apache/cassandra/blob/4a0d1caa262af3b6f2b6d329e45766b4df845a88/tools/stress/src/org/apache/cassandra/stress/settings/SettingsPort.java#L40

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)