Date: Sat, 6 Apr 2013 20:41:15 +0000 (UTC)
From: "Vijay (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Subject: [jira] [Updated] (CASSANDRA-3533) TimeoutException when there is a firewall issue.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-3533:
-----------------------------
    Attachment: 0001-3533-v2.patch

V2 fixes the issue. Thanks!

> TimeoutException when there is a firewall issue.
> ------------------------------------------------
>
>                 Key: CASSANDRA-3533
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3533
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 2.0
>
>         Attachments: 0001-3533-v2.patch, 0001-CASSANDRA-3533.patch, 3533.txt
>
>
> When one node in the cluster cannot talk to the other DC/RAC because of a firewall or network issue (StorageProxy calls fail), and the nodes are NOT marked down because at least one node in the cluster can still talk to the other DC/RAC, we get a TimeoutException instead of an UnavailableException.
> The problem with this:
> 1) It is hard to monitor/identify these errors.
> 2) It is hard to differentiate from the client whether the node is bad or the query is bad.
> 3) When this issue happens, we have to wait at least the RPC timeout to know that the query won't succeed.
> Possible Solution: when marking a node down, we might want to check whether the node is actually alive by trying to communicate with it, so we can be sure of its real state.
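
The "Possible Solution" in the quoted description can be sketched roughly as follows. This is only an illustration of the fail-fast idea, not the attached patch and not Cassandra's actual code: the class FailFastCheck, the exception NodeUnavailableException, and the helper tcpProbe are hypothetical names made up for this example. It assumes the local node asks a probe whether it can reach an endpoint itself, and raises an "unavailable" style error right away if not, instead of sending the request and waiting for the RPC timeout.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.function.Predicate;

// Hypothetical sketch: fail fast when this node cannot reach an endpoint.
final class FailFastCheck {

    // Stand-in for an "unavailable" error; NOT Cassandra's UnavailableException.
    static final class NodeUnavailableException extends RuntimeException {
        NodeUnavailableException(String message) {
            super(message);
        }
    }

    // Answers "can this node reach the given endpoint right now?"
    private final Predicate<String> probe;

    FailFastCheck(Predicate<String> probe) {
        this.probe = probe;
    }

    // Throws immediately if the probe reports the endpoint as unreachable,
    // so the caller does not have to wait for a request timeout.
    void requireReachable(String endpoint) {
        if (!probe.test(endpoint)) {
            throw new NodeUnavailableException(
                endpoint + " is not reachable from this node");
        }
    }

    // One possible probe (an assumption of this sketch): try to open a TCP
    // connection within timeoutMillis and report whether it succeeded.
    static boolean tcpProbe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or blocked: treat as unreachable.
            return false;
        }
    }
}
```

A caller might build the check with something like `new FailFastCheck(host -> FailFastCheck.tcpProbe(host, port, 500))`, where port and the 500 ms budget are placeholders, and call requireReachable before dispatching a request. Whether a per-request probe is acceptable, or whether the result should be cached or fed into failure detection, is a design question this sketch does not settle.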