Date: Sat, 19 Jun 2010 18:22:36 -0700
Subject: Re: Occasional 10s Timeouts on Read
From: AJ Slater
To: user@cassandra.apache.org

tcpdump shows bidirectional communication with ACKs
during a known problem period. I did not have TRACE logging going
during the period I have tcpdump logs, but I assume that an 'INFO error
connecting to' is probably caused by ConnectExceptions.

For instance...

lpc03:~$ telnet fs02 7000

...connects during the problem period.

I wish the ConnectException included port information, so I could be
sure what it was trying to connect to. But my setup uses default Gossip
ports. The only non-standard network setting is that the JMX ports are
set to 8081 on all hosts.

Hopefully I'll be able to do another experiment in an hour or so, but
then I'm going camping for a couple of days.

AJ

On Sat, Jun 19, 2010 at 5:05 PM, Peter Schuller wrote:
>> TRACE 14:42:06,248 unable to connect to /10.33.3.20
>> java.net.ConnectException: Connection refused
>>        at java.net.PlainSocketImpl.socketConnect(Native Method)
>
> So that's interesting since it is a clear failure that comes from the
> operating system and indicates something which can be observed outside
> of cassandra using system tools. Presumably either cassandra is
> somehow connecting to the wrong port, or this is a
> firewalling/os/network issue, or the 'other' cassandra is not
> listening on the port. Using tcpdump/netstat -nlp should narrow that
> down.
>
> Is it possible connections only succeed in one direction for example?
>
> --
> / Peter Schuller
>
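
A one-off telnet only samples a single moment, so an intermittent refusal is easy to miss. Below is a rough sketch of how the same check could be repeated with timestamps and with the port included in every result. The function names `probe` and `watch` are made up for illustration and are not part of Cassandra; the host fs02 and port 7000 are simply the ones from the telnet test above.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Attempt one TCP connect and return a (status, detail) tuple."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return ("connected", "")
    except ConnectionRefusedError as exc:
        # The peer answered with RST: nothing is listening on that port
        return ("refused", str(exc))
    except socket.timeout as exc:
        # No answer at all within the timeout (e.g. packets dropped)
        return ("timeout", str(exc))
    except OSError as exc:
        # Anything else: DNS failure, unreachable network, and so on
        return ("error", str(exc))


def watch(host, port, interval=1.0, count=5):
    """Probe repeatedly and print one timestamped line per attempt."""
    for _ in range(count):
        status, detail = probe(host, port)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} {host}:{port} {status} {detail}".rstrip())
        time.sleep(interval)
```

Running something like `watch("fs02", 7000, interval=1.0, count=600)` from each node towards the others during a problem period would show whether refusals occur in one direction only, and the timestamps could then be lined up against the Cassandra log.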