Date: Tue, 12 Apr 2011 09:30:56 -0700 (PDT)
From: Jason Harvey
To: user@cassandra.apache.org
Subject: pycassa timeouts resolved by killing a random node in the ring

Interesting issue this morning.

My apps started throwing a bunch of pycassa timeouts all of a sudden. The ring looked perfect. No load issues anywhere, and no errors in the logs.

The site was basically down, so I got desperate and whacked a random node in the ring. As soon as gossip saw it go down, the timeouts went away. Thinking that was kinda crazy, I started the node back up. As soon as it rejoined the ring, pycassa started timing out again.

I then killed another random node, far away from the first node I killed, and the timeouts stopped again. Started it back up, and the timeouts started again when it rejoined the ring.

Repeated this process once more just to make sure I wasn't insane, and the same result happened. Killing any single node, anywhere in the ring, fixes my timeouts.

Actively able to repro this. I am having to just keep one node down right now so the site doesn't break. Desperate for any suggestions or advice on this.

Using pycassa 1.0.7. Timeout is set to 15 seconds, with 3 retries. Reads and writes are in quorum. 27 nodes in the ring, with an RF of 3.

Thanks,
Jason
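
For reference, the quorum arithmetic behind the setup described above (RF of 3, reads and writes at quorum) can be sketched as follows. This is only an illustration: the helper names `quorum` and `tolerable_replica_failures` are made up for this sketch and are not part of pycassa or Cassandra. It relies on the standard definition of a quorum as a majority of the replicas, floor(RF / 2) + 1.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write (majority of RF)."""
    return replication_factor // 2 + 1


def tolerable_replica_failures(replication_factor: int) -> int:
    """Replicas of a given key that can be down while QUORUM can still be met."""
    return replication_factor - quorum(replication_factor)


# Values taken from the post: replication factor of 3.
rf = 3
print("replicas needed for quorum:", quorum(rf))
print("replicas that may be down:", tolerable_replica_failures(rf))
```

With an RF of 3, a quorum needs 2 replicas, so quorum requests for any key can still be satisfied with one of its replicas down. That is consistent with the site continuing to work while a single node is kept offline, though it does not by itself explain why the timeouts stop.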