From: Brandon Williams
Date: Sun, 25 Sep 2011 12:39:33 -0500
Subject: Re: frequent node UP/Down?
To: user@cassandra.apache.org

On Sat, Sep 24, 2011 at 4:54 PM, Yang wrote:
> I'm using 1.0.0
>
> there seems to be too many node Up/Dead events detected by the failure
> detector.
> I'm using a 2 node cluster on EC2, in the same region, same security
> group, so I assume the message drop
> rate should be fairly low.
> but in about every 5 minutes, I'm seeing some node detected as down,
> and then Up again quickly

This is fairly common on ec2 due to wild variance in the network.
Increase your phi_convict_threshold to 10 or higher (but I wouldn't go
over 12, this is roughly an exponential increase)

-Brandon
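
To make the "roughly an exponential increase" remark concrete, here is a minimal sketch of how a phi-style threshold behaves. It assumes heartbeat gaps follow an exponential distribution, which is a simplification and not Cassandra's actual failure detector code; the function names and the 1-second mean interval are hypothetical and chosen only for illustration.

```python
import math


def phi(silence_s: float, mean_interval_s: float) -> float:
    """Suspicion level after `silence_s` seconds without a heartbeat.

    Assumption: gaps are exponentially distributed, so the chance of a gap
    at least this long is exp(-silence / mean); phi is -log10 of that chance.
    """
    return silence_s / (mean_interval_s * math.log(10))


def false_conviction_probability(threshold: float) -> float:
    """Chance that a healthy node's gap alone pushes phi past `threshold`."""
    return 10.0 ** -threshold


def silence_to_convict(threshold: float, mean_interval_s: float) -> float:
    """Seconds of silence needed before phi reaches `threshold`."""
    return threshold * mean_interval_s * math.log(10)


if __name__ == "__main__":
    mean = 1.0  # hypothetical mean heartbeat gap, in seconds
    for threshold in (8, 10, 12):
        print(
            threshold,
            false_conviction_probability(threshold),
            round(silence_to_convict(threshold, mean), 1),
        )
```

Under this model, each step of 1 in the threshold makes a spurious "down" verdict ten times less likely, while the silence required before a node that really is dead gets convicted grows only linearly. That trade-off is one way to read the advice to raise the value to around 10 on a noisy network without going much past 12.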