Date: Mon, 29 Mar 2010 17:42:12 -0700
Message-ID: <5f7770581003291742s21abcf52y7ae1f4f9a55e33df@mail.gmail.com>
Subject: Re: Question about node failure...
From: Tatu Saloranta
To: user@cassandra.apache.org

On Mon, Mar 29, 2010 at 10:40 AM, Ned Wolpert wrote:
> So, what does "anti-entropy repair" do then?

Fix discrepancies between live nodes? (caused by transient failures, presumably)

> Sounds like you have to 'decommission' the dead node, then I thought run
> 'nodeprobe repair' to get the data adjusted back to a replication factor of
> 3, right?
>
> Also, what is the method to decommission a dead node? pass in the IP address
> of the dead node to nodeprobe on a member of the cluster? I've only used
> 'decommission' to remove the node I ran it on from the cluster... not a
> different node.
>
> It seems like if you decommission a node it should fix the replication
> factor for data that was on that node in this case...

Perhaps it would be good to have a convenience workflow for replacing a
broken host ("squashing lemons")? I would assume that the most common use
case is to effectively replace a host that can't be repaired (or perhaps it
might sometimes be the best way to do it anyway), by a combination of
removing the failed host and bringing in a new one. Handling this as a
high-level logical operation could be more efficient than doing it step by
step.

-+ Tatu +-