Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: <b09da7c51003291040g4fde13a6l4e7d72f9618c323f@mail.gmail.com>
References: <b09da7c51003291027s5433f1a2g81049823ab9b6d14@mail.gmail.com>
	 <e06563881003291032m74877c78n85c9eff019adb7de@mail.gmail.com>
	 <b09da7c51003291040g4fde13a6l4e7d72f9618c323f@mail.gmail.com>
Date: Mon, 29 Mar 2010 17:42:12 -0700
Message-ID: <5f7770581003291742s21abcf52y7ae1f4f9a55e33df@mail.gmail.com>
Subject: Re: Question about node failure...
From: Tatu Saloranta <tsaloranta@gmail.com>
To: user@cassandra.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On Mon, Mar 29, 2010 at 10:40 AM, Ned Wolpert <ned.wolpert@imemories.com> wrote:
> So, what does "anti-entropy repair" do then?

Fix discrepancies between live nodes? (caused by transient failures, presumably)

> Sounds like you have to 'decommission' the dead node, then I thought run
> 'nodeprobe repair' to get the data adjusted back to a replication factor of
> 3, right?
>
> Also, what is the method to decommission a dead node? pass in the IP address
> of the dead node to nodeprobe on a member of the cluster? I've only used
> 'decommission' to remove the node I ran it on from the cluster... not a
> different node.
>
> It seems like if you decommission a node it should fix the replication
> factor for data that was on that node in this case...

Perhaps it would be good to have a convenience workflow for replacing
a broken host ("squashing lemons")? I would assume that the most common
use case is to effectively replace a host that can't be repaired (or
perhaps that is sometimes the best way to do it anyway), by a
combination of removing the failed host and bringing in a new one.
Handling this as a single high-level logical operation could be more
efficient than doing it step by step.
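
To make the efficiency argument a bit more concrete, here is a toy
sketch in Python. Everything in it is an assumption for illustration:
the four-node ring, the node names, the "next N nodes clockwise"
replica placement, and the helper functions are hypothetical and are
not Cassandra APIs. It only counts how many key copies would have to
move if a dead node is first removed and a new node is then added,
versus letting the new node take over the dead node's token directly.

```python
from bisect import bisect_left

def replicas(ring, key_token, rf):
    """Return the rf nodes found clockwise from key_token on the ring."""
    tokens = sorted(ring)
    start = bisect_left(tokens, key_token) % len(tokens)
    count = min(rf, len(tokens))
    return {ring[tokens[(start + i) % len(tokens)]] for i in range(count)}

def placement(ring, keys, rf):
    """Map each key to the set of nodes that should hold it."""
    return {k: replicas(ring, k, rf) for k in keys}

def copies_needed(before, after):
    """Count (key, node) pairs where a node must receive a key it lacks."""
    return sum(len(after[k] - before[k]) for k in after)

RF = 3
KEYS = range(100)                       # toy keys, token == key
ring = {0: "A", 25: "B", 50: "C", 75: "D"}

# Node B has died: the surviving copies are whatever the other nodes hold.
held = {k: nodes - {"B"} for k, nodes in placement(ring, KEYS, RF).items()}

# Step by step: drop B from the ring and re-replicate, then add E.
ring_without_b = {t: n for t, n in ring.items() if n != "B"}
after_remove = placement(ring_without_b, KEYS, RF)
ring_with_e = {**ring_without_b, 25: "E"}
after_add = placement(ring_with_e, KEYS, RF)
step_by_step = (copies_needed(held, after_remove)
                + copies_needed(after_remove, after_add))

# Single logical operation: E takes over B's token, data is copied once.
combined = copies_needed(held, after_add)

print("step by step:", step_by_step)
print("combined:    ", combined)
```

In this toy model the step-by-step path moves each affected key twice
(once to restore the replication factor, once more to populate the new
node), while the combined replacement moves it once. Real clusters
differ in many ways, so treat this only as a picture of the idea.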

-+ Tatu +-