From: Ted Dunning
Date: Mon, 11 Apr 2011 08:14:17 -0700
Subject: Re: Repair cluster on EC2
To: user@zookeeper.apache.org
Cc: Andrei Savu, zookeeper-user@hadoop.apache.org

On Mon, Apr 11, 2011 at 4:43 AM, Andrei Savu wrote:

> Is it possible to repair a ZooKeeper cluster on EC2 by using the
> following algorithm with no downtime and data loss?

Yes. Been there. Done that. Works like a champ.

> 1. start a cluster with >3 nodes
> 2. if one node fails start a new machine and record the new IP
> 3. rebuild the configuration file by replacing the IP of the node that
> failed with the IP attached to the new machine
> 4. do a rolling restart and replace all configuration files
>
> Am I missing something? Could this process be executed by a script?

Sounds right, and you should be able to do it with a script. Use caution, of course.

> I'm also thinking about extending the client library in order to make
> it EC2 aware (it should be able to automatically discover ZK nodes).

This way lies danger! The problem is that if cluster membership becomes very flexible, then you run the risk of diluting the guarantees that ZK provides based on the quorum requirements.
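
Step 3 of the procedure quoted above (rebuilding the configuration file) might be scripted roughly as follows. This is only a sketch under stated assumptions, not something from the thread: it assumes the ensemble is listed in the usual server.N=host:port:port lines of zoo.cfg, the function name replace_failed_server and all IP addresses are hypothetical, and it keeps the failed node's server id on the assumption that the replacement machine is given the same myid.

```python
import re


def replace_failed_server(cfg_text: str, failed_ip: str, new_ip: str) -> str:
    """Return cfg_text with the failed node's address replaced by new_ip.

    Hypothetical helper: only `server.N=host[:ports]` lines are touched,
    and the server id N is kept unchanged.
    """
    # Match the failed address exactly, so 10.0.0.2 does not match 10.0.0.21.
    pattern = re.compile(
        r"^(server\.\d+=)" + re.escape(failed_ip) + r"(:.*)?$"
    )
    out = []
    replaced = 0
    for line in cfg_text.splitlines():
        m = pattern.match(line.strip())
        if m:
            line = m.group(1) + new_ip + (m.group(2) or "")
            replaced += 1
        out.append(line)
    # Refuse to continue unless exactly one entry was rewritten.
    if replaced != 1:
        raise ValueError(
            f"expected exactly one server entry for {failed_ip}, found {replaced}"
        )
    return "\n".join(out) + "\n"


# Example configuration with made-up addresses.
SAMPLE_CFG = (
    "tickTime=2000\n"
    "server.1=10.0.0.1:2888:3888\n"
    "server.2=10.0.0.2:2888:3888\n"
    "server.3=10.0.0.3:2888:3888\n"
)

if __name__ == "__main__":
    print(replace_failed_server(SAMPLE_CFG, "10.0.0.2", "10.0.0.9"))
```

The function raises instead of guessing when the failed address is missing or listed more than once, in keeping with the "use caution" advice. The rewritten file would still have to be copied to every node as part of the rolling restart in step 4, which this sketch does not attempt.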