Subject: How to remove three disks from three different nodes in a ten node cluster in less than an hour without losing replicas?
From: Stack
To: hdfs-user@hadoop.apache.org
Date: Wed, 30 Jan 2013 22:34:57 -0800

Here is a little puzzle.

An admin works for a cash-strapped, popular web shop. At the datacenter she has a ten node cluster that is heavily used. It runs hot all day long, and decommissioning a node, with its background replication of 12 disks' worth of data, messes up the workload she has on top of it and makes her clients very unhappy. Replicating the data of one node takes at least an hour. This cluster has three bad disks in three different nodes (replication factor is 3). The admin lives an hour from the datacenter. She can't afford a cage monkey and so must replace the disks herself.

If she left home at 2pm and had to be back by 6pm before the kids came home from school, how would she replace the three disks without for sure losing a replica?

Is the only answer to remove one, wait for a clean fsck run, then remove the next one?

Thanks,
St.Ack