Date: Sun, 27 Oct 2013 14:42:12 -0400
Subject: question about hdfs data loss risk
From: Koert Kuipers
To: user@hadoop.apache.org

i have a cluster with replication factor 2. with the following events in this order, do i have data loss?

1) shut down a datanode for maintenance unrelated to hdfs. so now some blocks have only 1 live replica.

2) a disk dies in another datanode. let's assume some blocks now have 0 live replicas, since their only copies were on the disk that died and on the datanode that is shut down for maintenance.

3) bring back up the datanode that was down for maintenance.

what i am worried about is that the namenode gives up on a block with 0 live replicas after steps 1) and 2) and considers it lost, so that by the time the replica comes back online in step 3) the namenode no longer considers the block to exist.

thanks! koert
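
to make the sequence of events concrete, here is a minimal toy model in python. it is not hdfs code and uses no hadoop api: the class ToyNameNode, its methods, and the names blk_1, dn_a and dn_b are all hypothetical. it also bakes in one big assumption, namely that the entry for a block is kept while the block has 0 live replicas. whether the real namenode behaves that way is exactly the question asked above, so the sketch only shows what the hoped-for outcome would look like.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy bookkeeping only; NOT the real HDFS NameNode."""
    # block id -> names of datanodes currently serving a live replica
    live: dict = field(default_factory=dict)
    # datanode name -> block ids physically present on its disks
    stored: dict = field(default_factory=dict)

    def add_replica(self, block, node):
        self.live.setdefault(block, set()).add(node)
        self.stored.setdefault(node, set()).add(block)

    def shut_down(self, node):
        # Replicas stay on disk but stop counting as live.
        for block in self.stored.get(node, set()):
            self.live[block].discard(node)

    def disk_failure(self, node, blocks):
        # Replicas on the dead disk are gone for good.
        for block in blocks:
            self.live[block].discard(node)
            self.stored[node].discard(block)

    def restart(self, node):
        # The node reports whatever it still has on disk.
        # ASSUMPTION: the block entry was kept while it had 0 live replicas.
        for block in self.stored.get(node, set()):
            self.live[block].add(node)

    def live_count(self, block):
        return len(self.live.get(block, set()))


def run_scenario():
    """Replay steps 1) to 3) for one block; return the live count after each."""
    nn = ToyNameNode()
    nn.add_replica("blk_1", "dn_a")
    nn.add_replica("blk_1", "dn_b")
    counts = [nn.live_count("blk_1")]      # healthy cluster
    nn.shut_down("dn_a")                   # step 1: maintenance
    counts.append(nn.live_count("blk_1"))
    nn.disk_failure("dn_b", ["blk_1"])     # step 2: disk dies elsewhere
    counts.append(nn.live_count("blk_1"))
    nn.restart("dn_a")                     # step 3: node comes back
    counts.append(nn.live_count("blk_1"))
    return counts


if __name__ == "__main__":
    print(run_scenario())
```

under that assumption the live replica count for the block goes 2, 1, 0, 1. the data loss scenario i am worried about corresponds to the entry being dropped at the point where the count is 0.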