Date: Fri, 21 Oct 2011 12:26:04 +0300
Subject: lost data with 1 failed datanode and replication factor 3 in 6 node cluster
From: Ossi
To: common-user@hadoop.apache.org

hi,

We managed to lose data when 1 datanode broke down in a cluster of 6 datanodes with replication factor 3. As far as I know, that shouldn't happen, since each block should have a copy on 3 different hosts. So, losing even 2 nodes should be fine.

Earlier we did some tests with replication factor 2, but reverted from that:

  2011-10-12 06:46:49 hadoop dfs -setrep -w 2 -R /
  2011-10-12 10:22:09 hadoop dfs -setrep -w 3 -R /

The lost data was generated after the replication factor was set back to 3. And even if the replication factor had been 2, data shouldn't have been lost, right?

We wonder how that is possible: in what situations could that happen?

br,
Ossi
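The reasoning above can be made concrete with a small model. A block is lost only when every node holding a live replica of it fails, so a single failed datanode can lose data only if some block had just one live replica, on that node. The sketch below is a minimal Python illustration under stated assumptions: `replicas` is a hypothetical in-memory map from block ID to the datanodes holding a live replica, and the block and node names are made up. It is not an HDFS API. On a real cluster, comparable per-block location information can be listed with `hadoop fsck / -files -blocks -locations`.

```python
from typing import Dict, List, Set

def lost_blocks(replicas: Dict[str, Set[str]], failed: Set[str]) -> List[str]:
    """Blocks whose every live replica sits on a failed node.

    A block with no live replicas at all (empty set) is also reported,
    since it is already unreadable.
    """
    return sorted(b for b, hosts in replicas.items() if hosts <= failed)

def under_replicated(replicas: Dict[str, Set[str]], target: int) -> List[str]:
    """Blocks with fewer live replicas than the target replication factor."""
    return sorted(b for b, hosts in replicas.items() if len(hosts) < target)

# Hypothetical block map: block ID -> datanodes holding a live replica.
replicas = {
    "blk_A": {"dn1", "dn2", "dn3"},  # fully replicated at factor 3
    "blk_B": {"dn2", "dn4"},         # only 2 live copies
    "blk_C": {"dn4"},                # only 1 live copy
}

print(under_replicated(replicas, 3))           # ['blk_B', 'blk_C']
print(lost_blocks(replicas, {"dn4"}))          # ['blk_C']
print(lost_blocks(replicas, {"dn1", "dn2"}))   # []
```

In this model the replication factor is only a target. What protects a block when a node fails is the number of live replicas it actually has at that moment, so losing one node out of six loses data only if some block had already dropped to a single live copy.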