From: Bertrand Dechoux
To: "user@hadoop.apache.org"
Date: Tue, 30 Jul 2013 09:10:06 +0200
Subject: Re: hadoop missing file?

(10-3) * 129 = 903

But the long answer starts with two questions:
1) Which file is missing?
2) How do you know it is missing?

You have a cluster with 3 datanodes. The default replication factor is 3, but not for the job jar, whose replication factor is 10 (mapred.submit.replication).
Let's say you ran 129 jobs that failed in a weird way (like at submission). You would have 129 under-replicated blocks (one block per jar, because your jar is small). Each of those blocks is missing 10 - 3 = 7 replicas, which gives 903 missing replicas, because with 3 datanodes you can't have more than 3 replicas anyway.
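To make the arithmetic explicit, here is a small sketch in Python using only the figures from the fsck summary quoted in this thread. The variable names are mine, and the percentage formula (missing replicas divided by expected replicas) is an assumption; it happens to reproduce the 5.0571237 % that fsck printed.

```python
# Figures taken from the fsck summary quoted in this thread.
total_blocks = 5651
under_replicated_blocks = 129
datanodes = 3
default_replication = 3
submit_replication = 10  # mapred.submit.replication, used for job jars

# A block cannot have more replicas than there are datanodes.
achievable = min(submit_replication, datanodes)
missing_per_block = submit_replication - achievable
missing_replicas = missing_per_block * under_replicated_blocks
print(missing_replicas)  # 903

# Assumption: fsck reports missing replicas as a share of expected replicas.
expected_replicas = (
    (total_blocks - under_replicated_blocks) * default_replication
    + under_replicated_blocks * submit_replication
)
missing_pct = 100.0 * missing_replicas / expected_replicas
print(expected_replicas, round(missing_pct, 4))  # 17856 5.0571
```

If the numbers did not line up like this, the job-jar explanation would be much less likely.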

So back to the first question: which file is missing?
It might only be that the file hasn't been uploaded in the first place. It happens.

For all your blocks, you do have at least one replica: Minimally replicated blocks: 5651 (100.0 %)
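The same check can be scripted. The sketch below parses an excerpt of the summary quoted in this thread; `first_count` is a hypothetical helper of mine, not part of Hadoop, and the regular expression only assumes the "label: number" layout seen in that output.

```python
import re

# Excerpt of the fsck summary quoted in this thread.
summary = """\
 Total blocks (validated):      5651 (avg. block size 48136763 B)
 Minimally replicated blocks:   5651 (100.0 %)
 Under-replicated blocks:       129 (2.2827818 %)
 Corrupt blocks:                0
 Missing replicas:              903 (5.0571237 %)
"""

def first_count(label, text):
    """Return the first integer that follows 'label...:' in text (hypothetical helper)."""
    match = re.search(re.escape(label) + r"[^:\n]*:\s*(\d+)", text)
    if match is None:
        raise ValueError(f"label not found: {label}")
    return int(match.group(1))

total = first_count("Total blocks", summary)
minimal = first_count("Minimally replicated blocks", summary)
corrupt = first_count("Corrupt blocks", summary)

# Every block has at least one replica and none is corrupt,
# so this summary shows no lost data.
data_is_readable = (minimal == total and corrupt == 0)
print(data_is_readable)  # True
```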

Bertrand



On Tue, Jul 30, 2013 at 8:54 AM, ch huang <justlooks@gmail.com> wrote:
> One of my workmates told me some of his files are missing. I ran fsck and
> found the following info. How can I prevent them from going missing? Can
> anyone help me?
>
> Status: HEALTHY
>  Total size:    272020850157 B (Total open files size: 652056 B)
>  Total dirs:    1143
>  Total files:   1886 (Files currently being written: 2)
>  Total blocks (validated):      5651 (avg. block size 48136763 B) (Total open file blocks (not validated): 1)
>  Minimally replicated blocks:   5651 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       129 (2.2827818 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     3.0
>  Corrupt blocks:                0
>  Missing replicas:              903 (5.0571237 %)
>  Number of data-nodes:          3
>  Number of racks:               1
> FSCK ended at Tue Jul 30 14:38:01 CST 2013 in 462 milliseconds



--
Bertrand Dechoux