From: Patai Sangbutsarakum
To: user@hadoop.apache.org
Subject: Re: how to handle the corrupt block in HDFS?
Date: Tue, 10 Dec 2013 12:39:53 -0800

The 10 copies of job.jar and job.split are controlled by the
mapred.submit.replication property at the job-init level.

On Mon, Dec 9, 2013 at 5:20 PM, ch huang wrote:

> Stranger still: in my HDFS cluster every block has three replicas, but I
> found some files with ten replicas. Why?
>
> # sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915
> Found 5 items
> -rw-r--r--   3 helen hadoop       7 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens
> -rw-r--r--  10 helen hadoop 2977839 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar
> -rw-r--r--  10 helen hadoop    3696 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.split
>
> On Tue, Dec 10, 2013 at 9:15 AM, ch huang wrote:
>
>> The strange thing is that when I use the following command, I find 1
>> corrupt block:
>>
>> # curl -s http://ch11:50070/jmx | grep orrupt
>>     "CorruptBlocks" : 1,
>>
>> But when I run hdfs fsck /, it reports none; everything seems fine:
>>
>> # sudo -u hdfs hdfs fsck /
>> ........
>>
>> ....................................Status: HEALTHY
>>  Total size:                    1479728140875 B (Total open files size: 1677721600 B)
>>  Total dirs:                    21298
>>  Total files:                   100636 (Files currently being written: 25)
>>  Total blocks (validated):      119788 (avg. block size 12352891 B) (Total open file blocks (not validated): 37)
>>  Minimally replicated blocks:   119788 (100.0 %)
>>  Over-replicated blocks:        0 (0.0 %)
>>  Under-replicated blocks:       166 (0.13857816 %)
>>  Mis-replicated blocks:         0 (0.0 %)
>>  Default replication factor:    3
>>  Average block replication:     3.0027633
>>  Corrupt blocks:                0
>>  Missing replicas:              831 (0.23049656 %)
>>  Number of data-nodes:          5
>>  Number of racks:               1
>> FSCK ended at Tue Dec 10 09:14:48 CST 2013 in 3276 milliseconds
>>
>> The filesystem under path '/' is HEALTHY
>>
>> On Tue, Dec 10, 2013 at 8:32 AM, ch huang wrote:
>>
>>> Hi, mailing list:
>>> My Nagios has been alerting me all day that there is a corrupt block
>>> in HDFS, but I do not know how to remove it. Will HDFS handle this
>>> automatically? And will removing the corrupt block cause any data
>>> loss? Thanks.
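
To see which files are affected by the mapred.submit.replication setting named in the reply at the top of this thread, one option is to filter the `hadoop fs -ls` listing by its replication column. The snippet below is only a sketch: the function name `over_replicated` is made up for illustration, and it assumes the column layout shown in the quoted listing (permissions, replication, owner, group, size, date, time, path) and paths without spaces.

```shell
# over_replicated (hypothetical helper): read `hadoop fs -ls` output on stdin
# and print "<replication> <path>" for every regular file whose replication
# factor is greater than the threshold passed as $1 (default: 3).
over_replicated() {
    threshold="${1:-3}"
    awk -v max="$threshold" '
        # Regular files start with "-" in the permissions column; this skips
        # directories (which start with "d") and the "Found N items" header.
        /^-/ && NF >= 8 && $2 ~ /^[0-9]+$/ && $2 + 0 > max + 0 { print $2, $NF }
    '
}

# Example against a live cluster (path taken from the thread):
#   sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915 \
#       | over_replicated 3
```

On the listing quoted above this would report job.jar and job.split, both at replication 10. With only 5 data-nodes a factor of 10 cannot be met, which may also be where the under-replicated blocks in the quoted fsck output come from, though nothing in the thread confirms that. On Hadoop 2 the same setting is, as far as I know, also exposed under the newer name mapreduce.client.submit.file.replication.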