hadoop-mapreduce-user mailing list archives

From Adam Kawa <kawa.a...@gmail.com>
Subject Re: how to handle the corrupt block in HDFS?
Date Wed, 11 Dec 2013 02:01:01 GMT
When you identify a file with corrupt block(s), you can locate the
machines that store its blocks by running
$ sudo -u hdfs hdfs fsck <path-to-file> -files -blocks -locations


2013/12/11 Adam Kawa <kawa.adam@gmail.com>

> Maybe this will work for you?
> $ sudo -u hdfs hdfs fsck / -list-corruptfileblocks
>
>
> 2013/12/11 ch huang <justlooks@gmail.com>
>
>> Thanks for the reply. What I do not know is how to locate the block that
>> has the corrupt replica, so that I can observe how long it takes for the
>> corrupt replica to be removed and replaced by a new healthy replica. I have
>> been getting a Nagios alert for three days, and I am not sure whether the
>> same corrupt replica is causing it. I also do not know the interval at
>> which HDFS checks for corrupt replicas and cleans them up.
>>
>>
>> On Tue, Dec 10, 2013 at 6:20 PM, Vinayakumar B <vinayakumar.b@huawei.com> wrote:
>>
>>>  Hi ch huang,
>>>
>>>
>>>
>>> It may seem strange, but the fact is,
>>>
>>> *CorruptBlocks* through JMX means *“Number of blocks with corrupt
>>> replicas”*. Not all replicas of such a block need be corrupt. You can
>>> check this description through jconsole.
>>>
>>>
>>>
>>> Whereas *Corrupt blocks* through fsck means *blocks with all replicas
>>> corrupt (non-recoverable) or missing.*
>>>
>>>
>>>
>>> In your case, maybe one of the replicas is corrupt, not all replicas of
>>> the same block. This corrupt replica will be deleted automatically if one
>>> more datanode is available in your cluster and the block is replicated to it.
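
The distinction above can be sketched as a small check that looks at both counters. This is only a rough illustration, not an official tool: it assumes the NameNode /jmx servlet returns JSON with a top-level "beans" list in which one bean carries the CorruptBlocks attribute (as the curl output further down the thread suggests). The sample payload and the names jmx_corrupt_blocks and classify are made up for this example.

```python
import json

# Assumed, heavily trimmed sample of a NameNode /jmx response; a real
# response carries many more beans and attributes.
SAMPLE_JMX = (
    '{"beans": [{"name": "Hadoop:service=NameNode,name=FSNamesystem", '
    '"CorruptBlocks": 1}]}'
)


def jmx_corrupt_blocks(jmx_text):
    """Return CorruptBlocks from a /jmx JSON dump, or None if it is absent."""
    doc = json.loads(jmx_text)
    for bean in doc.get("beans", []):
        if "CorruptBlocks" in bean:
            return int(bean["CorruptBlocks"])
    return None


def classify(jmx_corrupt, fsck_corrupt):
    """Combine the two counters using the meanings described in the thread."""
    if fsck_corrupt > 0:
        # fsck only counts blocks whose replicas are all corrupt or missing.
        return "all replicas corrupt or missing"
    if jmx_corrupt:
        # JMX counts blocks that have at least one corrupt replica.
        return "some replicas corrupt, block still recoverable"
    return "no corrupt replicas reported"


print(classify(jmx_corrupt_blocks(SAMPLE_JMX), fsck_corrupt=0))
```

With the numbers reported in this thread (JMX says 1, fsck says 0), the sketch lands in the second case.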
>>>
>>>
>>>
>>>
>>>
>>> Regarding the replication of 10: as Peter Marron said, *some of the
>>> important files of a MapReduce job are given a replication of 10, to make
>>> them accessible faster and to launch map tasks faster.*
>>>
>>> Anyway, if the job succeeds these files will be deleted automatically.
>>> I think these files remain in HDFS, showing under-replicated blocks, only
>>> in some cases where jobs are killed midway.
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Vinayakumar B
>>>
>>>
>>>
>>> *From:* Peter Marron [mailto:Peter.Marron@trilliumsoftware.com]
>>> *Sent:* 10 December 2013 14:19
>>> *To:* user@hadoop.apache.org
>>> *Subject:* RE: how to handle the corrupt block in HDFS?
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I am sure that there are others who will answer this better, but anyway.
>>>
>>> The default replication level for files in HDFS is 3 and so most files that
>>> you see will have a replication level of 3. However when you run a
>>> Map/Reduce job the system knows in advance that every node will need a copy
>>> of certain files. Specifically the job.xml and the various jars containing
>>> classes that will be needed to run the mappers and reducers. So the system
>>> arranges that some of these files have a higher replication level. This
>>> increases the chances that a copy will be found locally.
>>> By default this higher replication level is 10.
>>>
>>> This can seem a little odd on a cluster where you only have, say, 3 nodes.
>>> Because it means that you will almost always have some blocks that are
>>> marked under-replicated. I think that there was some discussion a while
>>> back to change this to make the replication level something like
>>> min(10, #number of nodes).
>>> However, as I recall, the general consensus was that this was extra
>>> complexity that wasn’t really worth it. If it ain’t broke…
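
To make the arithmetic concrete, here is a rough sketch of why a requested replication level of 10 leaves blocks under-replicated on a small cluster, and what the min(10, number of nodes) idea would change. It assumes that HDFS keeps at most one replica of a given block per datanode; the function names are hypothetical and not part of Hadoop.

```python
def achievable_replicas(requested, datanodes):
    """Replicas that can actually exist, assuming one replica per datanode."""
    return min(requested, datanodes)


def missing_replicas_per_block(requested, datanodes):
    """Replicas a block still lacks when the cluster is too small."""
    return requested - achievable_replicas(requested, datanodes)


# Job files requested at replication 10 on the 5-datanode cluster from
# this thread, and on the 3-node cluster used as the example above.
print(missing_replicas_per_block(10, 5))
print(missing_replicas_per_block(10, 3))

# Ordinary files at the default replication of 3 fit on 5 datanodes.
print(missing_replicas_per_block(3, 5))
```

As far as I know, the value of 10 comes from the job-submission setting mapreduce.client.submit.file.replication in Hadoop 2.x, so lowering that setting on a small cluster would have the same effect as the min() idea; treat the property name as something to verify against your own version.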
>>>
>>>
>>>
>>> Hope that this helps.
>>>
>>>
>>>
>>> *Peter Marron*
>>>
>>> Senior Developer, Research & Development
>>>
>>>
>>>
>>> *From:* ch huang [mailto:justlooks@gmail.com]
>>> *Sent:* 10 December 2013 01:21
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: how to handle the corrupt block in HDFS?
>>>
>>>
>>>
>>> Even stranger: in my HDFS cluster every block has three replicas, but I
>>> find that some have ten replicas. Why?
>>>
>>>
>>>
>>> # sudo -u hdfs hadoop fs -ls /data/hisstage/helen/.staging/job_1385542328307_0915
>>> Found 5 items
>>> -rw-r--r--   3 helen hadoop          7 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/appTokens
>>> -rw-r--r--  10 helen hadoop    2977839 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.jar
>>> -rw-r--r--  10 helen hadoop       3696 2013-11-29 14:01 /data/hisstage/helen/.staging/job_1385542328307_0915/job.split
>>>
>>> On Tue, Dec 10, 2013 at 9:15 AM, ch huang <justlooks@gmail.com> wrote:
>>>
>>> The strange thing is that when I use the following command I find 1
>>> corrupt block:
>>>
>>>
>>>
>>> #  curl -s http://ch11:50070/jmx |grep orrupt
>>>     "CorruptBlocks" : 1,
>>>
>>> but when I run hdfs fsck /, I get none; everything seems fine:
>>>
>>>
>>>
>>> # sudo -u hdfs hdfs fsck /
>>>
>>> ........
>>>
>>>
>>>
>>> ....................................Status: HEALTHY
>>>  Total size:    1479728140875 B (Total open files size: 1677721600 B)
>>>  Total dirs:    21298
>>>  Total files:   100636 (Files currently being written: 25)
>>>  Total blocks (validated):      119788 (avg. block size 12352891 B) (Total open file blocks (not validated): 37)
>>>  Minimally replicated blocks:   119788 (100.0 %)
>>>  Over-replicated blocks:        0 (0.0 %)
>>>  Under-replicated blocks:       166 (0.13857816 %)
>>>  Mis-replicated blocks:         0 (0.0 %)
>>>  Default replication factor:    3
>>>  Average block replication:     3.0027633
>>>  Corrupt blocks:                0
>>>  Missing replicas:              831 (0.23049656 %)
>>>  Number of data-nodes:          5
>>>  Number of racks:               1
>>> FSCK ended at Tue Dec 10 09:14:48 CST 2013 in 3276 milliseconds
>>>
>>>
>>> The filesystem under path '/' is HEALTHY
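
For a Nagios-style check it may be easier to read the counters out of this summary than to scan it by eye. The following is a rough sketch, assuming the summary keeps the "Label: value" layout shown above (the layout may differ between Hadoop versions); parse_fsck_summary is a hypothetical helper, not something Hadoop provides.

```python
import re

# A few summary lines copied from the fsck output quoted above.
SAMPLE_SUMMARY = """\
 Total blocks (validated):      119788 (avg. block size 12352891 B)
 Under-replicated blocks:       166 (0.13857816 %)
 Average block replication:     3.0027633
 Corrupt blocks:                0
 Missing replicas:              831 (0.23049656 %)
 Number of data-nodes:          5
"""

# Label, colon, then a whole number followed by whitespace or end of line.
# Fractional values such as the average replication are skipped on purpose.
LINE_RE = re.compile(r"^([A-Za-z][A-Za-z\- ()]*?):\s+(\d+)(?:\s|$)")


def parse_fsck_summary(text):
    """Map summary labels such as 'Corrupt blocks' to their integer value."""
    summary = {}
    for line in text.splitlines():
        # Drop mail quote markers and surrounding whitespace.
        line = line.lstrip("> ").strip()
        match = LINE_RE.match(line)
        if match:
            summary[match.group(1)] = int(match.group(2))
    return summary


summary = parse_fsck_summary(SAMPLE_SUMMARY)
print(summary["Corrupt blocks"], summary["Under-replicated blocks"])
```

A monitoring script could alert on a non-zero "Corrupt blocks" value here and treat the JMX CorruptBlocks counter as a separate, replica-level warning.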
>>>
>>> On Tue, Dec 10, 2013 at 8:32 AM, ch huang <justlooks@gmail.com> wrote:
>>>
>>> Hi, mailing list:
>>>
>>> My Nagios has been alerting me all day that there is a corrupt block in
>>> HDFS, but I do not know how to remove it. Will HDFS handle this
>>> automatically? And will removing the corrupt block cause any data
>>> loss? Thanks.
>>>
>>>
>>>
>>>
>>>
>>
>>
>
