From: yogesh.kumar13@wipro.com
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???
Date: Mon, 29 Oct 2012 10:43:44 +0000
Thanks Uma,

I am using hadoop-0.20.2 version.

UI shows.

Cluster Summary

379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)

WARNING : There are about 270 missing blocks. Please check the log or run fsck.

Configured Capacity : 465.44 GB
DFS Used : 20 KB
Non DFS Used : 439.37 GB
DFS Remaining : 26.07 GB
DFS Used% : 0 %
DFS Remaining% : 5.6 %
Live Nodes : 1
Dead Nodes : 0


Firstly I configured a single-node cluster and worked on it; after that I added another machine, made the new one a master + worker, and the first machine a worker only.

I saved the dfs.name.dir separately, and started with a fresh cluster...

Now I have switched back to the previous stage: a single-node cluster on the same old machine.
I have pointed dfs.name.dir at the path where I kept that saved copy.

Now I am running it and getting this.
I did an -ls / operation and got this exception:


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -ls /user/hive/warehouse/vw_cc/
Found 1 items

-rw-r--r--   1 mediaadmin supergroup       1774 2012-10-17 16:15 /user/hive/warehouse/vw_cc/000000_0


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ HADOOP dfs -cat /user/hive/warehouse/vw_cc/000000_0


12/10/29 16:01:15 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:15 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:18 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:18 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:21 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:21 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595 from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:24 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1812)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
    at org.apache.hadoop.fs.FsShell.printToStdout(FsShell.java:114)
    at org.apache.hadoop.fs.FsShell.access$100(FsShell.java:49)
    at org.apache.hadoop.fs.FsShell$1.process(FsShell.java:352)
    at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1898)
    at org.apache.hadoop.fs.FsShell.cat(FsShell.java:346)
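As an aside for anyone scripting over failures like the above: the failing block ID and file path can be pulled out of the DFSClient messages with a small regex. A minimal sketch follows; the sample line is copied from the output above, and the regex is an assumption about the log line format, not any Hadoop API:

```python
import re

# One DFSClient line as printed above (HDFS block names are blk_<id>_<genstamp>).
line = ("12/10/29 16:01:15 INFO hdfs.DFSClient: No node available for block: "
        "blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0")

# Split the name into the block ID, its generation stamp, and the HDFS path.
match = re.search(r"(blk_-?\d+)_(\d+)\s+file=(\S+)", line)
block_id, gen_stamp, path = match.groups()
print(block_id)  # blk_-1280621588594166706
print(path)      # /user/hive/warehouse/vw_cc/000000_0
```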


I looked at the NN logs for one of the files..

it shows:

2012-10-29 15:26:02,560 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=null    ip=null    cmd=open    src=/user/hive/warehouse/vw_cc/000000_0    dst=null    perm=null
.
.
.
.

Please suggest

Regards
Yogesh Kumar




From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, October 29, 2012 3:52 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???

Which version of Hadoop are you using?

 

Do you have all DNs running? Can you check the UI report, whether all DNs are alive?

Can you check the DN disks are good or not?

Can you grep the NN and DN logs for one of the corrupt block IDs from below?
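That grep step amounts to scanning every log file for the block ID; a minimal sketch, with the log directory left as a parameter since HADOOP_LOG_DIR differs per install:

```python
import os

def find_block_mentions(log_dir, block_id):
    """Scan every file under log_dir for lines mentioning block_id.

    A sketch of the 'grep the NN/DN logs' step; log_dir should be whatever
    HADOOP_LOG_DIR points at on the NameNode/DataNode hosts.
    """
    hits = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, errors="replace") as fh:
                    for lineno, text in enumerate(fh, 1):
                        if block_id in text:
                            hits.append((path, lineno, text.rstrip()))
            except OSError:
                continue  # unreadable file: skip it, keep scanning
    return hits
```

Equivalently, `grep -r blk_<id>` over the log directory on each host does the same job.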

 

Regards,

Uma


From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 2:03 PM
To: user@hadoop.apache.org
Subject: How to do HADOOP RECOVERY ???

Hi All,

I ran this command

hadoop fsck -Ddfs.http.address=localhost:50070 /

and found that some blocks are missing and corrupted

The results come out like..

/user/hive/warehouse/tt_report_htcount/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/hive/warehouse/tt_report_perhour_hit/000000_0: CORRUPT block blk_75438572351073797
/user/hive/warehouse/tt_report_perhour_hit/000000_0: MISSING 1 blocks of total size 1531 B..
/user/hive/warehouse/vw_cc/000000_0: CORRUPT block blk_-1280621588594166706
/user/hive/warehouse/vw_cc/000000_0: MISSING 1 blocks of total size 1774 B..
/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_8637186139854977656
/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_4019541597438638886
/user/hive/warehouse/vw_report2/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/zoo/foo.har/_index: CORRUPT block blk_3404803591387558276
.
.
.
.
.
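For scripting a cleanup, the per-file lines above can be parsed into a clean list of damaged paths. A sketch, using a couple of lines copied from the fsck output as sample input (the regex is an assumption about the line format):

```python
import re

# Sample fsck per-file lines, as printed above.
fsck_lines = """\
/user/hive/warehouse/vw_cc/000000_0: CORRUPT block blk_-1280621588594166706
/user/hive/warehouse/vw_cc/000000_0: MISSING 1 blocks of total size 1774 B..
/user/zoo/foo.har/_index: CORRUPT block blk_3404803591387558276
""".splitlines()

# Collect each path reported CORRUPT or MISSING, deduplicated.
damaged = sorted({m.group(1)
                  for line in fsck_lines
                  if (m := re.match(r"(\S+): (?:CORRUPT|MISSING)\b", line))})
print(damaged)  # ['/user/hive/warehouse/vw_cc/000000_0', '/user/zoo/foo.har/_index']
```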

 Total size:    7600625746 B
 Total dirs:    205
 Total files:   173
 Total blocks (validated):      270 (avg. block size 28150465 B)
  ********************************
  CORRUPT FILES:        171
  MISSING BLOCKS:       269
  MISSING SIZE:         7600625742 B
  CORRUPT BLOCKS:       269
  ********************************
 Minimally replicated blocks:   1 (0.37037036 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     0.0037037036
 Corrupt blocks:                269
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          1
 Number of racks:               1
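A quick sanity check on those summary numbers: 269 of the 270 blocks are gone, so only 4 bytes of the 7.6 GB remain reachable, and the average block replication works out to 1/270:

```python
# Figures taken directly from the fsck summary above.
total_blocks, missing_blocks = 270, 269
total_size, missing_size = 7600625746, 7600625742  # bytes

surviving_bytes = total_size - missing_size
avg_replication = (total_blocks - missing_blocks) / total_blocks

print(surviving_bytes)             # 4
print(round(avg_replication, 10))  # 0.0037037037 (fsck prints a float-rounded 0.0037037036)
```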




Is there any way to recover them?
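A missing block can only come back if its block file still physically exists under some DataNode's dfs.data.dir. A sketch that looks for a block's files on a local data directory; the current/blk_* layout assumed here is based on 0.20-era DataNodes, so verify it against your own disks:

```python
import glob
import os

def find_block_files(data_dir, block_id):
    """Return on-disk files for block_id under an HDFS data directory.

    Layout assumption (0.20-era DataNode): replicas live somewhere under
    <data_dir> as blk_<id>, alongside a blk_<id>_<genstamp>.meta checksum
    file. If this returns nothing on every DataNode, the block is gone.
    """
    pattern = os.path.join(data_dir, "**", block_id + "*")
    return sorted(glob.glob(pattern, recursive=True))
```

If no replicas survive anywhere, `hadoop fsck / -move` or `hadoop fsck / -delete` can at least clear the corrupt entries from the namespace.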

Please help and suggest

Thanks & Regards
yogesh kumar

The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments.

WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

www.wipro.com