hadoop-mapreduce-user mailing list archives

From <yogesh.kuma...@wipro.com>
Subject RE: How to do HADOOP RECOVERY ???
Date Mon, 29 Oct 2012 12:13:32 GMT
Hi Uma,

You are correct: when I start the cluster it goes into safe mode, and even if I wait it doesn't come
out on its own.
I have been using the -safemode leave option to force it out.

Safe mode is ON. The ratio of reported blocks 0.0037 has not reached the threshold 0.9990.
Safe mode will be turned off automatically.
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)
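
For reference, safe mode can be inspected and controlled from the command line (a sketch using the 0.20.x shell commands; these must be run on a machine configured to reach the NameNode):

```shell
# Show whether the NameNode is currently in safe mode
hadoop dfsadmin -safemode get

# Block until the NameNode leaves safe mode on its own
hadoop dfsadmin -safemode wait

# Force the NameNode out of safe mode. Use with care: with only
# 0.37% of blocks reported, leaving safe mode only exposes the
# missing blocks as read errors; it does not recover them.
hadoop dfsadmin -safemode leave
```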


By starting a fresh cluster I mean the following:

I saved the dfs.name.dir and dfs.data.dir contents separately as a backup of the old (single node) cluster,
and then used the old machine together with a new machine to start a new cluster (the old machine acted as a DN
and the newly added machine acted as NN + DN). At the same time I gave different directory locations
for dfs.name.dir and dfs.data.dir on the old machine.

Say, when it was a single node cluster:
 dfs.name.dir -->  /HADOOP/SINGLENODE/Name_Dir   && dfs.data.dir  --> /HADOOP/SINGLENODE/Data_Dir

When I used it with another machine as a DN:
dfs.name.dir --> /HADOOP/MULTINODE/Name_Dir  && dfs.data.dir --> /HADOOP/MULTINODE/Data_Dir
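
In the 0.20.x conf files these paths would be carried in hdfs-site.xml roughly like this (a sketch for the single node setup; the multi node setup would substitute the MULTINODE paths):

```xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/HADOOP/SINGLENODE/Name_Dir</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/HADOOP/SINGLENODE/Data_Dir</value>
  </property>
</configuration>
```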



Now I have gone back to the previous stage: the old machine as a single node cluster (NN + DN),
with dfs.name.dir and dfs.data.dir pointing at the original paths ( dfs.name.dir -->  /HADOOP/SINGLENODE/Name_Dir
  && dfs.data.dir  --> /HADOOP/SINGLENODE/Data_Dir )

I have saved namespace and data before configuring the multi node cluster with new machine.



It should work after pointing the conf files of the single node machine back at those namespace
and data directories, and it should show the previous content. Or am I wrong?

Why is this happening, and why is it not coming out of safe mode by itself?

Please suggest

Regards
Yogesh Kumar
________________________________
From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, October 29, 2012 5:10 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???


I am not sure I have understood your scenario correctly. Here is one possibility that fits the
situation you describe.



>>I have saved the dfs.name.dir separately, and started with a fresh cluster...
  When you started the fresh cluster, did you use the same DNs? If so, the blocks will have been
invalidated because your namespace is fresh now (in fact a DN cannot even register until you clean
its data dirs, as the namespaceID differs).
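
The namespaceID mismatch mentioned above can be checked directly on disk (a sketch; in 0.20.x both the NN and DN keep a VERSION file under their storage directories, shown here with the paths quoted in this thread):

```shell
# NameNode's namespaceID
grep namespaceID /HADOOP/SINGLENODE/Name_Dir/current/VERSION

# DataNode's namespaceID -- it must match the NN's,
# or the DN refuses to register with the NameNode
grep namespaceID /HADOOP/SINGLENODE/Data_Dir/current/VERSION
```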

  Now you are putting the older image back and starting again. The older image will expect enough
blocks to be reported from the DNs before it leaves safe mode; otherwise it will stay in safe mode.
How is it coming out of safe mode?



Or did you continue with the same cluster, saving the namespace separately as a backup of the
current state, and then add an extra DN, referring to that as the fresh cluster?

 In that case, if you delete any existing files, the corresponding data blocks will be invalidated on the DNs.

 If you then go back to the older cluster with the backed-up namespace, the deleted files'
information will not be known to the older image: it will expect those blocks to be reported,
and if no blocks are available for a file, that file will be treated as corrupt.

>>I did -ls / operation and got this exception


>>mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ hadoop dfs -ls /user/hive/warehouse/vw_cc/
>>Found 1 items

ls will show the file because the namespace has the metadata for it, but the DNs do not have any
blocks for it.
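
This can be confirmed per file with fsck (a sketch using the path from the listing; the -files, -blocks and -locations options exist in 0.20.x):

```shell
# Shows the file's blocks and which DNs (if any) hold replicas;
# a missing block is reported with no locations at all
hadoop fsck /user/hive/warehouse/vw_cc/000000_0 -files -blocks -locations
```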

________________________________
From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 4:13 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???

Thanks Uma,

I am using hadoop-0.20.2 version.

The UI shows:
Cluster Summary
379 files and directories, 270 blocks = 649 total. Heap Size is 81.06 MB / 991.69 MB (8%)

WARNING : There are about 270 missing blocks. Please check the log or run fsck.

Configured Capacity     :       465.44 GB
DFS Used        :       20 KB
Non DFS Used    :       439.37 GB
DFS Remaining   :       26.07 GB
DFS Used%       :       0 %
DFS Remaining%  :       5.6 %
Live Nodes<http://localhost:50070/dfsnodelist.jsp?whatNodes=LIVE>       :       1
Dead Nodes<http://localhost:50070/dfsnodelist.jsp?whatNodes=DEAD>       :       0


First I configured a single node cluster and worked on it. After that I added another machine,
made the new one a master + worker, and made the first machine a worker only.

I have saved the dfs.name.dir separately, and started with a fresh cluster...

Now I have switched back to the previous stage: a single node cluster on the same old machine,
with dfs.name.dir pointing at the copy I kept.

Now I am running it and getting this.

I did a -ls / operation and got this exception:


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ hadoop dfs -ls /user/hive/warehouse/vw_cc/
Found 1 items

-rw-r--r--   1 mediaadmin supergroup       1774 2012-10-17 16:15 /user/hive/warehouse/vw_cc/000000_0


mediaadmins-iMac-2:haadoop-0.20.2 mediaadmin$ hadoop dfs -cat /user/hive/warehouse/vw_cc/000000_0


12/10/29 16:01:15 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595
file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:15 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595
from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:18 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595
file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:18 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595
from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:21 INFO hdfs.DFSClient: No node available for block: blk_-1280621588594166706_3595
file=/user/hive/warehouse/vw_cc/000000_0
12/10/29 16:01:21 INFO hdfs.DFSClient: Could not obtain block blk_-1280621588594166706_3595
from any node:  java.io.IOException: No live nodes contain current block
12/10/29 16:01:24 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block:
blk_-1280621588594166706_3595 file=/user/hive/warehouse/vw_cc/000000_0
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1812)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
    at org.apache.hadoop.fs.FsShell.printToStdout(FsShell.java:114)
    at org.apache.hadoop.fs.FsShell.access$100(FsShell.java:49)
    at org.apache.hadoop.fs.FsShell$1.process(FsShell.java:352)
    at org.apache.hadoop.fs.FsShell$DelayedExceptionThrowing.globAndProcess(FsShell.java:1898)
    at org.apache.hadoop.fs.FsShell.cat(FsShell.java:346)


I looked at the NN logs for one of the files; they show:

2012-10-29 15:26:02,560 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=null
   ip=null    cmd=open    src=/user/hive/warehouse/vw_cc/000000_0    dst=null    perm=null
.
.
.
.

Please suggest

Regards
Yogesh Kumar



________________________________
From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, October 29, 2012 3:52 PM
To: user@hadoop.apache.org
Subject: RE: How to do HADOOP RECOVERY ???


Which version of Hadoop are you using?



Do you have all DNs running? Can you check the UI report to see whether all DNs are alive?

Can you check whether the DN disks are good?

Can you grep the NN and DN logs for one of the corrupt block IDs from below?
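
For example (a sketch; the log location depends on your installation's HADOOP_LOG_DIR, and the block ID is one of the corrupt ones from the fsck output in the thread):

```shell
# Search the NameNode and DataNode logs for one corrupt block ID
grep -r "blk_-1280621588594166706" $HADOOP_HOME/logs/
```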



Regards,

Uma

________________________________
From: yogesh.kumar13@wipro.com [yogesh.kumar13@wipro.com]
Sent: Monday, October 29, 2012 2:03 PM
To: user@hadoop.apache.org
Subject: How to do HADOOP RECOVERY ???

Hi All,

I run this command

hadoop fsck -Ddfs.http.address=localhost:50070 /

and found that some blocks are missing or corrupt.

The results look like this:

/user/hive/warehouse/tt_report_htcount/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/hive/warehouse/tt_report_perhour_hit/000000_0: CORRUPT block blk_75438572351073797

/user/hive/warehouse/tt_report_perhour_hit/000000_0: MISSING 1 blocks of total size 1531 B..
/user/hive/warehouse/vw_cc/000000_0: CORRUPT block blk_-1280621588594166706

/user/hive/warehouse/vw_cc/000000_0: MISSING 1 blocks of total size 1774 B..
/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_8637186139854977656

/user/hive/warehouse/vw_report2/000000_0: CORRUPT block blk_4019541597438638886

/user/hive/warehouse/vw_report2/000000_0: MISSING 2 blocks of total size 71826120 B..
/user/zoo/foo.har/_index: CORRUPT block blk_3404803591387558276
.
.
.
.
.

Total size:    7600625746 B
 Total dirs:    205
 Total files:    173
 Total blocks (validated):    270 (avg. block size 28150465 B)
  ********************************
  CORRUPT FILES:    171
  MISSING BLOCKS:    269
  MISSING SIZE:        7600625742 B
  CORRUPT BLOCKS:     269
  ********************************
 Minimally replicated blocks:    1 (0.37037036 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    1
 Average block replication:    0.0037037036
 Corrupt blocks:        269
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        1
 Number of racks:        1
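
For reference, if the block data truly no longer exists on any DN, fsck in 0.20.x also offers options for cleaning up the affected files rather than recovering them (a sketch; both are destructive to the file entries, so back up the namespace first):

```shell
# Move files with missing blocks to /lost+found
hadoop fsck / -move

# Or delete the corrupted files outright
hadoop fsck / -delete
```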




Is there any way to recover them ?

Please help and suggest

Thanks & Regards
yogesh kumar

The information contained in this electronic message and any attachments to this message are
intended for the exclusive use of the addressee(s) and may contain proprietary, confidential
or privileged information. If you are not the intended recipient, you should not disseminate,
distribute or copy this e-mail. Please notify the sender immediately and destroy all copies
of this message and any attachments.

WARNING: Computer viruses can be transmitted via email. The recipient should check this email
and any attachments for the presence of viruses. The company accepts no liability for any
damage caused by any virus transmitted by this email.

www.wipro.com

