hadoop-mapreduce-user mailing list archives

From Margus Roo <mar...@roo.ee>
Subject Manually deleted blocks from datanodes
Date Thu, 08 Jan 2015 08:52:51 GMT

I have a simple HDFS setup: one NameNode (nn) and two DataNodes (dn).

I created a file and added it to HDFS.
Here is the fsck report for the file:
-bash-4.1$ hdfs fsck -blocks -locations -files /user/margusja/file2.txt
Connecting to namenode via http://nn:50070
FSCK started by hdfs (auth:SIMPLE) from / for path 
/user/margusja/file2.txt at Thu Jan 08 10:34:13 EET 2015
/user/margusja/file2.txt 409600000 bytes, 4 block(s):  OK
0. BP-808850907- 
len=134217728 repl=2 [,]
1. BP-808850907- 
len=134217728 repl=2 [,]
2. BP-808850907- 
len=134217728 repl=2 [,]
3. BP-808850907- 
len=6946816 repl=2 [,]

  Total size:    409600000 B
  Total dirs:    0
  Total files:   1
  Total symlinks:                0
  Total blocks (validated):      4 (avg. block size 102400000 B)
  Minimally replicated blocks:   4 (100.0 %)
  Over-replicated blocks:        0 (0.0 %)
  Under-replicated blocks:       0 (0.0 %)
  Mis-replicated blocks:         0 (0.0 %)
  Default replication factor:    2
  Average block replication:     2.0
  Corrupt blocks:                0
  Missing replicas:              0 (0.0 %)
  Number of data-nodes:          2
  Number of racks:               1
FSCK ended at Thu Jan 08 10:34:13 EET 2015 in 1 milliseconds

The filesystem under path '/user/margusja/file2.txt' is HEALTHY

Next I went to one DataNode and simply deleted the block file 
blk_1073741828. This then appeared in that DataNode's log:
2015-01-08 10:02:00,994 WARN 
Removed block 1073741828 from memory with missing block file on the disk
2015-01-08 10:02:00,994 WARN 
Deleted a metadata file for the deleted block 

But hdfs fsck still reports that HDFS is healthy.

I can still download the file from HDFS using hdfs dfs -get 
/user/margusja/file2.txt, although there are some warnings that a block 
is missing.

Then I went to the second DataNode and deleted blk_1073741828 there as well.

hdfs fsck on the NameNode still tells me that HDFS is OK.

Of course, now I can no longer get my file using hdfs dfs -get 
/user/margusja/file2.txt, because blk_1073741828 does not exist on dn1 
or dn2. But the NameNode is still happy and thinks that HDFS is OK.

I guess I am testing this the wrong way.
Are there best practices for testing HDFS before going live? For example, 
steps for checking what happens when a block goes missing or gets corrupted?
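
The manual check repeated above (run fsck, read the summary) could be scripted. The following is a minimal sketch, not an established test procedure: it parses the summary text that hdfs fsck prints and reports whether any of the problem counters is non-zero. The names parse_fsck_summary, looks_healthy and SAMPLE are hypothetical, and the field labels are copied from the fsck output quoted above. Since fsck shows the NameNode's view, the script can only be as current as that view, which is exactly the behaviour observed in this experiment.

```python
import re

# Field labels exactly as they appear in the fsck summary quoted above.
FIELDS = ("Corrupt blocks", "Missing replicas", "Under-replicated blocks")


def parse_fsck_summary(text):
    """Return {label: count} for each summary field found in fsck output."""
    counts = {}
    for label in FIELDS:
        match = re.search(
            r"^\s*" + re.escape(label) + r":\s+(\d+)", text, re.MULTILINE
        )
        if match:
            counts[label] = int(match.group(1))
    return counts


def looks_healthy(text):
    """True if fsck printed HEALTHY and every parsed counter is zero."""
    counts = parse_fsck_summary(text)
    return "is HEALTHY" in text and all(v == 0 for v in counts.values())


# Hypothetical sample: an excerpt of the summary shown earlier in this message.
SAMPLE = """\
  Under-replicated blocks:       0 (0.0 %)
  Corrupt blocks:                0
  Missing replicas:              0 (0.0 %)

The filesystem under path '/user/margusja/file2.txt' is HEALTHY
"""

if __name__ == "__main__":
    print(parse_fsck_summary(SAMPLE))
    print("healthy" if looks_healthy(SAMPLE) else "problems reported")
```

To use it against a live cluster, the text could come from the output of hdfs fsck -blocks -locations -files <path>, captured for example with subprocess.run.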

Margus (margusja) Roo
skype: margusja
+372 51 480
