hadoop-hdfs-user mailing list archives

From Henning Blohm <henning.bl...@zfabrik.de>
Subject Curious: Corrupted HDFS self-healing?
Date Tue, 17 May 2016 14:24:48 GMT
Hi all,

after some 20 hours of loading data into HBase (v1.0 on Hadoop 2.6.0) on a
single node, I noticed that HDFS reported a corrupt file system. It says:

Status: CORRUPT
   CORRUPT FILES:    1
   CORRUPT BLOCKS:     1
The filesystem under path '/' is CORRUPT
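
(For reference, that summary is what a plain fsck over the root prints, i.e.
something like:

    hdfs fsck /

run against the NameNode.)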


Checking the details, it says:

---
FSCK started by hb (auth:SIMPLE) from /127.0.0.1 for path /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/d23252e7c0854b6093e6468acf2dad38 at Tue May 17 15:54:03 CEST 2016
/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/d23252e7c0854b6093e6468acf2dad38 2740218577 bytes, 11 block(s):
/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/d23252e7c0854b6093e6468acf2dad38: CORRUPT blockpool BP-130837870-192.168.178.29-1462900512452 block blk_1073746166
 MISSING 1 blocks of total size 268435456 B
0. BP-130837870-192.168.178.29-1462900512452:blk_1073746164_5344 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
1. BP-130837870-192.168.178.29-1462900512452:blk_1073746165_5345 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
2. BP-130837870-192.168.178.29-1462900512452:blk_1073746166_5346 len=268435456 MISSING!
3. BP-130837870-192.168.178.29-1462900512452:blk_1073746167_5347 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
4. BP-130837870-192.168.178.29-1462900512452:blk_1073746168_5348 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
5. BP-130837870-192.168.178.29-1462900512452:blk_1073746169_5349 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
6. BP-130837870-192.168.178.29-1462900512452:blk_1073746170_5350 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
7. BP-130837870-192.168.178.29-1462900512452:blk_1073746171_5351 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
8. BP-130837870-192.168.178.29-1462900512452:blk_1073746172_5352 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
9. BP-130837870-192.168.178.29-1462900512452:blk_1073746173_5353 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
10. BP-130837870-192.168.178.29-1462900512452:blk_1073746174_5354 len=55864017 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
---

(Note entry 2: that is the missing block.)
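
For completeness, the detailed listing is from running fsck with the
per-block options, roughly:

    hdfs fsck /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/d23252e7c0854b6093e6468acf2dad38 -files -blocks -locations

(the same command produced the second listing below).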

I did not try a repair via fsck. Instead, simply restarting the node made
the problem go away:

---
FSCK started by hb (auth:SIMPLE) from /127.0.0.1 for path /hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/d23252e7c0854b6093e6468acf2dad38 at Tue May 17 16:10:52 CEST 2016
/hbase/data/default/tt_items/08255086d13380bd559a87dd93cc15ba/d/d23252e7c0854b6093e6468acf2dad38 2740218577 bytes, 11 block(s):  OK
0. BP-130837870-192.168.178.29-1462900512452:blk_1073746164_5344 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
1. BP-130837870-192.168.178.29-1462900512452:blk_1073746165_5345 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
2. BP-130837870-192.168.178.29-1462900512452:blk_1073746166_5346 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
3. BP-130837870-192.168.178.29-1462900512452:blk_1073746167_5347 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
4. BP-130837870-192.168.178.29-1462900512452:blk_1073746168_5348 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
5. BP-130837870-192.168.178.29-1462900512452:blk_1073746169_5349 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
6. BP-130837870-192.168.178.29-1462900512452:blk_1073746170_5350 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
7. BP-130837870-192.168.178.29-1462900512452:blk_1073746171_5351 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
8. BP-130837870-192.168.178.29-1462900512452:blk_1073746172_5352 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
9. BP-130837870-192.168.178.29-1462900512452:blk_1073746173_5353 len=268435456 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]
10. BP-130837870-192.168.178.29-1462900512452:blk_1073746174_5354 len=55864017 Live_repl=1 [DatanodeInfoWithStorage[127.0.0.1:50010,DS-9cc4b81b-dbe3-4da1-a394-9ca30db55017,DISK]]

Status: HEALTHY
---
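
(Since the second run came back healthy, I never had to resort to the
destructive fsck repair options; as far as I understand them, those would
have been:

    hdfs fsck / -move      (move files with missing blocks to /lost+found)
    hdfs fsck / -delete    (delete files with missing blocks)

and neither would have brought the block back.)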

I guess that means that the datanode has only now reported the previously missing block.

How is that possible? Is that acceptable, expected behavior?

Is there anything I can do to prevent this sort of problem?
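
If it is indeed just a late block report, I wonder whether shortening the
report interval (dfs.blockreport.intervalMsec, which defaults to 21600000
ms, i.e. 6 hours) would help, or whether one could force a report instead
of restarting. Newer releases (2.7+, if I am not mistaken, so not my 2.6)
apparently offer:

    hdfs dfsadmin -triggerBlockReport 127.0.0.1:50020

where 50020 is the datanode's default IPC port (dfs.datanode.ipc.address).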

Here is my HDFS config (substitute ${nosql.home} with the installation
folder and ${nosql.master} with localhost):

Any clarification would be great!

Thanks!
Henning

---
<configuration>

     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>

     <property>
         <name>dfs.namenode.name.dir</name>
         <value>file://${nosql.home}/data/name</value>
     </property>

     <property>
         <name>dfs.datanode.data.dir</name>
         <value>file://${nosql.home}/data/data</value>
     </property>


     <property>
         <name>dfs.datanode.max.transfer.threads</name>
         <value>4096</value>
     </property>

     <property>
         <name>dfs.support.append</name>
         <value>true</value>
     </property>

     <property>
         <name>dfs.datanode.synconclose</name>
         <value>true</value>
     </property>

     <property>
         <name>dfs.datanode.sync.behind.writes</name>
         <value>true</value>
     </property>

     <property>
         <name>dfs.namenode.avoid.read.stale.datanode</name>
         <value>true</value>
     </property>

     <property>
         <name>dfs.namenode.avoid.write.stale.datanode</name>
         <value>true</value>
     </property>

     <property>
         <name>dfs.namenode.stale.datanode.interval</name>
         <value>3000</value>
     </property>

     <!--
       <property>
         <name>dfs.client.read.shortcircuit</name>
         <value>true</value>
     </property>

     <property>
         <name>dfs.domain.socket.path</name>
         <value>/var/lib/seritrack/dn_socket</value>
     </property>

     <property>
         <name>dfs.client.read.shortcircuit.buffer.size</name>
         <value>131072</value>
     </property>
     -->

     <property>
         <name>dfs.block.size</name>
         <value>268435456</value>
     </property>

     <property>
         <name>ipc.server.tcpnodelay</name>
         <value>true</value>
     </property>

     <property>
         <name>ipc.client.tcpnodelay</name>
         <value>true</value>
     </property>

     <property>
         <name>dfs.datanode.max.xcievers</name>
         <value>4096</value>
     </property>

     <property>
         <name>dfs.namenode.handler.count</name>
         <value>64</value>
     </property>

     <property>
         <name>dfs.datanode.handler.count</name>
         <value>8</value>
     </property>

</configuration>
---
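
One aside on the config, probably unrelated: dfs.datanode.max.xcievers is
the deprecated spelling of dfs.datanode.max.transfer.threads (both are set
above), and dfs.block.size is the deprecated name for dfs.blocksize, so if
I read the 2.x deprecation list correctly the modern equivalents would be:

     <property>
         <name>dfs.datanode.max.transfer.threads</name>
         <value>4096</value>
     </property>

     <property>
         <name>dfs.blocksize</name>
         <value>268435456</value>
     </property>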


