hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: HBase Region always in transition + corrupt HDFS
Date Mon, 23 Feb 2015 18:32:57 GMT
You have no other choice than removing those files... you will lose the
related data, but it should be fine if they are only HFiles. Do you have the
list of corrupted files? What kind of files are they?

Also, have you lost a node or a disk? How did you lose about 150 blocks?
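
If you don't have the list yet, fsck can print it directly. A minimal
sketch (scanning / is just the broadest option; you can point it at your
HBase root instead):

  # List every file that owns a corrupt or missing block
  hdfs fsck / -list-corruptfileblocks

  # Or limit the scan to the HBase root directory
  hdfs fsck /hbase -list-corruptfileblocks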

JM

2015-02-23 2:47 GMT-05:00 Arinto Murdopo <arinto@gmail.com>:

> Hi all,
>
> We're running HBase (0.94.15-cdh4.6.0) on top of HDFS (Hadoop
> 2.0.0-cdh4.6.0).
> For all of our tables, we set the replication factor to 1 (dfs.replication
> = 1 in hbase-site.xml). We set it to 1 because we wanted to minimize HDFS
> usage (now we realize we should set this value to at least 2, because
> "failure is a norm" in distributed systems).
>
> Due to the amount of data, at some point, we have low disk space in HDFS
> and one of our DNs was down. Now we have these problems in HBase and HDFS
> although we have recovered our DN.
>
> *Issue#1*. Some HBase regions are permanently in transition. 'hbase hbck
> -repair' is stuck because it is waiting for the region transitions to
> finish. Some output:
>
> hbase(main):003:0> status 'detailed'
> 12 regionsInTransition
>
> plr_id_insta_media_live,\x02:;6;7;398962:3:399a49:653:64,1421565172917.1528f288473632aca2636443574a6ba1.
>     state=OPENING, ts=1424227696897, server=null
>
> plr_sg_insta_media_live,\x0098;522:997;8798665a64;67879,1410768824800.2c79bbc5c0dc2d2b39c04c8abc0a90ff.
>     state=OFFLINE, ts=1424227714203, server=null
>
> plr_sg_insta_media_live,\x00465892:9935773828;a4459;649,1410767723471.55097cfc60bc9f50303dadb02abcd64b.
>     state=OPENING, ts=1424227701234, server=null
>
> plr_sg_insta_media_live,\x00474973488232837733a38744,1410767723471.740d6655afb74a2ff421c6ef16037f57.
>     state=OPENING, ts=1424227708053, server=null
>
> plr_id_insta_media_live,\x02::449::4;:466;3988a6432677;3,1419435100617.7caf3d749dce37037eec9ccc29d272a1.
>     state=OPENING, ts=1424227701484, server=null
>
> plr_sg_insta_media_live,\x05779793546323;::4:4a3:8227928,1418845792479.81c4da129ae5b7b204d5373d9e0fea3d.
>     state=OPENING, ts=1424227705353, server=null
>
> plr_sg_insta_media_live,\x009;5:686348963:33:5a5634887,1410769837567.8a9ded24960a7787ca016e2073b24151.
>     state=OPENING, ts=1424227706293, server=null
>
> plr_sg_insta_media_live,\x0375;6;7377578;84226a7663792,1418980694076.a1e1c98f646ee899010f19a9c693c67c.
>     state=OPENING, ts=1424227680569, server=null
>
> plr_sg_insta_media_live,\x018;3826368274679364a3;;73457;,1421425643816.b04ffda1b2024bac09c9e6246fb7b183.
>     state=OPENING, ts=1424227680538, server=null
>
> plr_sg_insta_media_live,\x0154752;22:43377542:a:86:239,1410771044924.c57d6b4d23f21d3e914a91721a99ce12.
>     state=OPENING, ts=1424227710847, server=null
>
> plr_sg_insta_media_live,\x0069;7;9384697:;8685a885485:,1410767928822.c7b5e53cdd9e1007117bcaa199b30d1c.
>     state=OPENING, ts=1424227700962, server=null
>
> plr_sg_insta_media_live,\x04994537646:78233569a3467:987;7,1410787903804.cd49ec64a0a417aa11949c2bc2d3df6e.
>     state=OPENING, ts=1424227691774, server=null
>
>
> *Issue#2*. The next step we take is to check the HDFS file status using
> 'hdfs fsck /'. It shows that the filesystem '/' is corrupt, with these
> statistics:
>  Total size:    15494284950796 B (Total open files size: 17179869184 B)
>  Total dirs:    9198
>  Total files:   124685 (Files currently being written: 21)
>  Total blocks (validated):      219620 (avg. block size 70550427 B) (Total open file blocks (not validated): 144)
>   ********************************
>   CORRUPT FILES:        42
>   MISSING BLOCKS:       142
>   MISSING SIZE:         14899184084 B
>   CORRUPT BLOCKS:       142
>   ********************************
>  Corrupt blocks:                142
>  Number of data-nodes:          14
>  Number of racks:               1
> FSCK ended at Tue Feb 17 17:25:18 SGT 2015 in 3026 milliseconds
>
>
> The filesystem under path '/' is CORRUPT
>
> So it seems that HDFS lost some of its blocks due to the DN failure, and
> since the dfs.replication factor is 1, it cannot recover the missing
> blocks.
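>
> To double-check that no replica survives, we can inspect an affected path
> directly; a rough example (the table directory below is just one of the
> suspects from the hbck output):
>
>   # show block IDs and datanode locations for one table's files
>   hdfs fsck /hbase/plr_sg_insta_media_live -files -blocks -locations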
>
> *Issue#3*. Although 'hbase hbck -repair' is stuck, we are able to run
> 'hbase hbck -fixHdfsHoles'. We notice the following error messages (I
> copied some of them to represent each type of error message that we have):
> - ERROR: Region { meta =>
> plr_id_insta_media_live,\x02:;6;7;398962:3:399a49:653:64,1421565172917.1528f288473632aca2636443574a6ba1.,
> hdfs => hdfs://nameservice1/hbase/plr_id_insta_media_live/1528f288473632aca2636443574a6ba1,
> deployed => } not deployed on any region server.
> - ERROR: Region { meta => null, hdfs =>
> hdfs://nameservice1/hbase/plr_sg_insta_media_live/8473d25be5980c169bff13cf90229939,
> deployed => } on HDFS, but not listed in META or deployed on any region
> server.
> - ERROR: Region { meta =>
> plr_sg_insta_media_live,\x0293:729769;975376;2a33995622;3,1421985489851.8819ebd296f075513056be4bbd30ee9c.,
> hdfs => null, deployed => } found in META, but not in HDFS or deployed on
> any region server.
> - ERROR: There is a hole in the region chain between
> \x099599464:7:5;3595;8a:57868;95 and \x099;56535:4632439643a82826562:. You
> need to create a new .regioninfo and region dir in hdfs to plug the hole.
> - ERROR: Last region should end with an empty key. You need to create a new
> region and regioninfo in HDFS to plug the hole.
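>
> For reference, hbck also has narrower passes than the full -repair; a
> sketch of what they might look like here (we have not verified these are
> safe to run with missing blocks):
>
>   # re-assign stuck regions and reconcile their entries in .META.
>   hbase hbck -fixAssignments -fixMeta
>
>   # then plug holes in the region chain with empty regions/.regioninfo
>   hbase hbck -fixHdfsHoles -fixHdfsOrphans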
>
> Now to fix this issue, we plan to perform the following action items:
>
>    1. Move or delete the corrupted files in HDFS (a rough sketch follows
>    this list)
>    2. Repair HBase by deleting the references to the corrupted files/blocks
>    from the HBase meta tables (it's okay to lose some of the data)
>    3. Or create empty HFiles as shown in
>    http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/31308
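>
> For item 1, what we have in mind is roughly the following (the sideline
> directory and the region/file placeholders are only illustrative):
>
>   # move one corrupted HFile out of the HBase tree instead of deleting it
>   hdfs dfs -mkdir -p /corrupt_sideline
>   hdfs dfs -mv /hbase/plr_sg_insta_media_live/<region>/<cf>/<hfile> /corrupt_sideline/
>
>   # or let fsck move every corrupt file to /lost+found in one pass
>   hdfs fsck / -move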
>
>
> And our questions are:
>
>    1. Is it safe to move or delete the corrupted files in HDFS? Can we make
>    HBase ignore those files and delete the corresponding HBase files?
>    2. Any comments on our action items?
>
>
> Best regards,
>
> Arinto
> www.otnira.com
>
