hbase-dev mailing list archives

From "Ramkrishna.S.Vasudevan" <ramkrishna.vasude...@huawei.com>
Subject RE: Wondering what hbck should do in this situation
Date Thu, 19 Jul 2012 04:51:14 GMT
J-D

Just going through the explanation, I feel that the region that had
references is a parent region, and it should have an entry in META saying
it is SPLIT and OFFLINE?

Maybe while fixing those cases where we find something in HDFS and not in
META, we may need to check whether the region has been split?
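
To make that idea concrete, here is a rough sketch of such a pre-check. It
is only an illustration under stated assumptions: it walks a local directory
tree with java.nio.file instead of the Hadoop FileSystem API, the class and
method names are hypothetical (this is not existing HBaseFsck code), and it
assumes a post-split reference file is named
<storefile>.<parent encoded region name>, which would be consistent with the
path in J-D's stack trace (store file 2354161894779228084 under region
208247386).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-check; not part of HBaseFsck. */
public class DanglingReferenceCheck {

    // Assumption: reference files are named "<storefile>.<parentEncodedName>".
    private static final Pattern REF_NAME =
        Pattern.compile("^([0-9a-f]+)\\.([0-9a-f]+)$");

    /**
     * Returns the reference files under regionDir whose target store file
     * (tableDir/parentRegion/family/storefile) no longer exists.
     */
    public static List<Path> findDanglingReferences(Path tableDir, Path regionDir)
            throws IOException {
        List<Path> dangling = new ArrayList<>();
        try (DirectoryStream<Path> families = Files.newDirectoryStream(regionDir)) {
            for (Path family : families) {
                if (!Files.isDirectory(family)) {
                    continue; // plain files such as .regioninfo are not families
                }
                try (DirectoryStream<Path> files = Files.newDirectoryStream(family)) {
                    for (Path file : files) {
                        Matcher m = REF_NAME.matcher(file.getFileName().toString());
                        if (!m.matches()) {
                            continue; // an ordinary store file, not a reference
                        }
                        Path target = tableDir.resolve(m.group(2))
                            .resolve(family.getFileName())
                            .resolve(m.group(1));
                        if (!Files.exists(target)) {
                            dangling.add(file);
                        }
                    }
                }
            }
        }
        return dangling;
    }
}
```

If a check like this returned a non-empty list for an orphan region, hbck
could report the region instead of patching META and trying to assign it.
Whether reporting is the right action is the open question in this thread.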

Was there any reason why the CatalogJanitor was not able to pick this region
for cleanup?

I may be wrong here, J-D; just going through the explanation, I am thinking
this could be the scenario.

Thanks for bringing this up; we will add this to our internal testing also.

Regards
Ram


> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-
> Daniel Cryans
> Sent: Wednesday, July 18, 2012 9:23 PM
> To: dev@hbase.apache.org
> Subject: Wondering what hbck should do in this situation
> 
> Hey devs,
> 
> I encountered an "interesting" situation with hbck in 0.94, we had
> this region which was on HDFS that wasn't in .META. and hbck decided
> to include it back:
> 
> ERROR: Region { meta => null, hdfs =>
> hdfs://sfor3s24:10101/hbase/url_stumble_summary/159952764, deployed =>
>  } on HDFS, but not listed in META or deployed on any region server
> 12/07/17 23:46:03 INFO util.HBaseFsck: Patching .META. with
> .regioninfo: {NAME =>
> 'url_stumble_summary,25467315:2009-12-28,1271922074820', STARTKEY =>
> '25467315:2009-12-28', ENDKEY => '25821137:2010-03-08', ENCODED =>
> 159952764,}
> 
> Then when it tried to assign the region it got bounced between region
> servers:
> 
> Trying to reassign region...
> 12/07/17 23:46:04 INFO util.HBaseFsckRepair: Region still in
> transition, waiting for it to become assigned: {NAME =>
> 'url_stumble_summary,25467315:2009-12-28,1271922074820', STARTKEY =>
> '25467315:2009-12-28', ENDKEY => '25821137:2010-03-08', ENCODED =>
> 159952764,}
> 12/07/17 23:46:05 INFO util.HBaseFsckRepair: Region still in
> transition, waiting for it to become assigned: {NAME =>
> 'url_stumble_summary,25467315:2009-12-28,1271922074820', STARTKEY =>
> '25467315:2009-12-28', ENDKEY => '25821137:2010-03-08', ENCODED =>
> 159952764,}
> etc
> 
> Turns out that this region only contained references (as in post-split
> references) to a region that didn't exist anymore so when the region
> was being opened it was failing on opening those referenced files:
> 
> 2012-07-18 00:00:27,454 ERROR
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed
> open of region=url_stumble_summary,25467315:2009-12-
> 28,1271922074820.159952764,
> starting to roll back the global memstore size.
> java.io.IOException: java.io.IOException:
> java.io.FileNotFoundException: File does not exist:
> /hbase/url_stumble_summary/208247386/default/2354161894779228084
> 	at
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(
> HRegion.java:550)
> 	at
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:46
> 3)
> 	at
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3
> 729)
> 	at
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3
> 677)
> ...
> Caused by: java.io.IOException: java.io.FileNotFoundException: File
> does not exist:
> /hbase/url_stumble_summary/208247386/default/2354161894779228084
> 	at
> org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:40
> 5)
> 	at
> org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:258)
> 	at
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.
> java:2918)
> ...
> Caused by: java.io.FileNotFoundException: File does not exist:
> /hbase/url_stumble_summary/208247386/default/2354161894779228084
> 	at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java
> :1822)
> 	at
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1
> 813)
> 	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544)
> 	at
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem
> .java:187)
> 	at
> org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:102)
> 	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
> 	at
> org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.j
> ava:547)
> 	at
> org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.
> java:1252)
> 	at
> org.apache.hadoop.hbase.io.HalfStoreFileReader.<init>(HalfStoreFileRead
> er.java:66)
> ...
> 
> 
> At first it was confusing me why it was looking for another region
> until I saw the HalfStoreFileReader :)
> 
> So this is a case where hbck made the cluster worse because the only
> way to get rid of this region is to force unassign it, delete it from
> .META. and then possibly also delete it from HDFS.
> 
> I'm wondering how this could be done better, should we do more checks
> when including that sort of region? Like, at least make sure we can
> open it? And then what? Just report it?
> 
> Thx for reading this far,
> 
> J-D
