hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12791) HBase does not attempt to clean up an aborted split when the regionserver shutting down
Date Thu, 08 Jan 2015 00:11:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268553#comment-14268553 ]

Enis Soztutar commented on HBASE-12791:
---------------------------------------

The changes in RegionStates for cleanup seem good.
For the HBCK change:
 - We are taking an action without checking for a fix parameter (-fixAssignments, -fixMeta, etc.). Hbck
running without any parameters should never take a destructive action. Can we guard this behind
a new parameter, something like {{-fixFailedSplitAttempts}}?
 - Can we save the results after we sort the regions for the table, so that we do not repeat
the work for multiple regions in this state? With a large number of regions (100K),
this will save some cycles.
Other than those, it looks good.
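
To make the two suggestions concrete, here is a rough sketch, not taken from the patch. The class {{FailedSplitChecker}} and its methods are hypothetical stand-ins and are not HBase or hbck APIs; only the flag name {{-fixFailedSplitAttempts}} comes from the comment above. The sketch assumes regions can be represented by name strings, gates the cleanup behind the flag, and sorts each table's regions only once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the hbck check; not an HBase class.
class FailedSplitChecker {
    // Off by default, so a run without parameters only reports.
    private boolean fixFailedSplitAttempts = false;
    // Sorted region names per table, computed once and reused.
    private final Map<String, List<String>> sortedRegionsByTable = new HashMap<>();
    // Counts how many times a sort actually ran (for illustration only).
    int sortCount = 0;

    void parseArgs(String[] args) {
        for (String arg : args) {
            if ("-fixFailedSplitAttempts".equals(arg)) {
                fixFailedSplitAttempts = true;
            }
        }
    }

    // Returns the cached sorted list for the table, sorting only on first use.
    List<String> sortedRegions(String table, List<String> regionNames) {
        return sortedRegionsByTable.computeIfAbsent(table, t -> {
            sortCount++;
            List<String> copy = new ArrayList<>(regionNames);
            Collections.sort(copy);
            return copy;
        });
    }

    // Decides what to do for one region left behind by a failed split.
    String handleFailedSplit(String table, String region, List<String> regionNames) {
        List<String> sorted = sortedRegions(table, regionNames);
        if (Collections.binarySearch(sorted, region) < 0) {
            return "UNKNOWN " + region;
        }
        // Destructive cleanup happens only when the flag was passed explicitly.
        return (fixFailedSplitAttempts ? "CLEAN " : "REPORT ") + region;
    }
}
```

With this shape, calling {{handleFailedSplit}} for many regions of the same table reuses one sorted list, and a run without the flag never reaches the cleanup branch.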

> HBase does not attempt to clean up an aborted split when the regionserver shutting down
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-12791
>                 URL: https://issues.apache.org/jira/browse/HBASE-12791
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.0
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.10, 1.0.1
>
>         Attachments: HBASE-12791.patch, HBASE-12791_98.patch, HBASE-12791_branch1.patch,
HBASE-12791_v2.patch, HBASE-12791_v3.patch
>
>
> HBase does not clean up the daughter region directories from HDFS if the region server shuts down
after creating the daughter region directories during a split.
> Here are the logs.
> -> RS shut down after creating the daughter regions.
> {code}
> 2014-12-31 09:05:41,406 DEBUG [regionserver60020-splits-1419996941385] zookeeper.ZKAssign:
regionserver:60020-0x14a9701e53100d1, quorum=localhost:2181, baseZNode=/hbase Transitioned
node 80c665138d4fa32da4d792d8ed13206f from RS_ZK_REQUEST_REGION_SPLIT to RS_ZK_REQUEST_REGION_SPLIT
> 2014-12-31 09:05:41,514 DEBUG [regionserver60020-splits-1419996941385] regionserver.HRegion:
Closing t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.: disabling compactions & flushes
> 2014-12-31 09:05:41,514 DEBUG [regionserver60020-splits-1419996941385] regionserver.HRegion:
Updates disabled for region t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.
> 2014-12-31 09:05:41,516 INFO  [StoreCloserThread-t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.-1]
regionserver.HStore: Closed f
> 2014-12-31 09:05:41,518 INFO  [regionserver60020-splits-1419996941385] regionserver.HRegion:
Closed t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.
> 2014-12-31 09:05:49,922 DEBUG [regionserver60020-splits-1419996941385] regionserver.MetricsRegionSourceImpl:
Creating new MetricsRegionSourceImpl for table t dd9731ee43b104da565257ca1539aa8c
> 2014-12-31 09:05:49,922 DEBUG [regionserver60020-splits-1419996941385] regionserver.HRegion:
Instantiated t,,1419996941401.dd9731ee43b104da565257ca1539aa8c.
> 2014-12-31 09:05:49,929 DEBUG [regionserver60020-splits-1419996941385] regionserver.MetricsRegionSourceImpl:
Creating new MetricsRegionSourceImpl for table t 2e40a44511c0e187d357d651f13a1dab
> 2014-12-31 09:05:49,929 DEBUG [regionserver60020-splits-1419996941385] regionserver.HRegion:
Instantiated t,row2,1419996941401.2e40a44511c0e187d357d651f13a1dab.
> Wed Dec 31 09:06:30 IST 2014 Terminating regionserver
> 2014-12-31 09:06:30,465 INFO  [Thread-8] regionserver.ShutdownHook: Shutdown hook starting;
hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@42d2282e
> {code}
> -> Rollback is skipped if the RS is stopped or stopping, so we end up with dirty daughter regions
in HDFS.
> {code}
> 2014-12-31 09:07:49,547 INFO  [regionserver60020-splits-1419996941385] regionserver.SplitRequest:
Skip rollback/cleanup of failed split of t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.
because server is stopped
> java.io.InterruptedIOException: Interrupted after 0 tries  on 350
>         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:156)
> {code}
> Because of this, hbck always shows inconsistencies.
> {code}
> ERROR: Region { meta => null, hdfs => hdfs://localhost:9000/hbase/data/default/t/2e40a44511c0e187d357d651f13a1dab,
deployed =>  } on HDFS, but not listed in hbase:meta or deployed on any region server
> ERROR: Region { meta => null, hdfs => hdfs://localhost:9000/hbase/data/default/t/dd9731ee43b104da565257ca1539aa8c,
deployed =>  } on HDFS, but not listed in hbase:meta or deployed on any region server
> {code}
> If we try to repair, we end up with overlapping regions in hbase:meta, and both the daughter
regions and the parent are online.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
