hbase-issues mailing list archives

From "Rajeshbabu Chintaguntla (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-12791) HBase does not attempt to clean up an aborted split when the regionserver shutting down
Date Wed, 07 Jan 2015 20:56:35 GMT

     [ https://issues.apache.org/jira/browse/HBASE-12791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajeshbabu Chintaguntla updated HBASE-12791:
    Attachment: HBASE-12791_v3.patch

Thanks for the review. Here is the updated patch.
bq. Can we move the cleaning logic away from RegionStates though (ideally to SSH, or to a
utility method).
We cannot do the cleanup in SSH because RegionStates#serverOffline removes the
transitions of regions that need not be opened, so the regions in SPLITTING_NEW never reach SSH.
Instead, I have added a utility method in FSUtils to do the cleanup.
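
To illustrate the idea of such a cleanup utility, here is a minimal sketch. It is not the code from the patch: the class name DaughterCleanupSketch and the method cleanupDaughters are hypothetical, and it uses java.nio.file on a local directory as a stand-in for the HDFS table directory so that it runs without Hadoop on the classpath. It assumes the caller already knows the encoded names of the daughter regions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch; the real patch works against HDFS, not java.nio.file. */
public class DaughterCleanupSketch {

    /**
     * Deletes the daughter region directories under tableDir, if they exist.
     * Returns the encoded names of the directories that were actually removed.
     */
    public static List<String> cleanupDaughters(Path tableDir, List<String> daughterEncodedNames)
            throws IOException {
        List<String> removed = new ArrayList<>();
        for (String encodedName : daughterEncodedNames) {
            Path regionDir = tableDir.resolve(encodedName);
            if (!Files.isDirectory(regionDir)) {
                continue; // the split never got as far as creating this daughter
            }
            deleteRecursively(regionDir);
            removed.add(encodedName);
        }
        return removed;
    }

    private static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

The parent region directory is never touched in this sketch; only the directories named by the caller are removed.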

bq. Can you add logging here:
Added the log here.

bq. The hbck change seems costly. We already have all the regions from hdfs and meta at that
point no?
The current patch makes use of the region info already loaded from meta and hdfs.
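
As a rough illustration of working only from information that is already in memory, the sketch below computes the regions that exist on HDFS but are neither listed in meta nor deployed, which is the condition hbck reports in the quoted logs. The class OrphanRegionSketch and its method are hypothetical and are not taken from hbck or from the patch; regions are represented simply by their encoded names.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the "on HDFS, but not in meta or deployed" check. */
public class OrphanRegionSketch {

    /**
     * Returns the encoded names present in hdfsRegions but absent from both
     * metaRegions and deployedRegions. The input sets are not modified.
     */
    public static Set<String> onHdfsOnly(Set<String> hdfsRegions,
                                         Set<String> metaRegions,
                                         Set<String> deployedRegions) {
        Set<String> result = new TreeSet<>(hdfsRegions); // copy, sorted for stable output
        result.removeAll(metaRegions);
        result.removeAll(deployedRegions);
        return result;
    }
}
```

No extra scan of HDFS or meta is needed here, since the three sets are assumed to have been collected earlier in the run.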

Apart from this, I tried to handle the cleanup during master startup, but I was not able to identify
the regions in SPLITTING_NEW state because we do not persist the daughter regions until
the split commit happens (which is the correct behavior).
In branch-1 and 0.98 we cannot identify regions in SPLITTING_NEW state during master startup
because we delegate dead server handling to SSH, which mostly depends on meta and in-memory
state (it does not read from zk).

Please review.

> HBase does not attempt to clean up an aborted split when the regionserver shutting down
> ---------------------------------------------------------------------------------------
>                 Key: HBASE-12791
>                 URL: https://issues.apache.org/jira/browse/HBASE-12791
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.0
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Critical
>             Fix For: 2.0.0, 0.98.10, 1.0.1
>         Attachments: HBASE-12791.patch, HBASE-12791_v2.patch, HBASE-12791_v3.patch
> HBase does not clean up the daughter region directories in HDFS if the region server shuts down
after creating them during the split.
> Here are the logs.
> -> RS shut down after creating the daughter regions.
> {code}
> 2014-12-31 09:05:41,406 DEBUG [regionserver60020-splits-1419996941385] zookeeper.ZKAssign:
regionserver:60020-0x14a9701e53100d1, quorum=localhost:2181, baseZNode=/hbase Transitioned
node 80c665138d4fa32da4d792d8ed13206f from RS_ZK_REQUEST_REGION_SPLIT to RS_ZK_REQUEST_REGION_SPLIT
> 2014-12-31 09:05:41,514 DEBUG [regionserver60020-splits-1419996941385] regionserver.HRegion:
Closing t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.: disabling compactions & flushes
> 2014-12-31 09:05:41,514 DEBUG [regionserver60020-splits-1419996941385] regionserver.HRegion:
Updates disabled for region t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.
> 2014-12-31 09:05:41,516 INFO  [StoreCloserThread-t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.-1]
regionserver.HStore: Closed f
> 2014-12-31 09:05:41,518 INFO  [regionserver60020-splits-1419996941385] regionserver.HRegion:
Closed t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.
> 2014-12-31 09:05:49,922 DEBUG [regionserver60020-splits-1419996941385] regionserver.MetricsRegionSourceImpl:
Creating new MetricsRegionSourceImpl for table t dd9731ee43b104da565257ca1539aa8c
> 2014-12-31 09:05:49,922 DEBUG [regionserver60020-splits-1419996941385] regionserver.HRegion:
Instantiated t,,1419996941401.dd9731ee43b104da565257ca1539aa8c.
> 2014-12-31 09:05:49,929 DEBUG [regionserver60020-splits-1419996941385] regionserver.MetricsRegionSourceImpl:
Creating new MetricsRegionSourceImpl for table t 2e40a44511c0e187d357d651f13a1dab
> 2014-12-31 09:05:49,929 DEBUG [regionserver60020-splits-1419996941385] regionserver.HRegion:
Instantiated t,row2,1419996941401.2e40a44511c0e187d357d651f13a1dab.
> Wed Dec 31 09:06:30 IST 2014 Terminating regionserver
> 2014-12-31 09:06:30,465 INFO  [Thread-8] regionserver.ShutdownHook: Shutdown hook starting;
hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@42d2282e
> {code}
> -> Rollback is skipped if the RS is stopped or stopping, so we end up with dirty daughter regions
in HDFS.
> {code}
> 2014-12-31 09:07:49,547 INFO  [regionserver60020-splits-1419996941385] regionserver.SplitRequest:
Skip rollback/cleanup of failed split of t,,1419996880699.80c665138d4fa32da4d792d8ed13206f.
because server is stopped
> java.io.InterruptedIOException: Interrupted after 0 tries  on 350
>         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:156)
> {code}
> Because of this, hbck always shows inconsistencies.
> {code}
> ERROR: Region { meta => null, hdfs => hdfs://localhost:9000/hbase/data/default/t/2e40a44511c0e187d357d651f13a1dab,
deployed =>  } on HDFS, but not listed in hbase:meta or deployed on any region server
> ERROR: Region { meta => null, hdfs => hdfs://localhost:9000/hbase/data/default/t/dd9731ee43b104da565257ca1539aa8c,
deployed =>  } on HDFS, but not listed in hbase:meta or deployed on any region server
> {code}
> If we try to repair, we end up with overlapping regions in hbase:meta, and both the daughter
regions and the parent are online.

This message was sent by Atlassian JIRA
