hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6329) Stop META regionserver when splitting region could cause daughter region assign twice
Date Thu, 05 Jul 2012 05:32:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406858#comment-13406858 ]

ramkrishna.s.vasudevan commented on HBASE-6329:
-----------------------------------------------

Nice one.
One question about this code:
{code}
    // Interrupt catalog tracker here in case any regions being opened out in
    // handlers are stuck waiting on meta or root.
    if (this.catalogTracker != null) this.catalogTracker.stop();
{code}
Does this not affect the thread that is trying to write into META through SplitTransaction?

Maybe we can add a check so that abort/stop is not called if the RS is already aborting. In
the case above, if the META write fails, we hit the PONR (point of no return) handling, which
calls server.abort. An abort is then already in progress when a second abort is called.
I am not sure of the implications if both run at the same time.
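
The check suggested above could look roughly like the following. This is only a minimal sketch, not the HBASE-6329 patch or the real HRegionServer code: the class GuardedServer and its methods are hypothetical stand-ins, and the sketch assumes that an atomic flag is enough to make a second abort call a no-op.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class AbortGuardSketch {

    /** Hypothetical stand-in for a region server; NOT the real HRegionServer. */
    static final class GuardedServer {
        private final AtomicBoolean aborting = new AtomicBoolean(false);
        // Counts how many times the shutdown work actually ran (for illustration).
        private final AtomicInteger shutdownRuns = new AtomicInteger();

        /**
         * Starts an abort unless one is already in progress.
         * Returns true only for the caller that actually started it.
         */
        boolean abort(String why) {
            // compareAndSet lets exactly one caller flip the flag from false to true,
            // even when several threads call abort at the same time.
            if (!aborting.compareAndSet(false, true)) {
                return false; // already aborting: skip the second abort
            }
            shutdownRuns.incrementAndGet(); // placeholder for the real shutdown work
            return true;
        }

        boolean isAborting() {
            return aborting.get();
        }

        int shutdownRuns() {
            return shutdownRuns.get();
        }
    }

    public static void main(String[] args) {
        GuardedServer server = new GuardedServer();
        boolean first = server.abort("stop requested");
        boolean second = server.abort("META write failed after PONR");
        // Prints: true false 1
        System.out.println(first + " " + second + " " + server.shutdownRuns());
    }
}
```

With a guard of this kind, the abort triggered from the PONR path would return immediately if a stop or abort had already begun. Whether skipping the second abort is safe in the real region server is exactly the open question raised above.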
                
> Stop META regionserver when splitting region could cause daughter region assign twice
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-6329
>                 URL: https://issues.apache.org/jira/browse/HBASE-6329
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.0
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: HBASE-6329v1.patch
>
>
> We found this issue in 0.94. First, let me describe the case:
> Stop the META RS while a split is in progress
> 1. The META RS (Server A) is stopping.
> 2. The main thread of the RS closes ZK and deletes the ephemeral node of the RS.
> 3. SplitTransaction is retrying MetaEditor.addDaughter.
> 4. The Master's ServerShutdownHandler processes the dead META server.
> 5. The Master fixes up the daughter and assigns it.
> 6. The daughter is opened on another server (Server B).
> 7. Server A's SplitTransaction successfully adds the daughter to .META. with serverName=Server A.
> 8. Now, in .META., the daughter's region location is Server A, but it is online on Server B.
> 9. Restart the Master, and the Master assigns the daughter again.
> Attaching the logs, daughter region 80f999ea84cb259e20e9a228546f6c8a
> Master log:
> 2012-07-04 13:45:56,493 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
Splitting logs for dw93.kgb.sqa.cm4,60020,1341378224464
> 2012-07-04 13:45:58,983 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler:
Fixup; missing daughter writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.

> 2012-07-04 13:45:58,985 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added daughter
writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
serverName=null 
> 2012-07-04 13:45:58,988 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning
region writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
to dw88.kgb.sqa.cm4,60020,1341379188777 
> 2012-07-04 13:46:00,201 INFO org.apache.hadoop.hbase.master.AssignmentManager: The master
has opened the region writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
that was online on dw88.kgb.sqa.cm4,60020,1341379188777 
> Master log after restart:
> 2012-07-04 14:27:05,824 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x136187d60e34644
Creating (or updating) unassigned node for 80f999ea84cb259e20e9a228546f6c8a with OFFLINE state

> 2012-07-04 14:27:05,851 INFO org.apache.hadoop.hbase.master.AssignmentManager: Processing
region writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
in state M_ZK_REGION_OFFLINE 
> 2012-07-04 14:27:05,854 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning
region writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.
to dw93.kgb.sqa.cm4,60020,1341380812020 
> 2012-07-04 14:27:06,051 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling
transition=RS_ZK_REGION_OPENED, server=dw93.kgb.sqa.cm4,60020,1341380812020, region=80f999ea84cb259e20e9a228546f6c8a

> Regionserver(META rs) log:
> 2012-07-04 13:45:56,491 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping
server dw93.kgb.sqa.cm4,60020,1341378224464; zookeeper connection closed.
> 2012-07-04 13:46:11,951 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added daughter
writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
serverName=dw93.kgb.sqa.cm4,60020,1341378224464 
> 2012-07-04 13:46:11,952 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Done
with post open deploy task for region=writetest,JC\xCA\xC8\xCF<Q\xC49>OH\xCEV\xCC\xC2\xB5\xC2@\xD4,1341380730558.80f999ea84cb259e20e9a228546f6c8a.,
daughter=true 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
