hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3139) Server shutdown processor stuck because meta not online
Date Thu, 21 Oct 2010 21:04:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923636#action_12923636 ]

stack commented on HBASE-3139:
------------------------------

So, I just noticed that on startup there is only one executor fielding open requests,
even though we've configured the executor service to run a max of 10 executors. I confirmed
that the right config is going into the executor service, so something is up where we are
not getting thread pool behavior out of the executor.
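
One way to get this symptom from a plain java.util.concurrent.ThreadPoolExecutor, offered
as a sketch and not as a confirmed diagnosis of this issue: if the pool is built with a small
core size, a larger max size, and an unbounded queue, the pool only adds threads beyond the
core size when the queue rejects a task, which an unbounded LinkedBlockingQueue never does.
The class name, method name, and sizes below are made up for illustration and are not taken
from the HBase executor service code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
    /**
     * Submits `tasks` blocking tasks to a pool backed by an unbounded queue
     * and returns how many worker threads the pool actually started.
     */
    public static int workersStarted(int corePoolSize, int maxPoolSize, int tasks)
            throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            corePoolSize, maxPoolSize, 60L, TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>()); // unbounded: offer() always succeeds
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Workers are created inside execute(), so this count is stable here.
            return pool.getPoolSize();
        } finally {
            release.countDown(); // let all tasks finish
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("core=1,  max=10: " + workersStarted(1, 10, 20));  // prints 1
        System.out.println("core=10, max=10: " + workersStarted(10, 10, 20)); // prints 10
    }
}
```

With an unbounded queue the max size has no effect, so setting the core size equal to the
intended thread count is the usual way to get that many workers.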

> Server shutdown processor stuck because meta not online
> -------------------------------------------------------
>
>                 Key: HBASE-3139
>                 URL: https://issues.apache.org/jira/browse/HBASE-3139
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.90.0
>
>
> Playing with rolling restart, I see that the servers hosting root and meta can go down
> close to each other.  In the log below, note how we are processing the shutdown of the
> server hosting -ROOT-, and part of that processing involves reading .META. content to see
> what regions it was carrying.  Well, note that .META. is offline at the time (our
> verification attempt failed because the server had just been shut down and verification got
> a ConnectException).  So we pause the server shutdown processing till .META. comes back
> online -- only it never does.
> {code}
> 2010-10-21 07:32:23,931 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region location in ZooKeeper
> 2010-10-21 07:32:23,953 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for sv2borg182,60020,1287645693959
> 2010-10-21 07:32:23,994 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region location in ZooKeeper
> 2010-10-21 07:32:24,020 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x12bcd5b344b0115 Creating (or updating) unassigned node for 70236052 with OFFLINE state
> 2010-10-21 07:32:24,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan for -ROOT-,,0.70236052 so generated a random one; hri=-ROOT-,,0.70236052, src=, dest=sv2borg181,60020,1287646329081; 8 (online=8, exclude=null) available servers
> 2010-10-21 07:32:24,045 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region -ROOT-,,0.70236052 to sv2borg181,60020,1287646329081
> 2010-10-21 07:32:24,048 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of .META.,,1; java.net.ConnectException: Connection refused
> 2010-10-21 07:32:24,048 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Current cached META location is not valid, resetting
> 2010-10-21 07:32:24,079 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=sv2borg181,60020,1287646329081, region=70236052/-ROOT-
> 2010-10-21 07:32:24,162 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=sv2borg181,60020,1287646329081, region=70236052/-ROOT-
> 2010-10-21 07:32:24,212 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv2borg181,60020,1287646329081, region=70236052/-ROOT-
> 2010-10-21 07:32:24,212 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 70236052; deleting unassigned node
> 2010-10-21 07:32:24,212 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x12bcd5b344b0115 Deleting existing unassigned node for 70236052 that is in expected state RS_ZK_REGION_OPENED
> 2010-10-21 07:32:24,238 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052
> 2010-10-21 07:32:27,902 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv2borg183,60020,1287646347597, regionCount=0, userLoad=false
> 2010-10-21 07:32:30,523 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [sv2borg184,60020,1287645693960]
> 2010-10-21 07:32:30,523 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=sv2borg184,60020,1287645693960 to dead servers, submitted shutdown handler to be executed
> 2010-10-21 07:32:36,254 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv2borg184,60020,1287646355951, regionCount=0, userLoad=false
> 2010-10-21 07:32:39,567 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [sv2borg185,60020,1287645693959]
> 2010-10-21 07:32:39,567 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=sv2borg185,60020,1287645693959 to dead servers, submitted shutdown handler to be executed
> 2010-10-21 07:32:45,614 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sv2borg185,60020,1287646365304, regionCount=0, userLoad=false
> 2010-10-21 07:32:48,652 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [sv2borg186,60020,1287645693962]
> 2010-10-21 07:32:48,652 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=sv2borg186,60020,1287645693962 to dead servers, submitted shutdown handler to be executed
> 2010-10-21 07:32:50,097 INFO org.apache.hadoop.hbase.master.ServerManager: regionservers=8, averageload=93.38, deadservers=[sv2borg185,60020,1287645693959, sv2borg183,60020,1287645693959, sv2borg182,60020,1287645693959, sv2borg184,60020,1287645693960, sv2borg186,60020,1287645693962]
> ....
> {code}
> We're supposed to have a pool of 5 executor threads to handle server shutdowns.  I see one
> executor stuck waiting on .META. but I don't see any others running.  Odd.  Trying to figure
> out why there is only 1 executor.
> {code}
> "MASTER_SERVER_OPERATIONS-sv2borg180:60000-1" daemon prio=10 tid=0x0000000041dc7000 nid=0x50a4 in Object.wait() [0x00007f285d537000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:324)
>     - locked <0x00007f286d150ce8> (a java.util.concurrent.atomic.AtomicBoolean)
>     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:359)
>     at org.apache.hadoop.hbase.catalog.MetaReader.getServerUserRegions(MetaReader.java:487)
>     at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:115)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:150)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> {code}
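
To illustrate why a single worker makes the wait above unrecoverable, here is a minimal
sketch under stated assumptions: it assumes the work that would bring .META. back online
is queued on the same executor behind the waiting handler, which the log and thread dump
suggest but do not prove. The class, method, and latch names are hypothetical stand-ins,
not HBase code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BlockedHandlerDemo {
    /**
     * Runs two handlers on a pool with `workers` threads and returns true if
     * the first handler saw "meta" come online within timeoutMs.
     */
    public static boolean firstHandlerSeesMeta(int workers, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch metaOnline = new CountDownLatch(1); // stand-in for ".META. is online"
        try {
            // Handler 1: cannot proceed until meta is online (timed wait, as in the dump).
            Callable<Boolean> waitForMeta =
                () -> metaOnline.await(timeoutMs, TimeUnit.MILLISECONDS);
            // Handler 2: the work that would bring meta back online.
            Runnable bringMetaOnline = metaOnline::countDown;

            Future<Boolean> waiter = pool.submit(waitForMeta);
            pool.execute(bringMetaOnline); // with 1 worker this sits in the queue
            return waiter.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 worker:  " + firstHandlerSeesMeta(1, 200));  // prints false
        System.out.println("2 workers: " + firstHandlerSeesMeta(2, 5000)); // prints true
    }
}
```

With one worker the waiting handler occupies the only thread, so the handler it depends on
never runs and the timed wait simply expires; with two or more workers the second handler
runs and the first one proceeds.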

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

