hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4395) EnableTableHandler races with itself
Date Wed, 14 Sep 2011 20:06:08 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104842#comment-13104842 ]

Ted Yu commented on HBASE-4395:
-------------------------------

+1 on patch version 2.

> EnableTableHandler races with itself
> ------------------------------------
>
>                 Key: HBASE-4395
>                 URL: https://issues.apache.org/jira/browse/HBASE-4395
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.90.5
>
>         Attachments: HBASE-4395-0.90-v2.patch, HBASE-4395-0.90.patch
>
>
> Very often when we try to enable a big table we get something like:
> {quote}
> 2011-09-02 12:21:56,619 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; huge_ass_region_name state=PENDING_OPEN, ts=1314991316616
> java.lang.IllegalStateException
>         at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1074)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1030)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:858)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:838)
>         at org.apache.hadoop.hbase.master.handler.EnableTableHandler$BulkEnabler$1.run(EnableTableHandler.java:154)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2011-09-02 12:21:56,620 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> {quote}
> The issue is that EnableTableHandler runs BulkEnabler multiple times. By the time it runs it a second time, using a stale list of still-not-enabled regions, it can try to set a region offline in ZK just after that region's state has changed. Case in point:
> {quote}
> 2011-09-02 12:21:56,616 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region huge_ass_region_name to sv4r23s16,60020,1314880035029
> 2011-09-02 12:21:56,619 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; huge_ass_region_name state=PENDING_OPEN, ts=1314991316616
> {quote}
> Here the first line is the first assign, done in the first thread, and the second line is the second thread processing the same region 3ms later. After that, the master dies, and it's pretty sad when it restarts because it fails over an enabling table and it's ungodly slow.
> I'm pretty sure there's a window where double assignments are possible.
> Talking with Stack, it doesn't really make sense to call multiple enables since the list of regions is static (the table is disabled!). We should just call it once and wait. Also, there's a lot of cleanup to do in EnableTableHandler since it refers to disabling the table (copy pasta I guess).
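
For illustration only, here is a minimal Java sketch of the failure shape described above and of the "call it once and wait" alternative. It is not the HBase code and not the attached patch: `EnableSketch`, its `State` enum, and every method name are hypothetical stand-ins, and the in-memory map merely plays the role of the region states the master tracks. The assumption is that assigning a region is only legal while it is still OFFLINE, which matches the log excerpt, where the second attempt found the region already in PENDING_OPEN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for illustration; not the HBase AssignmentManager.
class EnableSketch {
    enum State { OFFLINE, PENDING_OPEN }

    private final Map<String, State> states = new ConcurrentHashMap<>();

    EnableSketch(List<String> regions) {
        for (String region : regions) {
            states.put(region, State.OFFLINE);
        }
    }

    // Assumption: only an OFFLINE region may be assigned; anything else is fatal.
    void assign(String region) {
        // replace(key, expected, updated) is an atomic compare-and-set on the entry.
        if (!states.replace(region, State.OFFLINE, State.PENDING_OPEN)) {
            throw new IllegalStateException(
                "Unexpected state trying to OFFLINE; " + region
                    + " state=" + states.get(region));
        }
    }

    // Problem shape: a second bulk pass reuses a list computed before the
    // first pass ran, so it revisits regions that have already moved on.
    boolean secondPassOverStaleListFails(List<String> regions) {
        List<String> stale = new ArrayList<>(regions);
        for (String region : stale) {
            assign(region);
        }
        try {
            for (String region : stale) {
                assign(region);
            }
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Suggested shape: submit each region exactly once, then just wait.
    boolean enableOnceAndWait(List<String> regions, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String region : regions) {
            pool.execute(() -> assign(region));
        }
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    State stateOf(String region) {
        return states.get(region);
    }
}
```

The sketch replays the stale list sequentially so the failure is deterministic; in the report the two passes ran on different threads about 3ms apart. Since the region list of a disabled table does not change, the single-pass variant never sees a region twice, so the state check has nothing to trip on.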

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
