hbase-issues mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4395) EnableTableHandler races with itself
Date Wed, 14 Sep 2011 18:09:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104735#comment-13104735 ]

Jean-Daniel Cryans commented on HBASE-4395:
-------------------------------------------

bq. I think regions.size() should increase in the above loop. So I don't understand the condition for if above.

Yeah, that was a last-minute change. I actually tested with "regions.size() > lastNumberOfRegions" and then thought that number was going down; I was confusing it with regionsToAssign().

bq. Also, remaining is calculated lastly. I don't know why remaining is updated in the if block.

Derp, sorry, it should be the timeout that's incremented.
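
To make the two corrections above concrete, here is a minimal sketch of a wait loop that pushes its deadline out whenever the number of online regions grows. It is not the code in HBASE-4395-0.90.patch: the class name ProgressWait, the method waitUntilDone, and all of its parameters are hypothetical, and the injected clock exists only so the loop can be exercised without real waiting.

```java
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

// Hypothetical illustration only; none of these names come from HBase.
class ProgressWait {

    /**
     * Polls onlineCount until it reaches expected. The deadline is extended
     * by timeoutMs every time the count grows, so the wait only gives up
     * after a full timeout with no progress.
     *
     * @return true if the count reached expected, false on stall or interrupt
     */
    static boolean waitUntilDone(IntSupplier onlineCount, int expected,
                                 long timeoutMs, long pollMs, LongSupplier clock) {
        long deadline = clock.getAsLong() + timeoutMs;
        int lastCount = onlineCount.getAsInt();
        while (lastCount < expected && clock.getAsLong() < deadline) {
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
            int count = onlineCount.getAsInt();
            if (count > lastCount) {
                // Progress was made: extend the timeout, not the remaining work.
                lastCount = count;
                deadline = clock.getAsLong() + timeoutMs;
            }
        }
        return lastCount >= expected;
    }
}
```

In real use the clock would be System::currentTimeMillis and onlineCount would report how many of the table's regions are online. The comparison is "count > lastCount" because that number only goes up while a table is being enabled.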

> EnableTableHandler races with itself
> ------------------------------------
>
>                 Key: HBASE-4395
>                 URL: https://issues.apache.org/jira/browse/HBASE-4395
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Blocker
>             Fix For: 0.90.5
>
>         Attachments: HBASE-4395-0.90.patch
>
>
> Very often when we try to enable a big table we get something like:
> {quote}
> 2011-09-02 12:21:56,619 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; huge_ass_region_name state=PENDING_OPEN, ts=1314991316616
> java.lang.IllegalStateException
>         at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1074)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1030)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:858)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:838)
>         at org.apache.hadoop.hbase.master.handler.EnableTableHandler$BulkEnabler$1.run(EnableTableHandler.java:154)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2011-09-02 12:21:56,620 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> {quote}
> The issue is that EnableTableHandler runs BulkEnabler multiple times, and by the time it runs it a second time, using a stale list of still-not-enabled regions, it may try to set a region offline in ZK just after that region's state has changed. Case in point:
> {quote}
> 2011-09-02 12:21:56,616 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region huge_ass_region_name to sv4r23s16,60020,1314880035029
> 2011-09-02 12:21:56,619 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state trying to OFFLINE; huge_ass_region_name state=PENDING_OPEN, ts=1314991316616
> {quote}
> Here the first line is the first assign done in the first thread, and the second line is the second thread that got to process the same region around the same time, a 3ms difference. After that the master dies, and it's pretty sad when it restarts because it fails over an enabling table and that's ungodly slow.
> I'm pretty sure there's a window where double assignments are possible.
> Talking with Stack, it doesn't really make sense to call multiple enables since the list of regions is static (the table is disabled!). We should just call it once and wait. Also there's a lot of cleanup to do in EnableTableHandler since it refers to disabling the table (copy pasta I guess).
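
The failure mode described in the issue can be reproduced with a toy model, sketched here under loud assumptions: StaleListRace, its State enum, and every method in it are hypothetical stand-ins, not HBase classes, and the state check is a simplification of what the master does before setting a region offline in ZK. The model is single-threaded so the failure is deterministic; in the master the passes run on separate threads, which is why the real failure is intermittent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; these names and states are illustrative, not HBase's.
class StaleListRace {
    enum State { OFFLINE, PENDING_OPEN }

    // Assigning is only legal from OFFLINE, mirroring the check that aborts the master.
    static void assign(Map<String, State> states, String region) {
        State current = states.get(region);
        if (current != State.OFFLINE) {
            throw new IllegalStateException("region " + region + " is already " + current);
        }
        states.put(region, State.PENDING_OPEN);
    }

    static Map<String, State> disabledTable(List<String> regions) {
        Map<String, State> states = new HashMap<>();
        for (String region : regions) {
            states.put(region, State.OFFLINE);
        }
        return states;
    }

    // Problem case: a second pass replays a list captured before the first pass ran.
    static void enableWithStaleSecondPass(List<String> regions) {
        Map<String, State> states = disabledTable(regions);
        List<String> staleSnapshot = new ArrayList<>(regions);
        for (String region : regions) {
            assign(states, region);
        }
        for (String region : staleSnapshot) {
            assign(states, region); // throws: region is already PENDING_OPEN
        }
    }

    // Suggested direction: the region list of a disabled table is static, so assign once.
    static Map<String, State> enableOnce(List<String> regions) {
        Map<String, State> states = disabledTable(regions);
        for (String region : regions) {
            assign(states, region);
        }
        return states;
    }
}
```

Calling enableWithStaleSecondPass on any non-empty list throws IllegalStateException on the first region of the second pass, while enableOnce leaves every region in PENDING_OPEN, after which the caller only has to wait.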

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
