hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
Date Tue, 31 Dec 2013 21:44:51 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13859724#comment-13859724 ]

Lars Hofhansl commented on HBASE-8912:
--------------------------------------

There are layers of races here, it seems.
With my "fix" not to abort the master, a RegionServer can now get a new request to open a region
while it is attempting to transition the current attempt to FAILED_OPEN in ZK. It then ignores
that concurrent open request, and thus the region will be stuck in PENDING_OPEN forever (unless
the timeout manager kicks in, of course). The only reason this did not happen before was that
the master would just abort before it could reassign the region a second time.
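
To make the first race concrete, here is a minimal Java sketch that replays it deterministically, without threads. It assumes the RegionServer drops an open request for a region that is already in its regions-in-transition map; `RitRaceModel`, `requestOpen` and `transitionZnodeToFailedOpen` are hypothetical names, and ZooKeeper is not modelled at all.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, heavily simplified model of a RegionServer's regions-in-transition
// (RIT) bookkeeping. Names are illustrative only; this is not HBase's API.
class RitRaceModel {
    // Encoded names of regions this server is currently opening.
    final ConcurrentMap<String, Boolean> regionsInTransition = new ConcurrentHashMap<>();

    /** Returns true if the open request was accepted, false if it was ignored. */
    boolean requestOpen(String region) {
        // An existing entry means "already in transition", so the request is dropped.
        return regionsInTransition.putIfAbsent(region, Boolean.TRUE) == null;
    }

    /** Placeholder for moving the region's znode to FAILED_OPEN; no ZooKeeper here. */
    void transitionZnodeToFailedOpen(String region) {
        // In the real system the master reacts to this event and may send a new
        // open request for the same region, possibly to this same server.
    }

    /** Replays the interleaving: znode first, RIT cleanup second. */
    static boolean replayZnodeFirst() {
        RitRaceModel rs = new RitRaceModel();
        rs.requestOpen("r1");                     // first open attempt, which fails
        rs.transitionZnodeToFailedOpen("r1");     // master sees FAILED_OPEN, reassigns
        boolean accepted = rs.requestOpen("r1");  // region still in RIT: request ignored
        rs.regionsInTransition.remove("r1");      // cleanup runs, but nobody opens r1 now
        return accepted;                          // whether the reassignment was accepted
    }
}
```

Here `replayZnodeFirst()` returns false: the reassignment is dropped, so nothing on the RegionServer side will move the region out of PENDING_OPEN.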

When I fix that (by taking the region out of the RITs (regions in transition) before the znode
is transitioned), there is an NPE when trying to remove the HRI from the RITs (which was previously
hidden because the concurrent assign request was missed).
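
The reordering, and one way a removal from an RIT map can throw, can be sketched as follows. The comment does not say where the NPE originates; this sketch assumes, purely for illustration, that it comes from auto-unboxing the null that `Map.remove` returns once the entry is gone. `FailedOpenPath` and its methods are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model; names are illustrative and not part of HBase's API.
class FailedOpenPath {
    // Encoded names of regions this server is currently opening.
    final ConcurrentMap<String, Boolean> regionsInTransition = new ConcurrentHashMap<>();

    /** Returns true if accepted, false if the region is already in transition. */
    boolean requestOpen(String region) {
        return regionsInTransition.putIfAbsent(region, Boolean.TRUE) == null;
    }

    /** Reordered failed-open path: forget the region locally before touching ZK. */
    void failOpen(String region) {
        removeSafe(region);
        // A real implementation would now move the znode to FAILED_OPEN; any
        // reassignment the master sends afterwards finds the RIT entry gone.
    }

    /** Fragile: unboxing a null Boolean throws NullPointerException if absent. */
    boolean removeUnsafe(String region) {
        boolean wasPresent = regionsInTransition.remove(region);
        return wasPresent;
    }

    /** Null-safe: reports an absent entry as false instead of throwing. */
    boolean removeSafe(String region) {
        return regionsInTransition.remove(region) != null;
    }
}
```

With this ordering, an open request after `failOpen` is accepted. If some later step then removes the same region twice, `removeSafe` simply returns false on the second call, while `removeUnsafe` throws.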

Terrible. I hope this is better in trunk after the AM rewrite. But since this issue stems
from backported code, I doubt it.


> [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-8912
>                 URL: https://issues.apache.org/jira/browse/HBASE-8912
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Priority: Critical
>             Fix For: 0.94.16
>
>         Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt
>
>
> AM throws this exception which subsequently causes the master to abort: 
> {code}
> java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b. state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot transit it to OFFLINE.
> 	at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
> 	at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}
> This exception trace is from the failing test TestMetaReaderEditor, which is failing pretty
> frequently, but looking at the test code, I think this is not a test-only issue but affects
> the main code path.
> https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/
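
The guard that fires in the trace above can be approximated by a source-state check. This is a hypothetical sketch, not HBase's `setOfflineInZooKeeper`: the enum is a subset of the region states mentioned in this issue, and it assumes only CLOSED and OFFLINE are acceptable source states for a move to OFFLINE. In the trace, `ClosedRegionHandler` calls `assign()` while the region is already PENDING_OPEN, presumably because another assignment got there first, so the check throws.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of a master-side state guard; not HBase's RegionState API.
class OfflineGuard {
    // Subset of region states mentioned in this issue.
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, CLOSED }

    // Assumption: only these states may be moved to OFFLINE before an assign.
    private static final Set<State> ALLOWED_SOURCES = EnumSet.of(State.CLOSED, State.OFFLINE);

    static State setOffline(String region, State current) {
        if (!ALLOWED_SOURCES.contains(current)) {
            // Loosely mirrors the wording of the exception in the trace.
            throw new IllegalStateException("Unexpected state : " + region
                + " state=" + current + " .. Cannot transit it to OFFLINE.");
        }
        return State.OFFLINE;
    }
}
```

Calling `OfflineGuard.setOffline("r1", OfflineGuard.State.CLOSED)` returns OFFLINE, whereas passing PENDING_OPEN throws `IllegalStateException`.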



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
