hbase-issues mailing list archives

From "Jean-Marc Spaggiari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8912) [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
Date Fri, 03 Jan 2014 21:36:52 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861925#comment-13861925 ]

Jean-Marc Spaggiari commented on HBASE-8912:
--------------------------------------------

After the first restart, 36 regions were stuck in transition :( But no server crashed.

What I did:
- Restored the default balancer to make sure as many regions as possible would move.
- Stopped and started HBase.
- Ran the balancer from the shell.

Everything was back up after a 2nd restart.

I get many errors like this one:
{code}
2014-01-03 16:03:03,958 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received FAILED_OPEN
for region b75cb9067c3c4456d6198c9237c143b3 from server node4.domain.com,60020,1388782921790
but region was in  the state page,rf.idua.www\x1Fhttp\x1F-1\x1F/fr/brand/fr/audi_fleet_solutions/contact/contact_transport_personnes.html\x1Fnull,1379103792232.b75cb9067c3c4456d6198c9237c143b3.
state=CLOSED, ts=1388782983373, server=node4.domain.com,60020,1388782921790 and not in OFFLINE,
PENDING_OPEN or OPENING
{code}

After investigating, I figured out that snappy was missing on one server. I fixed that and restarted:
all seemed to be fine. So I restored my customized balancer, restarted, and balanced.

Still some warnings in the logs:
{code}
2014-01-03 16:21:52,864 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED
for region db8e67acde26bf340da481d3c1b934cd from server node4.domain.com,60020,1388784051197
but region was in  the state page,moc.tenretnigruoboc.www\x1Fhttp\x1F-1\x1F/cobourg-and-the-web\x1Fnull,1379103844627.db8e67acde26bf340da481d3c1b934cd.
state=OPEN, ts=1388784100392, server=node4.distparser.com,60020,1388784051197 and not in expected
OFFLINE, PENDING_OPEN or OPENING states
{code}

But this time all the regions are assigned correctly.
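
To make the warnings above concrete, here is a minimal sketch of the kind of check they describe: an open-related event arrives for a region, and it is only considered expected if the region is currently OFFLINE, PENDING_OPEN or OPENING. This is an illustration only, not the AssignmentManager source; the class, enum and method names are hypothetical, and only the state names and the message wording come from the log lines.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative sketch only; this is NOT the HBase AssignmentManager code.
class RegionStateCheckSketch {

    // State names copied from the log messages (hypothetical enum, not HBase's own type).
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, CLOSED }

    // States in which, per the warning text, an open-related event is expected.
    static final Set<State> EXPECTED =
        EnumSet.of(State.OFFLINE, State.PENDING_OPEN, State.OPENING);

    // Returns null when the event is expected, otherwise a message modeled on the log text.
    static String check(String event, String region, State current) {
        if (EXPECTED.contains(current)) {
            return null;
        }
        return "Received " + event + " for region " + region
            + " but region was in the state " + current
            + " and not in expected OFFLINE, PENDING_OPEN or OPENING states";
    }

    public static void main(String[] args) {
        // Region already OPEN when OPENED arrives: a warning is produced.
        System.out.println(check("OPENED", "db8e67acde26bf340da481d3c1b934cd", State.OPEN));
        // Region still OPENING when OPENED arrives: nothing to report (prints null).
        System.out.println(check("OPENED", "db8e67acde26bf340da481d3c1b934cd", State.OPENING));
    }
}
```

Under this reading, the "Received OPENED ... state=OPEN" lines would mean the master had already recorded the region as open when the event was handled, which would be consistent with the regions ending up assigned correctly.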

I did that one more time (change balancer, stop, start, balance; change balancer, stop, start,
balance). I set the log level to WARN.

{code}
2014-01-03 16:28:51,142 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED
for region 17bee313797fc1ce982c0e31fdb6620c from server node8.domain.com,60020,1388784498327
but region was in  the state page,rf.ofniecnarf.www\x1Fhttp\x1F-1\x1F/vote/comment/27996/1/vote/zero_vote/c99b0992e5a9cd6bf3a4cfc91769ceeb\x1Fnull,1379104524006.17bee313797fc1ce982c0e31fdb6620c.
state=OPEN, ts=1388784531048, server=node8.distparser.com,60020,1388784498327 and not in expected
OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:28:52,135 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED
for region 6dc6290df1855b319f60bf89faa3da41 from server node8.domain.com,60020,1388784498327
but region was in  the state page_crc,\x00\x00\x00\x00\xD7\xD9\x97\x8Bvideo.k-wreview.ca,1378042601904.6dc6290df1855b319f60bf89faa3da41.
state=OPEN, ts=1388784531793, server=node8.distparser.com,60020,1388784498327 and not in expected
OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:28:52,712 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED
for region ec4f96b6cedd935aeba279b15d5337af from server node8.domain.com,60020,1388784498327
but region was in  the state work_proposed,\x98\xBF\xAF\x90\x00\x00\x00\x00http://feedproxy.google.com/~r/WheatWeeds/~3/Of24fZKcpco/the-eighth-day-of-christmas.html,1378975430143.ec4f96b6cedd935aeba279b15d5337af.
state=OPEN, ts=1388784532540, server=node8.distparser.com,60020,1388784498327 and not in expected
OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:28:52,747 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED
for region 4f823b5de664556a89cbd86aa41cd0b0 from server node8.distparser.com,60020,1388784498327
but region was in  the state work_proposed,\x8D4K\xEA\x00\x00\x00\x00http://twitter.com/home?status=CartoonStock%3A++http%3A%2F%2Fwww%2Ecartoonstock%2Ecom%2Fdirectory%2Fc%2Fcream%5Ftea%5Fgifts%2Easp,1378681682935.4f823b5de664556a89cbd86aa41cd0b0.
state=OPEN, ts=1388784532552, server=node8.distparser.com,60020,1388784498327 and not in expected
OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:28:53,244 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED
for region da0bd0a6b7187f731fb34d4ac14ca279 from server node8.domain.com,60020,1388784498327
but region was in  the state work_proposed,\xB2\xE6\xB6\xBB\x00\x00\x00\x00http://www.canpages.ca/page/QC/notre-dame-des-prairies/concept-beton-design/4550984.html,1378737981443.da0bd0a6b7187f731fb34d4ac14ca279.
state=OPEN, ts=1388784533203, server=node8.distparser.com,60020,1388784498327 and not in expected
OFFLINE, PENDING_OPEN or OPENING states
{code}

But everything finally got assigned without any restart required, and pretty quickly.

Logs from the last run:
{code}
2014-01-03 16:32:20,252 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":10969,"call":"balance(),
rpc version=1, client version=29, methodsFingerPrint=1886733559","client":"192.168.23.7:54614","starttimems":1388784729247,"queuetimems":0,"class":"HMaster","responsesize":0,"method":"balance"}
2014-01-03 16:32:21,278 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED
for region 043d45cada6185d86e743754957e579a from server node1.distparser.com,60020,1388784692832
but region was in  the state page,moc.yubreffotseb.www\x1Fhttp\x1F-1\x1F/camera-bags-cases-straps-camera-bags-cases-c-282_888_580.html\x1Fzenid=ji3nr2ps8rnbaa7joc0lv4qln2,1388782516646.043d45cada6185d86e743754957e579a.
state=OPEN, ts=1388784735731, server=node1.distparser.com,60020,1388784692832 and not in expected
OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:32:21,713 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED
for region e097f829b6eafb70b30c254fd4af662c from server node1.distparser.com,60020,1388784692832
but region was in  the state page,ac.usneeuq.ssenisub\x1Fhttp\x1F-1\x1F/grad_studies/PHD/about_us/queens_leaders_forum/about_us/about_us/grad_studies/PHD/student_career_services/queens_leaders_forum/recruiting/news/recruiting/about_us/accreditations.php\x1Fnull,1383168138496.e097f829b6eafb70b30c254fd4af662c.
state=OPEN, ts=1388784736528, server=node1.distparser.com,60020,1388784692832 and not in expected
OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:32:26,862 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED
for region c94705f4a7c23a0a05a01bfe9d7755bc from server node1.distparser.com,60020,1388784692832
but region was in  the state entry,christian_labelle,1377000858428.c94705f4a7c23a0a05a01bfe9d7755bc.
state=OPEN, ts=1388784740724, server=node1.distparser.com,60020,1388784692832 and not in expected
OFFLINE, PENDING_OPEN or OPENING states
2014-01-03 16:32:34,516 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENED
for region a8d60db86bd03cfbfba0bae0bb3cb564 from server node1.distparser.com,60020,1388784692832
but region was in  the state work_proposed,W{8\xB2\x00\x00\x00\x00http://www.prairiesouth.ca/williamgrayson/calendar-mainmenu-26/day.listevents/2013/10/29/23.html,1383415634227.a8d60db86bd03cfbfba0bae0bb3cb564.
state=OPEN, ts=1388784754264, server=node1.distparser.com,60020,1388784692832 and not in expected
OFFLINE, PENDING_OPEN or OPENING states
{code}
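
For comparing runs like the ones above, a small helper that tallies these warnings by reported event and recorded state can be handy. The following is a rough sketch, assuming the master log text is available as a string and that the warnings follow the "Received EVENT for region HASH ... state=STATE," shape seen in the excerpts; the class name and regular expression are my own assumptions, not anything shipped with HBase.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for summarizing AssignmentManager warnings; not part of HBase.
class AssignmentWarningTally {

    // Captures the event, the 32-hex-character encoded region name, and the recorded state.
    // DOTALL lets the match span the line wraps seen in the excerpts.
    private static final Pattern WARNING = Pattern.compile(
        "Received (\\w+)\\s+for region ([0-9a-f]{32}).*?state=(\\w+),", Pattern.DOTALL);

    // Counts warnings keyed by "<event> while <state>".
    static Map<String, Integer> tally(String logText) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = WARNING.matcher(logText);
        while (m.find()) {
            counts.merge(m.group(1) + " while " + m.group(3), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Abbreviated from one of the warnings quoted above; elided parts shown as "...".
        String sample = "... Received OPENED\n"
            + "for region db8e67acde26bf340da481d3c1b934cd from server ...\n"
            + "but region was in  the state page,...\n"
            + "state=OPEN, ts=1388784100392, ...";
        System.out.println(tally(sample));
    }
}
```

Keying on event and state rather than on region makes it easy to see whether a run still produces FAILED_OPEN warnings or only the milder OPENED-while-OPEN ones.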

So overall, it's WAY more stable! I have not been able to get anything stuck or crashed with
the 2 patches applied. I will keep them ;) Big +1 from me. Thanks for fixing that. I think
it might be easy to fix the last few remaining warnings...

> [0.94] AssignmentManager throws IllegalStateException from PENDING_OPEN to OFFLINE
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-8912
>                 URL: https://issues.apache.org/jira/browse/HBASE-8912
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Lars Hofhansl
>            Priority: Critical
>             Fix For: 0.94.16
>
>         Attachments: 8912-0.94-alt2.txt, 8912-0.94.txt, 8912-fix-race.txt, HBASE-8912.patch,
HBase-0.94 #1036 test - testRetrying [Jenkins].html, log.txt, org.apache.hadoop.hbase.catalog.TestMetaReaderEditor-output.txt
>
>
> AM throws this exception which subsequently causes the master to abort: 
> {code}
> java.lang.IllegalStateException: Unexpected state : testRetrying,jjj,1372891751115.9b828792311001062a5ff4b1038fe33b.
state=PENDING_OPEN, ts=1372891751912, server=hemera.apache.org,39064,1372891746132 .. Cannot
transit it to OFFLINE.
> 	at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
> 	at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
> 	at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}
> This exception trace is from the failing test TestMetaReaderEditor which is failing pretty
> frequently, but looking at the test code, I think this is not a test-only issue, but affects
> the main code path.
> https://builds.apache.org/job/HBase-0.94/1036/testReport/junit/org.apache.hadoop.hbase.catalog/TestMetaReaderEditor/testRetrying/



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
