hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
Date Wed, 04 Feb 2015 00:15:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304357#comment-14304357 ]

stack commented on HBASE-12958:

I talked offline w/ mighty [~jxiang]

Two SSHs (ServerShutdownHandlers) were running: one for the server carrying hbase:meta and
one for another server that had died at around the same time. The meta SSH could not make
progress because the non-meta SSH was inside RegionStates holding its lock, which shut out
the meta SSH (see the thread dump snippet in the issue description). This is the working
theory. Unfortunately I lost the thread dumps, so I can't confirm it for sure. Let me try to
conjure this scenario with a bit of code.
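
For illustration only, here is a rough sketch of the theorized contention, assuming the
non-meta SSH does its blocking hbase:meta read while holding the RegionStates monitor (as the
"- locked ... RegionStates" frame in the thread dump suggests). FakeRegionStates, readMeta,
serverOfflineHoldingLock, assignMeta and runScenario are hypothetical names made up for this
sketch. None of it is HBase code, and the second mode is only a contrast, not a proposed fix.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class SshLockSketch {

    /** Hypothetical stand-in for the master's RegionStates; not HBase code. */
    static final class FakeRegionStates {
        private final CountDownLatch metaAssigned = new CountDownLatch(1);

        /** Simulates the blocking hbase:meta get: waits until meta is assigned or times out. */
        boolean readMeta(long timeoutMs) throws InterruptedException {
            return metaAssigned.await(timeoutMs, TimeUnit.MILLISECONDS);
        }

        /** Non-meta SSH path as theorized: the meta read runs while the monitor is held. */
        synchronized boolean serverOfflineHoldingLock(CountDownLatch started, long timeoutMs)
                throws InterruptedException {
            started.countDown(); // signal that the monitor is now held
            return readMeta(timeoutMs);
        }

        /** Meta SSH path: needs the same monitor before it can mark meta as assigned. */
        synchronized void assignMeta() {
            metaAssigned.countDown();
        }
    }

    /** Returns true if the simulated meta read completed before its timeout. */
    static boolean runScenario(boolean holdLockDuringRead, long timeoutMs)
            throws InterruptedException {
        FakeRegionStates states = new FakeRegionStates();
        CountDownLatch started = new CountDownLatch(1);
        boolean[] readOk = new boolean[1];

        Thread nonMetaSsh = new Thread(() -> {
            try {
                if (holdLockDuringRead) {
                    readOk[0] = states.serverOfflineHoldingLock(started, timeoutMs);
                } else {
                    started.countDown();
                    readOk[0] = states.readMeta(timeoutMs);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "non-meta-ssh");

        Thread metaSsh = new Thread(() -> {
            try {
                started.await();     // let the non-meta SSH get going first
                states.assignMeta(); // blocks for as long as the monitor is held elsewhere
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "meta-ssh");

        nonMetaSsh.start();
        metaSsh.start();
        nonMetaSsh.join();
        metaSsh.join();
        return readOk[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("read while holding lock: " + runScenario(true, 300));
        System.out.println("read outside the lock:   " + runScenario(false, 2000));
    }
}
```

With the lock held across the read, the simulated meta SSH cannot get into assignMeta until
the read has already timed out, which is the shape of the stall described above. With the read
done outside the lock, the assignment can go through and the read returns.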

> SSH doing hbase:meta get but hbase:meta not assigned
> ----------------------------------------------------
>                 Key: HBASE-12958
>                 URL: https://issues.apache.org/jira/browse/HBASE-12958
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: stack
>            Assignee: stack
> All master threads are blocked waiting on this call to return:
> {code}
> "MASTER_SERVER_OPERATIONS-c2020:16020-2" #189 prio=5 os_prio=0 tid=0x00007f4b0408b000 nid=0x7821 in Object.wait() [0x00007f4ada24d000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168)
>         - locked <0x000000041c374f50> (a java.util.concurrent.atomic.AtomicBoolean)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881)
>         at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208)
>         at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250)
>         at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225)
>         at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634)
>         - locked <0x000000041c1f0d80> (a org.apache.hadoop.hbase.master.RegionStates)
>         at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298)
>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> Master is stuck trying to find hbase:meta on the server that just crashed and that we just recovered:
> Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568,
> Will add more detail in a sec.

This message was sent by Atlassian JIRA
