hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned
Date Tue, 03 Feb 2015 20:21:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303902#comment-14303902 ]

stack commented on HBASE-12958:
-------------------------------

Odd. We go from acknowledging that the host with meta is down (c2022) to all of a sudden forgetting
about it:

{code}
2015-02-02 22:32:15,574 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer
ephemeral node deleted, processing expiration [c2022.halxg.cloudera.com,16020,1422944918568]
2015-02-02 22:32:15,575 DEBUG [main-EventThread] master.AssignmentManager: based on AM, current
region=hbase:meta,,1.1588230740 is on server=c2022.halxg.cloudera.com,16020,1422944918568
server being checked: c2022.halxg.cloudera.com,16020,1422944918568
2015-02-02 22:32:15,575 DEBUG [main-EventThread] master.ServerManager: Added=c2022.halxg.cloudera.com,16020,1422944918568
to dead servers, submitted shutdown handler to be executed meta=true
2015-02-02 22:32:15,576 INFO  [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] handler.MetaServerShutdownHandler:
Splitting hbase:meta logs for c2022.halxg.cloudera.com,16020,1422944918568
2015-02-02 22:32:15,577 DEBUG [main-EventThread] zookeeper.RegionServerTracker: Added tracking
of RS /hbase/rs/c2023.halxg.cloudera.com,16020,1422945128068
2015-02-02 22:32:15,578 DEBUG [main-EventThread] zookeeper.RegionServerTracker: Added tracking
of RS /hbase/rs/c2025.halxg.cloudera.com,16020,1422935795768
2015-02-02 22:32:15,578 DEBUG [main-EventThread] zookeeper.RegionServerTracker: Added tracking
of RS /hbase/rs/c2024.halxg.cloudera.com,16020,1422944894206
2015-02-02 22:32:15,585 DEBUG [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] master.MasterFileSystem:
Renamed region directory: hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting
2015-02-02 22:32:15,585 INFO  [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] master.SplitLogManager:
dead splitlog workers [c2022.halxg.cloudera.com,16020,1422944918568]
2015-02-02 22:32:15,587 DEBUG [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] master.SplitLogManager:
Scheduling batch of logs to split
2015-02-02 22:32:15,587 INFO  [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] master.SplitLogManager:
started splitting 1 logs in [hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting]
for [c2022.halxg.cloudera.com,16020,1422944918568]
2015-02-02 22:32:15,591 DEBUG [main-EventThread] coordination.SplitLogManagerCoordination:
put up splitlog task at znode /hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
2015-02-02 22:32:15,591 DEBUG [main-EventThread] coordination.SplitLogManagerCoordination:
task not yet acquired /hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
ver = 0
2015-02-02 22:32:15,607 INFO  [main-EventThread] coordination.SplitLogManagerCoordination:
task /hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
acquired by c2025.halxg.cloudera.com,16020,1422935795768
2015-02-02 22:32:15,929 INFO  [c2020.halxg.cloudera.com,16020,1422944946802.splitLogManagerTimeoutMonitor]
coordination.SplitLogManagerCoordination: resubmitting task /hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
2015-02-02 22:32:15,941 INFO  [c2020.halxg.cloudera.com,16020,1422944946802.splitLogManagerTimeoutMonitor]
master.SplitLogManager: resubmitted 1 out of 3 tasks
2015-02-02 22:32:15,941 DEBUG [main-EventThread] coordination.SplitLogManagerCoordination:
task not yet acquired /hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
ver = 3
2015-02-02 22:32:15,949 INFO  [main-EventThread] coordination.SplitLogManagerCoordination:
task /hbase/splitWAL/RESCAN0000004442 entered state: DONE c2020.halxg.cloudera.com,16020,1422944946802
2015-02-02 22:32:15,957 DEBUG [main-EventThread] coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback:
deleted /hbase/splitWAL/RESCAN0000004442
2015-02-02 22:32:15,957 DEBUG [main-EventThread] coordination.SplitLogManagerCoordination:
deleted task without in memory state /hbase/splitWAL/RESCAN0000004442
2015-02-02 22:32:16,007 INFO  [main-EventThread] coordination.SplitLogManagerCoordination:
task /hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
acquired by c2024.halxg.cloudera.com,16020,1422944894206
2015-02-02 22:32:16,208 INFO  [main-EventThread] coordination.SplitLogManagerCoordination:
task /hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945090502
entered state: DONE c2023.halxg.cloudera.com,16020,1422945128068
2015-02-02 22:32:16,219 INFO  [main-EventThread] wal.WALSplitter: Archived processed log hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2021.halxg.cloudera.com,16020,1422944889403-splitting/c2021.halxg.cloudera.com%2C16020%2C1422944889403.default.1422945090502
to hdfs://c2020.halxg.cloudera.com:8020/hbase/oldWALs/c2021.halxg.cloudera.com%2C16020%2C1422944889403.default.1422945090502
2015-02-02 22:32:16,220 INFO  [main-EventThread] coordination.SplitLogManagerCoordination:
Done splitting /hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945090502
2015-02-02 22:32:16,224 DEBUG [main-EventThread] coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback:
deleted /hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945090502
2015-02-02 22:32:17,225 INFO  [main-EventThread] coordination.SplitLogManagerCoordination:
task /hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
entered state: DONE c2024.halxg.cloudera.com,16020,1422944894206
2015-02-02 22:32:17,235 INFO  [main-EventThread] wal.WALSplitter: Archived processed log hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2021.halxg.cloudera.com,16020,1422944889403-splitting/c2021.halxg.cloudera.com%2C16020%2C1422944889403.default.1422945068674
to hdfs://c2020.halxg.cloudera.com:8020/hbase/oldWALs/c2021.halxg.cloudera.com%2C16020%2C1422944889403.default.1422945068674
2015-02-02 22:32:17,236 INFO  [main-EventThread] coordination.SplitLogManagerCoordination:
Done splitting /hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
2015-02-02 22:32:17,241 DEBUG [main-EventThread] coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback:
deleted /hbase/splitWAL/WALs%2Fc2021.halxg.cloudera.com%2C16020%2C1422944889403-splitting%2Fc2021.halxg.cloudera.com%252C16020%252C1422944889403.default.1422945068674
2015-02-02 22:32:17,244 INFO  [MASTER_SERVER_OPERATIONS-c2020:16020-2] master.SplitLogManager:
finished splitting (more than or equal to) 1955009811 bytes in 16 log files in [hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2021.halxg.cloudera.com,16020,1422944889403-splitting]
in 7355ms
2015-02-02 22:32:17,244 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] master.RegionStates:
Adding to processed servers c2021.halxg.cloudera.com,16020,1422944889403
2015-02-02 22:32:17,244 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] master.RegionStates:
Offline splitting/merging region {7e0c50a2296895f66bee85df0eaba533 state=SPLITTING, ts=1422945115260,
server=c2021.halxg.cloudera.com,16020,1422944889403}
2015-02-02 22:32:17,553 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] ipc.AbstractRpcClient:
Use SIMPLE authentication for service ClientService, sasl=false
2015-02-02 22:32:17,553 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] ipc.AbstractRpcClient:
Connecting to c2022.halxg.cloudera.com/10.20.84.28:16020
2015-02-02 22:32:17,554 INFO  [MASTER_SERVER_OPERATIONS-c2020:16020-2] client.RpcRetryingCaller:
Call exception, tries=1, retries=350, started=305 ms ago, cancelled=false, msg=row '54d2e8d62ae8d9cae3abc33f1b58e32c'
on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568,
seqNum=0
2015-02-02 22:32:18,055 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] ipc.AbstractRpcClient:
Use SIMPLE authentication for service ClientService, sasl=false
2015-02-02 22:32:18,055 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] ipc.AbstractRpcClient:
Not trying to connect to c2022.halxg.cloudera.com/10.20.84.28:16020 this server is in the
failed servers list
2015-02-02 22:32:18,056 INFO  [MASTER_SERVER_OPERATIONS-c2020:16020-2] client.RpcRetryingCaller:
Call exception, tries=2, retries=350, started=807 ms ago, cancelled=false, msg=row '54d2e8d62ae8d9cae3abc33f1b58e32c'
on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568,
seqNum=0
2015-02-02 22:32:18,930 INFO  [c2020.halxg.cloudera.com,16020,1422944946802.splitLogManagerTimeoutMonitor]
master.SplitLogManager: total tasks = 1 unassigned = 0 tasks={/hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta=last_update
= 1422945135625 last_version = 2 cur_worker_name = c2025.halxg.cloudera.com,16020,1422935795768
status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error =
0}
2015-02-02 22:32:19,059 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] ipc.AbstractRpcClient:
Use SIMPLE authentication for service ClientService, sasl=false
2015-02-02 22:32:19,059 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] ipc.AbstractRpcClient:
Not trying to connect to c2022.halxg.cloudera.com/10.20.84.28:16020 this server is in the
failed servers list
2015-02-02 22:32:19,059 INFO  [MASTER_SERVER_OPERATIONS-c2020:16020-2] client.RpcRetryingCaller:
Call exception, tries=3, retries=350, started=1810 ms ago, cancelled=false, msg=row '54d2e8d62ae8d9cae3abc33f1b58e32c'
on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568,
seqNum=0
2015-02-02 22:32:20,258 INFO  [main-EventThread] coordination.SplitLogManagerCoordination:
task /hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
entered state: DONE c2025.halxg.cloudera.com,16020,1422935795768
2015-02-02 22:32:20,269 INFO  [main-EventThread] wal.WALSplitter: Archived processed log hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting/c2022.halxg.cloudera.com%2C16020%2C1422944918568..meta.1422945128892.meta
to hdfs://c2020.halxg.cloudera.com:8020/hbase/oldWALs/c2022.halxg.cloudera.com%2C16020%2C1422944918568..meta.1422945128892.meta
2015-02-02 22:32:20,269 INFO  [main-EventThread] coordination.SplitLogManagerCoordination:
Done splitting /hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
2015-02-02 22:32:20,273 WARN  [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] master.SplitLogManager:
returning success without actually splitting and deleting all the log files in path hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting
2015-02-02 22:32:20,273 INFO  [MASTER_META_SERVER_OPERATIONS-c2020:16020-1] master.SplitLogManager:
finished splitting (more than or equal to) 83 bytes in 1 log files in [hdfs://c2020.halxg.cloudera.com:8020/hbase/WALs/c2022.halxg.cloudera.com,16020,1422944918568-splitting]
in 4686ms
2015-02-02 22:32:20,274 DEBUG [main-EventThread] coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback:
deleted /hbase/splitWAL/WALs%2Fc2022.halxg.cloudera.com%2C16020%2C1422944918568-splitting%2Fc2022.halxg.cloudera.com%252C16020%252C1422944918568..meta.1422945128892.meta
2015-02-02 22:32:21,051 DEBUG [main-EventThread] zookeeper.RegionServerTracker: Added tracking
of RS /hbase/rs/c2021.halxg.cloudera.com,16020,1422945139174
2015-02-02 22:32:21,052 DEBUG [main-EventThread] zookeeper.RegionServerTracker: Added tracking
of RS /hbase/rs/c2023.halxg.cloudera.com,16020,1422945128068
2015-02-02 22:32:21,052 DEBUG [main-EventThread] zookeeper.RegionServerTracker: Added tracking
of RS /hbase/rs/c2025.halxg.cloudera.com,16020,1422935795768
2015-02-02 22:32:21,052 DEBUG [main-EventThread] zookeeper.RegionServerTracker: Added tracking
of RS /hbase/rs/c2024.halxg.cloudera.com,16020,1422944894206
2015-02-02 22:32:21,071 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] ipc.AbstractRpcClient:
Use SIMPLE authentication for service ClientService, sasl=false
2015-02-02 22:32:21,071 DEBUG [MASTER_SERVER_OPERATIONS-c2020:16020-2] ipc.AbstractRpcClient:
Connecting to c2022.halxg.cloudera.com/10.20.84.28:16020
2015-02-02 22:32:21,072 INFO  [MASTER_SERVER_OPERATIONS-c2020:16020-2] client.RpcRetryingCaller:
Call exception, tries=4, retries=350, started=3823 ms ago, cancelled=false, msg=row '54d2e8d62ae8d9cae3abc33f1b58e32c'
on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568,
seqNum=0
2015-02-02 22:32:21,111 DEBUG [RpcServer.listener,port=16020] ipc.RpcServer: RpcServer.listener,port=16020:
connection from 10.20.84.27:35695; # active connections: 6
{code}
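
To make the pattern in the above log plain: the RpcRetryingCaller lines are the (non-meta) server shutdown handler doing a get against the hbase:meta location it still has cached, i.e. the dead c2022. A toy sketch of that shape, plain Java rather than the actual client code, with made-up backoff and helper names:

{code}
// Toy model of the retry behaviour visible above (tries=1..4 of retries=350).
// This is NOT the HBase client code; the backoff and helper names are made up.
public class StaleMetaRetrySketch {

    // Cached hbase:meta location; in the log it still points at the dead c2022
    // because meta was never reassigned anywhere else.
    static final String cachedMetaLocation = "c2022.halxg.cloudera.com,16020,1422944918568";

    static String getFromMeta(String row, int retries) throws InterruptedException {
        long start = System.currentTimeMillis();
        for (int tries = 1; tries <= retries; tries++) {
            try {
                return rpcGet(cachedMetaLocation, row);       // always fails: the server is dead
            } catch (RuntimeException e) {
                System.out.printf("Call exception, tries=%d, retries=%d, started=%d ms ago, row '%s' at %s%n",
                        tries, retries, System.currentTimeMillis() - start, row, cachedMetaLocation);
                // The location is never refreshed, because nothing has reassigned hbase:meta,
                // so every retry goes back to the same dead host.
                Thread.sleep(Math.min(100L * tries, 1_000L)); // made-up backoff, not HBase's pause table
            }
        }
        throw new RuntimeException("exhausted " + retries + " retries against " + cachedMetaLocation);
    }

    static String rpcGet(String server, String row) {
        throw new RuntimeException("connection refused: " + server);
    }

    public static void main(String[] args) throws InterruptedException {
        // The log shows retries=350; four iterations are enough to see the shape.
        try {
            getFromMeta("54d2e8d62ae8d9cae3abc33f1b58e32c", 4);
        } catch (RuntimeException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
{code}

With retries=350 and nothing reassigning meta to refresh that cached location, a loop like this can keep the handler parked for hours.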

[~jxiang] Any input boss?  Why would we split the meta log and then not assign hbase:meta?
Thanks sir.
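
For reference, the sequencing I'd expect on a meta-carrier death is roughly: split meta's WALs, reassign hbase:meta to a live server, and only then let work that needs a meta read go ahead. A minimal sketch of that ordering (illustrative only, not a quote of the MetaServerShutdownHandler code; the method names are made up):

{code}
// Illustrative only -- not the MetaServerShutdownHandler code; the names below are
// made up. The point is the ordering: split meta's WALs, *reassign* hbase:meta,
// and only then let work that needs a meta read proceed.
public class MetaShutdownSequencingSketch {

    static boolean metaAssignedSomewhereAlive = false;

    static void splitMetaLogs(String deadServer) {
        System.out.println("split hbase:meta WALs of " + deadServer);   // this step shows in the log
    }

    static void assignMetaElsewhere() {
        metaAssignedSomewhereAlive = true;                              // the step that seems missing
        System.out.println("hbase:meta reassigned to a live server");
    }

    static void processDeadServerRegions(String deadServer) {
        if (!metaAssignedSomewhereAlive) {
            // This is the situation the RpcRetryingCaller lines show: a meta read that
            // can only spin against the dead location.
            throw new IllegalStateException("meta get would retry forever against " + deadServer);
        }
        System.out.println("reassigned user regions of " + deadServer);
    }

    public static void main(String[] args) {
        String dead = "c2022.halxg.cloudera.com,16020,1422944918568";
        splitMetaLogs(dead);
        assignMetaElsewhere();      // comment this call out to reproduce the wedge (exception below)
        processDeadServerRegions(dead);
    }
}
{code}

In the log above the first step clearly completes ("finished splitting (more than or equal to) 83 bytes in 1 log files"), but nothing that looks like the second step shows up before the retrying meta gets start.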

> SSH doing hbase:meta get but hbase:meta not assigned
> ----------------------------------------------------
>
>                 Key: HBASE-12958
>                 URL: https://issues.apache.org/jira/browse/HBASE-12958
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: stack
>            Assignee: stack
>
> All master threads are blocked waiting on this call to return:
> {code}
> "MASTER_SERVER_OPERATIONS-c2020:16020-2" #189 prio=5 os_prio=0 tid=0x00007f4b0408b000
nid=0x7821 in Object.wait() [0x00007f4ada24d000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168)
>         - locked <0x000000041c374f50> (a java.util.concurrent.atomic.AtomicBoolean)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881)
>         at org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208)
>         at org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250)
>         at org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225)
>         at org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634)
>         - locked <0x000000041c1f0d80> (a org.apache.hadoop.hbase.master.RegionStates)
>         at org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298)
>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> Master is stuck trying to find hbase:meta on the server that just crashed and that we just recovered:
> Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68181: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
> Will add more detail in a sec.
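
The thread dump above boils down to a blocking, retrying call made while holding the RegionStates monitor, so every other handler that needs that monitor parks behind it. A toy model of that shape (plain Java, not HBase code; the lock and sleep stand in for RegionStates and the retrying meta get):

{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Toy model only, not HBase code: one thread holds the shared monitor while doing a
// long blocking "retrying get", and a second handler that needs the same monitor
// cannot make progress until the first call returns.
public class BlockedUnderLockSketch {
    static final Object regionStatesLock = new Object();   // stand-in for the RegionStates monitor

    static void serverOffline() throws InterruptedException {
        synchronized (regionStatesLock) {
            // stand-in for MetaTableAccessor.getRegion -> HTable.get -> callWithRetries
            Thread.sleep(5_000);                            // "retrying against the dead c2022"
        }
    }

    static void otherShutdownHandler(CountDownLatch done) {
        synchronized (regionStatesLock) {                   // parked until serverOffline() releases it
            done.countDown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        Thread ssh = new Thread(() -> {
            try { serverOffline(); } catch (InterruptedException ignored) { }
        });
        Thread other = new Thread(() -> otherShutdownHandler(done));
        ssh.start();
        Thread.sleep(200);                                  // let the first handler grab the monitor
        other.start();
        // Prints false: the second handler is still stuck behind the lock after a second.
        System.out.println("other handler finished within 1s? " + done.await(1, TimeUnit.SECONDS));
        ssh.join();
        other.join();
    }
}
{code}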



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
