hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Moved] (HBASE-12255) hbase table is unavailable after restart hdfs without stop hbase
Date Tue, 14 Oct 2014 15:09:35 GMT

     [ https://issues.apache.org/jira/browse/HBASE-12255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell moved PHOENIX-1348 to HBASE-12255:
-------------------------------------------------

    Affects Version/s:     (was: 4.1)
             Workflow: no-reopen-closed, patch-avail  (was: patch-available, re-open possible)
                  Key: HBASE-12255  (was: PHOENIX-1348)
              Project: HBase  (was: Phoenix)

> hbase table is unavailable after restart hdfs without stop hbase
> ----------------------------------------------------------------
>
>                 Key: HBASE-12255
>                 URL: https://issues.apache.org/jira/browse/HBASE-12255
>             Project: HBase
>          Issue Type: Bug
>         Environment: hadoop-2.5.1
> hbase-0.98
> phoenix-4.1.0
>            Reporter: Junyong Li
>
> I have a Phoenix table 'EVENT', and the table has an index 'IDX_DATE_HOUR_X'.
> I restarted HDFS without stopping HBase; after that, the HBase table could not be scanned.
> I tried restarting HBase, but all of the HBase tables still could not be scanned.
> The regionserver log contains many exceptions like this:
> 2014-10-13 19:33:05,287 INFO  [A01101303447-V1,60020,1413199890407-recovery-writer--pool4-t3] client.AsyncProcess: #4, waiting for some tasks to finish. Expected max=0, tasksSent=9, tasksDone=8, currentTasksDone=8, retries=8 hasError=false, tableName=IDX_DATE_HOUR_X
> 2014-10-13 19:33:05,298 INFO  [A01101303447-V1,60020,1413199890407-recovery-writer--pool4-t2] client.AsyncProcess: #5, waiting for some tasks to finish. Expected max=0, tasksSent=9, tasksDone=8, currentTasksDone=8, retries=8 hasError=false, tableName=IDX_DATE_HOUR_X
> 2014-10-13 19:33:05,311 INFO  [A01101303447-V1,60020,1413199890407-recovery-writer--pool4-t1] client.AsyncProcess: #6, waiting for some tasks to finish. Expected max=0, tasksSent=9, tasksDone=8, currentTasksDone=8, retries=8 hasError=false, tableName=IDX_DATE_HOUR_X
> 2014-10-13 19:33:06,452 INFO  [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl: Moving A01101303447-V1,60020,1413199414409's hlogs to my queue
> 2014-10-13 19:33:15,325 INFO  [A01101303447-V1,60020,1413199890407-recovery-writer--pool4-t1] client.AsyncProcess: #6, waiting for some tasks to finish. Expected max=0, tasksSent=10, tasksDone=9, currentTasksDone=9, retries=9 hasError=false, tableName=IDX_DATE_HOUR_X
> 2014-10-13 19:33:15,333 INFO  [htable-pool6-t2] client.AsyncProcess: #6, table=IDX_DATE_HOUR_X, attempt=10/350 failed 12 ops, last exception: org.apache.hadoop.hbase.exceptions.RegionOpeningException: org.apache.hadoop.hbase.exceptions.RegionOpeningException: Region IDX_DATE_HOUR_X,t\x00\x00\x00\x00\x00,1413186874829.9a92abb84768b129df3faedb877f7bea. is opening on A01101303447-V1,60020,1413199890407
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2759)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4213)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3437)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29593)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
>         at java.lang.Thread.run(Thread.java:744)
> -------------------------------
> After two days of trying, I found the following:
> If I disable 'EVENT', the other tables can be scanned. Then, when I enable 'EVENT' manually, the region log shows that a NullPointerException occurs while replaying the WAL. The following is the log:
> 2014-10-13 19:25:21,043 INFO  [RS_OPEN_REGION-A01101303447-V1:60020-1] regionserver.HRegion: Replaying edits from hdfs://localhost/hbase-0.98/data/default/EVENT/def4a581d4ad963cbb8cad32cbfbab2e/recovered.edits/0000000000000000002
> 2014-10-13 19:25:21,048 INFO  [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t2-SendThread(localhost:2182)] zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2182. Will not attempt to authenticate using SASL (unknown error)
> 2014-10-13 19:25:21,049 INFO  [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t2-SendThread(localhost:2182)] zookeeper.ClientCnxn: Socket connection established to localhost/0:0:0:0:0:0:0:1:2182, initiating session
> 2014-10-13 19:25:21,051 INFO  [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t3] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x199c8484 connecting to ZooKeeper ensemble=localhost:2182
> 2014-10-13 19:25:21,051 INFO  [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t3] zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2182 sessionTimeout=90000 watcher=hconnection-0x199c8484, quorum=localhost:2182, baseZNode=/hbase
> 2014-10-13 19:25:21,052 INFO  [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t3-SendThread(localhost:2182)] zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2182. Will not attempt to authenticate using SASL (unknown error)
> 2014-10-13 19:25:21,052 INFO  [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t3-SendThread(localhost:2182)] zookeeper.ClientCnxn: Socket connection established to localhost/0:0:0:0:0:0:0:1:2182, initiating session
> 2014-10-13 19:25:21,053 INFO  [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t2-SendThread(localhost:2182)] zookeeper.ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2182, sessionid = 0x149093fc1c80203, negotiated timeout = 90000
> 2014-10-13 19:25:21,055 INFO  [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t3-SendThread(localhost:2182)] zookeeper.ClientCnxn: Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2182, sessionid = 0x149093fc1c80204, negotiated timeout = 90000
> 2014-10-13 19:25:21,103 ERROR [RS_OPEN_REGION-A01101303447-V1:60020-1] handler.OpenRegionHandler: Failed open of region=EVENT,\x18,1413186838852.def4a581d4ad963cbb8cad32cbfbab2e., starting to roll back the global memstore size.
> java.lang.NullPointerException
> 2014-10-13 19:25:21,104 INFO  [RS_OPEN_REGION-A01101303447-V1:60020-1] handler.OpenRegionHandler: Opening of region {ENCODED => def4a581d4ad963cbb8cad32cbfbab2e, NAME => 'EVENT,\x18,1413186838852.def4a581d4ad963cbb8cad32cbfbab2e.', STARTKEY => '\x18', ENDKEY => '\x19'} failed, transitioning from OPENING to FAILED_OPEN in ZK, expecting version 28
> 2014-10-13 19:25:21,104 DEBUG [RS_OPEN_REGION-A01101303447-V1:60020-1] zookeeper.ZKAssign: regionserver:60020-0x149093fc1c80001, quorum=localhost:2182, baseZNode=/hbase Transitioning def4a581d4ad963cbb8cad32cbfbab2e from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN
> 2014-10-13 19:25:21,107 DEBUG [RS_OPEN_REGION-A01101303447-V1:60020-1] zookeeper.ZKAssign: regionserver:60020-0x149093fc1c80001, quorum=localhost:2182, baseZNode=/hbase Transitioned node def4a581d4ad963cbb8cad32cbfbab2e from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN
> 2014-10-13 19:25:21,108 ERROR [RS_OPEN_REGION-A01101303447-V1:60020-2] handler.OpenRegionHandler: Failed open of region=EVENT,\x15,1413186838852.4592fbc9db5e6eb05c812dcd81f5fa4d., starting to roll back the global memstore size.
> java.lang.NullPointerException
> 2014-10-13 19:25:21,108 INFO  [RS_OPEN_REGION-A01101303447-V1:60020-2] handler.OpenRegionHandler: Opening of region {ENCODED => 4592fbc9db5e6eb05c812dcd81f5fa4d, NAME => 'EVENT,\x15,1413186838852.4592fbc9db5e6eb05c812dcd81f5fa4d.', STARTKEY => '\x15', ENDKEY => '\x16'} failed, transitioning from OPENING to FAILED_OPEN in ZK, expecting version 28
> 2014-10-13 19:25:21,108 DEBUG [RS_OPEN_REGION-A01101303447-V1:60020-2] zookeeper.ZKAssign: regionserver:60020-0x149093fc1c80001, quorum=localhost:2182, baseZNode=/hbase Transitioning 4592fbc9db5e6eb05c812dcd81f5fa4d from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN
> 2014-10-13 19:25:21,111 DEBUG [RS_OPEN_REGION-A01101303447-V1:60020-2] zookeeper.ZKAssign: regionserver:60020-0x149093fc1c80001, quorum=localhost:2182, baseZNode=/hbase Transitioned node 4592fbc9db5e6eb05c812dcd81f5fa4d from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN
> 2014-10-13 19:25:21,113 ERROR [RS_OPEN_REGION-A01101303447-V1:60020-0] handler.OpenRegionHandler: Failed open of region=EVENT,\x1B,1413186838852.b45c35d23b23fda5643ec5d79083488e., starting to roll back the global memstore size.
> java.lang.NullPointerException
>  After deleting the 'recovered.edits' in the 'EVENT' table's regions, the HBase tables can be scanned. So I think the reason may be this:
> After restarting HBase, the regionserver begins opening the regions and replaying the WAL. When replaying EVENT's WAL, because EVENT has an index table IDX_DATE_HOUR_X, the replay process must write to IDX_DATE_HOUR_X. But at that moment, the IDX_DATE_HOUR_X table's regions are still in the OPENING state and are unavailable, so the EVENT replay process retries time and time again and throws the exception again and again.
> If I disable EVENT first, the other HBase tables recover successfully, but when I enable EVENT, the region log shows a java.lang.NullPointerException, which makes the region transition to FAILED_OPEN.
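[Editor's note] The "waiting for some tasks to finish ... retries=N" lines above fit HBase's exponential client backoff, which is why recovery appears stuck while IDX_DATE_HOUR_X is still OPENING. The sketch below is an illustration only, not HBase source: it approximates how a 0.98-era client spaces retries using a backoff multiplier table like `HConstants.RETRY_BACKOFF`; the base pause (the `hbase.client.pause` default of 100 ms) and the multiplier values are assumed defaults and may differ from the reporter's configuration.

```python
# Illustrative sketch of HBase 0.98-style client retry backoff (assumed
# defaults; not the actual HBase source). Multipliers mirror the
# HConstants.RETRY_BACKOFF progression.
RETRY_BACKOFF = [1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200]
BASE_PAUSE_MS = 100  # assumed hbase.client.pause default

def pause_ms(attempt: int) -> int:
    """Sleep before the given 0-based retry attempt, capped at the table end."""
    idx = min(attempt, len(RETRY_BACKOFF) - 1)
    return BASE_PAUSE_MS * RETRY_BACKOFF[idx]

def total_wait_ms(attempts: int) -> int:
    """Cumulative sleep across a run of consecutive failed attempts."""
    return sum(pause_ms(a) for a in range(attempts))

# Ten failed attempts (cf. attempt=10/350 in the log) already mean tens of
# seconds of waiting per batch: 100*(1+2+3+5+10+20+40+100+100+100) ms.
print(total_wait_ms(10))  # → 38100
```

Under these assumed settings, a batch that has failed ten times has slept about 38 seconds, and with a retry limit in the hundreds the replay can block the region open for a very long time before giving up, which matches the observed unavailability.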



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
