hbase-user mailing list archives

From Serega Sheypak <serega.shey...@gmail.com>
Subject Re: Periodical on org.apache.hadoop.hbase.NotServingRegionException HBase 1.0.0-cdh5.4.4
Date Sat, 07 May 2016 19:43:40 GMT
Hi Chien, here is pastebin with log: http://pastebin.com/6x7umBvZ
Quick summary is:

*RSRpcServices Close 19a02bdebe1cca4eae5509a62fdd217d, moving to null*
What does "moving to null" mean? It means the region is taken offline and no
RS will serve it.
HRegionServer
Received CLOSE for the region: 19a02bdebe1cca4eae5509a62fdd217d, which we
are already trying to CLOSE, but not completed yet
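The two log lines above describe a duplicate-CLOSE race: the RS is still busy closing the region when a second CLOSE arrives, so it rejects the new request with RegionAlreadyInTransitionException. A minimal Python sketch of that guard (a simplification with hypothetical names, not HBase's actual implementation):

```python
class RegionAlreadyInTransitionException(Exception):
    """Raised when a CLOSE arrives for a region that is already closing."""

class RegionServer:
    def __init__(self):
        # Encoded names of regions currently being closed.
        self.regions_in_transition = set()

    def close_region(self, encoded_name):
        # A second CLOSE for the same region is rejected, mirroring the
        # "was already closing. New CLOSE request is ignored." log line.
        if encoded_name in self.regions_in_transition:
            raise RegionAlreadyInTransitionException(
                f"The region {encoded_name} was already closing. "
                "New CLOSE request is ignored.")
        self.regions_in_transition.add(encoded_name)

rs = RegionServer()
rs.close_region("19a02bdebe1cca4eae5509a62fdd217d")   # first CLOSE: accepted
try:
    rs.close_region("19a02bdebe1cca4eae5509a62fdd217d")  # second CLOSE: rejected
except RegionAlreadyInTransitionException as e:
    print(e)
```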


*Server node05.cluster.pro,60020,1451388046169 returned
org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException:*
org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException:
The region 19a02bdebe1cca4eae5509a62fdd217d was already closing. New CLOSE
request is ignored.
at
org.apache.hadoop.hbase.regionserver.HRegionServer.closeRegion(HRegionServer.java:2689)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.closeRegion(RSRpcServices.java:1033)
at
org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:20870)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)
 for
my_weird_table,70a3d6f6,1448462346185.19a02bdebe1cca4eae5509a62fdd217d.,
try=1 of 10

Then I tried to repair the table:

node03.cluster.pro INFO April 21, 2016 4:07 PM HRegion
Closed
my_weird_table,70a3d6f6,1448462346185.19a02bdebe1cca4eae5509a62fdd217d.
node04.cluster.pro INFO May 6, 2016 10:09 PM MasterRpcServices
Client=hbase//148.251.186.9 assign
my_weird_table,70a3d6f6,1448462346185.19a02bdebe1cca4eae5509a62fdd217d.
node05.cluster.pro INFO May 6, 2016 10:09 PM RSRpcServices
Close 19a02bdebe1cca4eae5509a62fdd217d, moving to null
node04.cluster.pro INFO May 6, 2016 10:09 PM RegionStates
Transition {19a02bdebe1cca4eae5509a62fdd217d state=FAILED_CLOSE,
ts=1461242442267, server=node05.cluster.pro,60020,1451388046169} to
{19a02bdebe1cca4eae5509a62fdd217d state=OFFLINE, ts=1462561784994, server=
node05.cluster.pro,60020,1451388046169}

And it started to work.
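The RegionStates line above is the key: the master moved the region from FAILED_CLOSE back to OFFLINE, making it eligible for assignment again. A toy sketch of that transition (a hypothetical state table, not HBase's real RegionStates API):

```python
# Allowed region state transitions (simplified, illustrative subset).
# FAILED_CLOSE -> OFFLINE is the transition in the log above: the stuck
# region becomes unassigned and can be picked up by a region server again.
ALLOWED = {
    "FAILED_CLOSE": {"OFFLINE"},
    "OFFLINE": {"OPENING"},
    "OPENING": {"OPEN"},
}

def transition(state, new_state):
    if new_state not in ALLOWED.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

state = "FAILED_CLOSE"
state = transition(state, "OFFLINE")   # what the manual assign triggered
state = transition(state, "OPENING")
state = transition(state, "OPEN")
print(state)  # → OPEN
```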


2016-05-07 6:09 GMT+02:00 Chien Le <chienle@gmail.com>:

> I would try grepping for the region id (19a02bdebe1cca4eae5509a62fdd217d)
> through the logs of hbase master and the last known regionserver to host
> it. Can you share via pastebin?
>
> -Chien
>
> On Fri, May 6, 2016 at 12:18 PM, Serega Sheypak <serega.sheypak@gmail.com>
> wrote:
>
>> Hi, I have HBase cluster running on HBase 1.0.0-cdh5.4.4
>> I do periodically get NotServingRegionException and I can't find the
>> reason
>> for such exception. It happens randomly on different tables.
>>
>> *hbase hbck my_weird_table*
>>
>> *reports*:
>>
>>
>> ERROR: Region { meta =>
>> my_weird_table,70a3d6f6,1448462346185.19a02bdebe1cca4eae5509a62fdd217d.,
>> hdfs =>
>>
>> hdfs://nameservice1/hbase/data/default/my_weird_table/19a02bdebe1cca4eae5509a62fdd217d,
>> deployed => , replicaId => 0 } not deployed on any region server.
>>
>> ERROR: Region { meta =>
>> my_weird_table,4f5c0e14,1447343972523.69fa4ad7a33868e938f25e5cbdb8cd08.,
>> hdfs =>
>>
>> hdfs://nameservice1/hbase/data/default/my_weird_table/69fa4ad7a33868e938f25e5cbdb8cd08,
>> deployed => , replicaId => 0 } not deployed on any region server.
>>
>> ERROR: Region { meta =>
>> my_weird_table,b0a3cf6:,1448475527400.d4c2bda6f776be97e369371fed1ea674.,
>> hdfs =>
>>
>> hdfs://nameservice1/hbase/data/default/my_weird_table/d4c2bda6f776be97e369371fed1ea674,
>> deployed => , replicaId => 0 } not deployed on any region server.
>>
>> 16/05/06 23:03:13 INFO util.HBaseFsck: Handling overlap merges in
>> parallel.
>> set hbasefsck.overlap.merge.parallel to false to run serially.
>>
>> ERROR: There is a hole in the region chain between 4f5c0e14 and 51eb8510.
>> You need to create a new .regioninfo and region dir in hdfs to plug the
>> hole.
>>
>> ERROR: There is a hole in the region chain between 70a3d6f6 and 73332fbb.
>> You need to create a new .regioninfo and region dir in hdfs to plug the
>> hole.
>>
>> ERROR: There is a hole in the region chain between b0a3cf6: and b3333313.
>> You need to create a new .regioninfo and region dir in hdfs to plug the
>> hole.
>>
>> 16/05/06 23:03:13 INFO util.HBaseFsck: Handling overlap merges in
>> parallel.
>> set hbasefsck.overlap.merge.parallel to false to run serially.
>>
>> ERROR: Found inconsistency in table my_weird_table
>>
>>
>>
>> *Summary:*
>>
>> *  hbase:meta is okay.*
>>
>> *    Number of regions: 1*
>>
>> *    Deployed on:  node05.cluster.pro,60020,1451388046169*
>>
>> *  my_weird_table is okay.*
>>
>> *    Number of regions: 98*
>>
>> *    Deployed on:  node01.cluster.pro,60020,1453774572201
>> node02.cluster.pro,60020,1458087229508 node04.cluster.pro,60020,1447338864601
>> node05.cluster.pro,60020,1451388046169*
>>
>> *6 inconsistencies detected.*
>>
>>
>> *Status: INCONSISTENT*
>>
>> then I run *hbase hbck -repair my_weird_table*
>>
>>
>> #### Output omitted for brevity.
>> 16/05/06 23:09:43 INFO util.HBaseFsck: No integrity errors.  We are done
>> with this phase. Glorious.
>> Number of live region servers: 5
>> Number of dead region servers: 0
>> Master: node04.cluster.pro,60000,1450130273717
>> Number of backup masters: 1
>> Average load: 167.8
>> Number of requests: 4884
>> Number of regions: 839
>> Number of regions in transition: 23
>>
>>
>> ERROR: Region { meta =>
>> my_weird_table,70a3d6f6,1448462346185.19a02bdebe1cca4eae5509a62fdd217d.,
>> hdfs =>
>>
>> hdfs://nameservice1/hbase/data/default/my_weird_table/19a02bdebe1cca4eae5509a62fdd217d,
>> deployed => , replicaId => 0 } not deployed on any region server.
>> Trying to fix unassigned region...
>> 16/05/06 23:09:45 INFO util.HBaseFsckRepair: Region still in transition,
>> waiting for it to become assigned: {ENCODED =>
>> 19a02bdebe1cca4eae5509a62fdd217d, NAME =>
>> 'my_weird_table,70a3d6f6,1448462346185.19a02bdebe1cca4eae5509a62fdd217d.',
>> STARTKEY => '70a3d6f6', ENDKEY => '73332fbb'}
>> 16/05/06 23:09:46 INFO util.HBaseFsckRepair: Region still in transition,
>> waiting for it to become assigned: {ENCODED =>
>> 19a02bdebe1cca4eae5509a62fdd217d, NAME =>
>> 'my_weird_table,70a3d6f6,1448462346185.19a02bdebe1cca4eae5509a62fdd217d.',
>> STARTKEY => '70a3d6f6', ENDKEY => '73332fbb'}
>> 16/05/06 23:09:47 INFO util.HBaseFsckRepair: Region still in transition,
>> waiting for it to become assigned: {ENCODED =>
>> 19a02bdebe1cca4eae5509a62fdd217d, NAME =>
>> 'my_weird_table,70a3d6f6,1448462346185.19a02bdebe1cca4eae5509a62fdd217d.',
>> STARTKEY => '70a3d6f6', ENDKEY => '73332fbb'}
>> ERROR: Region { meta =>
>> my_weird_table,4f5c0e14,1447343972523.69fa4ad7a33868e938f25e5cbdb8cd08.,
>> hdfs =>
>>
>> hdfs://nameservice1/hbase/data/default/my_weird_table/69fa4ad7a33868e938f25e5cbdb8cd08,
>> deployed => , replicaId => 0 } not deployed on any region server.
>> Trying to fix unassigned region...
>> 16/05/06 23:09:48 INFO util.HBaseFsckRepair: Region still in transition,
>> waiting for it to become assigned: {ENCODED =>
>> 69fa4ad7a33868e938f25e5cbdb8cd08, NAME =>
>> 'my_weird_table,4f5c0e14,1447343972523.69fa4ad7a33868e938f25e5cbdb8cd08.',
>> STARTKEY => '4f5c0e14', ENDKEY => '51eb8510'}
>> 16/05/06 23:09:49 INFO util.HBaseFsckRepair: Region still in transition,
>> waiting for it to become assigned: {ENCODED =>
>> 69fa4ad7a33868e938f25e5cbdb8cd08, NAME =>
>> 'my_weird_table,4f5c0e14,1447343972523.69fa4ad7a33868e938f25e5cbdb8cd08.',
>> STARTKEY => '4f5c0e14', ENDKEY => '51eb8510'}
>> ERROR: Region { meta =>
>> my_weird_table,b0a3cf6:,1448475527400.d4c2bda6f776be97e369371fed1ea674.,
>> hdfs =>
>>
>> hdfs://nameservice1/hbase/data/default/my_weird_table/d4c2bda6f776be97e369371fed1ea674,
>> deployed => , replicaId => 0 } not deployed on any region server.
>> Trying to fix unassigned region...
>> 16/05/06 23:09:50 INFO util.HBaseFsckRepair: Region still in transition,
>> waiting for it to become assigned: {ENCODED =>
>> d4c2bda6f776be97e369371fed1ea674, NAME =>
>> 'my_weird_table,b0a3cf6:,1448475527400.d4c2bda6f776be97e369371fed1ea674.',
>> STARTKEY => 'b0a3cf6:', ENDKEY => 'b3333313'}
>> 16/05/06 23:09:51 INFO util.HBaseFsckRepair: Region still in transition,
>> waiting for it to become assigned: {ENCODED =>
>> d4c2bda6f776be97e369371fed1ea674, NAME =>
>> 'my_weird_table,b0a3cf6:,1448475527400.d4c2bda6f776be97e369371fed1ea674.',
>> STARTKEY => 'b0a3cf6:', ENDKEY => 'b3333313'}
>> 16/05/06 23:09:52 INFO util.HBaseFsck: Handling overlap merges in
>> parallel.
>> set hbasefsck.overlap.merge.parallel to false to run serially.
>> ERROR: There is a hole in the region chain between 4f5c0e14 and 51eb8510.
>> You need to create a new .regioninfo and region dir in hdfs to plug the
>> hole.
>> ERROR: There is a hole in the region chain between 70a3d6f6 and 73332fbb.
>> You need to create a new .regioninfo and region dir in hdfs to plug the
>> hole.
>> ERROR: There is a hole in the region chain between b0a3cf6: and b3333313.
>> You need to create a new .regioninfo and region dir in hdfs to plug the
>> hole.
>> 16/05/06 23:09:52 INFO util.HBaseFsck: Handling overlap merges in
>> parallel.
>> set hbasefsck.overlap.merge.parallel to false to run serially.
>> ERROR: Found inconsistency in table my_weird_table
>> 16/05/06 23:09:59 INFO zookeeper.RecoverableZooKeeper: Process
>> identifier=hbase Fsck connecting to ZooKeeper ensemble=
>> node04.cluster.pro:2181,node01.cluster.pro:2181,node05.cluster.pro:2181
>>
>> *16/05/06 23:09:59 INFO zookeeper.ClientCnxn: EventThread shut down*
>> *Summary:*
>> *  hbase:meta is okay.*
>> *    Number of regions: 1*
>> *    Deployed on:  node05.cluster.pro,60020,1451388046169*
>> *  my_weird_table is okay.*
>> *    Number of regions: 98*
>> *    Deployed on:  node01.cluster.pro,60020,1453774572201
>> node02.cluster.pro,60020,1458087229508 node04.cluster.pro,60020,1447338864601
>> node05.cluster.pro,60020,1451388046169*
>> *6 inconsistencies detected.*
>> *Status: INCONSISTENT*
>> 16/05/06 23:10:00 INFO util.HBaseFsck: Sleeping 10000ms before re-checking
>> after fix...
>> Version: 1.0.0-cdh5.4.4
>> 16/05/06 23:10:10 INFO util.HBaseFsck: Loading regioninfos HDFS
>> 16/05/06 23:10:10 INFO util.HBaseFsck: Loading HBase regioninfo from
>> HDFS...
>> 16/05/06 23:10:10 INFO util.HBaseFsck: Checking HBase region split map
>> from
>> HDFS data...
>> 16/05/06 23:10:10 INFO util.HBaseFsck: Handling overlap merges in
>> parallel.
>> set hbasefsck.overlap.merge.parallel to false to run serially.
>> 16/05/06 23:10:10 INFO util.HBaseFsck: Handling overlap merges in
>> parallel.
>> set hbasefsck.overlap.merge.parallel to false to run serially.
>> *16/05/06 23:10:10 INFO util.HBaseFsck: No integrity errors.  We are done
>> with this phase. Glorious.*
>> *Number of live region servers: 5*
>> *Number of dead region servers: 0*
>> *Master: node04.cluster.pro,60000,1450130273717*
>> *Number of backup masters: 1*
>> *Average load: 167.8*
>> *Number of requests: 4884*
>> *Number of regions: 839*
>> *Number of regions in transition: 23*
>>
>> 16/05/06 23:10:10 INFO util.HBaseFsck: Loading regionsinfo from the
>> hbase:meta table
>>
>> Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 0
>> 16/05/06 23:10:11 INFO util.HBaseFsck: getHTableDescriptors == tableNames
>> => [my_weird_table]
>> Number of Tables: 1
>>
>> *Summary:*
>> *  hbase:meta is okay.*
>> 16/05/06 23:10:18 INFO zookeeper.ClientCnxn: EventThread shut down
>>
>> *    Number of regions: 1*
>> *    Deployed on:  node05.cluster.pro,60020,1451388046169*
>> *  my_weird_table is okay.*
>> *    Number of regions: 101*
>> *    Deployed on:  node01.cluster.pro,60020,1453774572201
>> node02.cluster.pro,60020,1458087229508 node03.cluster.pro,60020,1461244112276
>> node04.cluster.pro,60020,1447338864601 node05.cluster.pro,60020,1451388046169*
>> *0 inconsistencies detected.*
>> *Status: OK*
>>
>>
>> So, why are some regions not served?
>>
>> Why does -repair help? What makes my table broken and partially
>> unavailable?
>>
>
>
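For reference, the "hole in the region chain" errors in the quoted hbck output come from a simple invariant: regions sorted by start key must share boundaries, i.e. each region's end key equals the next region's start key. A sketch of that check in Python, using the boundary keys hbck printed (the keys marked hypothetical are neighbors hbck did not show):

```python
def find_holes(regions):
    """regions: list of (start_key, end_key) for the deployed regions only.

    Returns the (end_key, next_start_key) gaps, i.e. the key ranges that
    no deployed region covers -- what hbck reports as holes."""
    holes = []
    regions = sorted(regions)
    for (_, end), (nxt_start, _) in zip(regions, regions[1:]):
        if end != nxt_start:
            holes.append((end, nxt_start))
    return holes

# Boundary keys from the hbck output above; middle regions are hypothetical.
deployed = [
    ("4a000000", "4f5c0e14"),   # hypothetical predecessor
    ("51eb8510", "5500aaaa"),   # hypothetical
    ("5500aaaa", "70a3d6f6"),   # hypothetical
    ("73332fbb", "b0a3cf6:"),   # hypothetical successor
]
print(find_holes(deployed))
# → [('4f5c0e14', '51eb8510'), ('70a3d6f6', '73332fbb')]
```

Each reported pair matches one "hole in the region chain between X and Y" line, which is why assigning the three undeployed regions (as -repair eventually did) plugs the holes.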
