Reply-To: user@hbase.apache.org
Message-ID: <561CC38F.6040604@openindex.io>
Date: Tue, 13 Oct 2015 10:40:47 +0200
From: Jurian Broertjes <jurian.broertjes@openindex.io>
To: user@hbase.apache.org
Subject: Re: Regions won't come online
References: <561BCA0F.1020503@openindex.io>
 <5D3C1DA1-06D6-4F28-99BC-5F5CE182B682@gmail.com>
In-Reply-To: <5D3C1DA1-06D6-4F28-99BC-5F5CE182B682@gmail.com>

I'm not exactly sure what happened during the night, but the problem
somehow resolved itself after a few hours. The regionservers kept
logging NotServingRegionExceptions for about 2h, and then a few of the
following exceptions appeared, after which the problems disappeared.

2015-10-12 18:16:56,616 ERROR [RS_OPEN_REGION-cn1:16020-2] handler.OpenRegionHandler: Failed open of region=QUERYLOGS,\x01,1440751079220.87e5c32c06d42524fd0876eb72b1472e., starting to roll back the global memstore size.
org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: Failed to write to multiple index tables
        at org.apache.phoenix.hbase.index.write.recovery.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:222)
        at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:179)
        at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:169)
        at org.apache.phoenix.hbase.index.Indexer.preWALRestore(Indexer.java:545)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$58.call(RegionCoprocessorHost.java:1432)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1748)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1705)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preWALRestore(RegionCoprocessorHost.java:1423)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4013)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3869)
        at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:937)
        at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:807)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:782)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6227)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6188)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6159)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6115)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6066)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:362)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:129)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I still don't know if it's Phoenix or HBase related, but for now I'm 
glad it all seems resolved. If somebody has any clue what happened or if 
something should be done to prevent a situation like this, please let me 
know.
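
In case it helps anyone who runs into the same thing: a minimal sketch
of how one might tally which regions the regionserver log reports as
"not online", based on the NotServingRegionException lines quoted from
my earlier message below. The function name count_offline_regions and
the regular expression are my own assumptions derived from that log
format; they are not part of HBase or Phoenix, and the pattern would
miss region names that contain whitespace.

```python
import re
from collections import Counter

# Assumed pattern, derived from the quoted log line:
#   "... Region <region-name> is not online on <server-name>"
# Region and server names are taken as runs of non-whitespace characters.
NOT_ONLINE = re.compile(
    r"Region (?P<region>\S+) is not online on (?P<server>\S+)"
)


def count_offline_regions(lines):
    """Count NotServingRegionException reports per (region, server) pair."""
    counts = Counter()
    for line in lines:
        # Skip lines that do not mention the exception at all.
        if "NotServingRegionException" not in line:
            continue
        match = NOT_ONLINE.search(line)
        if match:
            counts[(match.group("region"), match.group("server"))] += 1
    return counts
```

Passing an open log file (or any iterable of lines) to
count_offline_regions gives a Counter keyed by region and server, which
at least shows whether the retries keep hitting the same few index
regions or many different ones.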

Best regards,
Jurian


On 10/12/2015 05:26 PM, Ted Yu wrote:
> Have you checked the master to see if region assignment went okay?
>
> Cheers
>
>> On Oct 12, 2015, at 7:56 AM, Jurian Broertjes <jurian.broertjes@openindex.io> wrote:
>>
>> Hi all,
>>
>> I'm using HBase (1.1.2) with Phoenix (4.5.2-HBase-1.1) and had some (minor) HDFS issues. The HDFS issues are resolved, but when I try to bring HBase back up, some regions won't come online.
>>
>> Some RS log:
>> 2015-10-12 14:08:54,681 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t1] client.AsyncProcess: #8, waiting for 1  actions to finish
>> 2015-10-12 14:08:55,781 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t2] client.AsyncProcess: #13, waiting for 1  actions to finish
>> 2015-10-12 14:08:55,819 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t3] client.AsyncProcess: #15, waiting for 2  actions to finish
>> 2015-10-12 14:08:59,119 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t4] client.AsyncProcess: #24, waiting for 1  actions to finish
>> 2015-10-12 14:08:59,138 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t5] client.AsyncProcess: #25, waiting for 2  actions to finish
>> 2015-10-12 14:09:04,692 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t1] client.AsyncProcess: #8, waiting for 1  actions to finish
>> 2015-10-12 14:09:05,793 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t2] client.AsyncProcess: #13, waiting for 1  actions to finish
>> 2015-10-12 14:09:05,831 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t3] client.AsyncProcess: #15, waiting for 2  actions to finish
>> 2015-10-12 14:09:07,214 INFO [regionserver/cn1.xxx.xx/89.188.14.2:16020-shortCompactions-1444658915963] client.AsyncProcess: #23, waiting for some tasks to finish. Expected max=0, tasksInProgress=9
>> 2015-10-12 14:09:09,131 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t4] client.AsyncProcess: #24, waiting for 1  actions to finish
>> 2015-10-12 14:09:09,150 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t5] client.AsyncProcess: #25, waiting for 2  actions to finish
>> 2015-10-12 14:09:12,945 INFO  [htable-pool8-t1] client.AsyncProcess: #8, table=OUTLINKS_SSI_INDEX, attempt=10/350 failed=1ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region OUTLINKS_SSI_INDEX,,1440761791894.c9cfcf16be9852553efe45e36387a4b1. is not online on cn1.xxx.xx,16020,1444658908121
>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2898)
>>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:947)
>>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1991)
>>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>>   at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>   at java.lang.Thread.run(Thread.java:745)
>> on cn1.xxx.xx,16020,1444654232108, tracking started null, retrying after=10086ms, replay=1ops
>>
>> The cluster consists of 2 masters, 3 region servers, and an external ZooKeeper.
>>
>> Does anyone know what's going on here?
>>
>> Thanks in advance,
>>
>> Jurian