Date: Tue, 13 Oct 2015 10:40:47 +0200
From: Jurian Broertjes
To: user@hbase.apache.org
Subject: Re: Regions won't come online

I'm not exactly sure what happened during the night, but the problem resolved itself after a few hours. The region servers kept spewing NotServingRegionExceptions for about two hours. Then, all of a sudden, a few of the following exceptions came by, after which the problems disappeared:

2015-10-12 18:16:56,616 ERROR [RS_OPEN_REGION-cn1:16020-2] handler.OpenRegionHandler: Failed open of region=QUERYLOGS,\x01,1440751079220.87e5c32c06d42524fd0876eb72b1472e., starting to roll back the global memstore size.
org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: Failed to write to multiple index tables
    at org.apache.phoenix.hbase.index.write.recovery.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:222)
    at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:179)
    at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:169)
    at org.apache.phoenix.hbase.index.Indexer.preWALRestore(Indexer.java:545)
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$58.call(RegionCoprocessorHost.java:1432)
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1748)
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1705)
    at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preWALRestore(RegionCoprocessorHost.java:1423)
    at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4013)
    at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3869)
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:937)
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:807)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:782)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6227)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6188)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6159)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6115)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6066)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:362)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:129)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I still don't know whether it's Phoenix or HBase related, but for now I'm glad it all seems resolved. If anybody has a clue what happened, or knows whether something should be done to prevent a situation like this, please let me know.

Best regards,

Jurian

On 10/12/2015 05:26 PM, Ted Yu wrote:
> Have you checked the master to see if region assignment went okay?
>
> Cheers
>
>> On Oct 12, 2015, at 7:56 AM, Jurian Broertjes wrote:
>>
>> Hi all,
>>
>> I'm using HBase (1.1.2) with Phoenix (4.5.2-HBase-1.1) and had some (minor) HDFS issues. The HDFS issues are resolved, but when I try to bring HBase back up, some regions won't come online.
>>
>> Some RS log:
>> 2015-10-12 14:08:54,681 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t1] client.AsyncProcess: #8, waiting for 1 actions to finish
>> 2015-10-12 14:08:55,781 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t2] client.AsyncProcess: #13, waiting for 1 actions to finish
>> 2015-10-12 14:08:55,819 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t3] client.AsyncProcess: #15, waiting for 2 actions to finish
>> 2015-10-12 14:08:59,119 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t4] client.AsyncProcess: #24, waiting for 1 actions to finish
>> 2015-10-12 14:08:59,138 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t5] client.AsyncProcess: #25, waiting for 2 actions to finish
>> 2015-10-12 14:09:04,692 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t1] client.AsyncProcess: #8, waiting for 1 actions to finish
>> 2015-10-12 14:09:05,793 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t2] client.AsyncProcess: #13, waiting for 1 actions to finish
>> 2015-10-12 14:09:05,831 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t3] client.AsyncProcess: #15, waiting for 2 actions to finish
>> 2015-10-12 14:09:07,214 INFO [regionserver/cn1.xxx.xx/89.188.14.2:16020-shortCompactions-1444658915963] client.AsyncProcess: #23, waiting for some tasks to finish. Expected max=0, tasksInProgress=9
>> 2015-10-12 14:09:09,131 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t4] client.AsyncProcess: #24, waiting for 1 actions to finish
>> 2015-10-12 14:09:09,150 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t5] client.AsyncProcess: #25, waiting for 2 actions to finish
>> 2015-10-12 14:09:12,945 INFO [htable-pool8-t1] client.AsyncProcess: #8, table=OUTLINKS_SSI_INDEX, attempt=10/350 failed=1ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region OUTLINKS_SSI_INDEX,,1440761791894.c9cfcf16be9852553efe45e36387a4b1. is not online on cn1.xxx.xx,16020,1444658908121
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2898)
>>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:947)
>>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1991)
>>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>     at java.lang.Thread.run(Thread.java:745)
>> on cn1.xxx.xx,16020,1444654232108, tracking started null, retrying after=10086ms, replay=1ops
>>
>> The cluster consists of 2 masters, 3 region servers and an external ZooKeeper.
>>
>> Does anyone know what's going on here?
>>
>> Thanks in advance,
>>
>> Jurian
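One way to read the quoted "attempt=10/350 ... retrying after=10086ms" line is with a rough backoff calculation, sketched in Java below. This is a minimal sketch, not HBase code, and it rests on several assumptions:

- hbase.client.pause is at the HBase 1.x default of 100 ms.
- The multiplier table is a copy of HConstants.RETRY_BACKOFF from HBase 1.x.
- Each pause gets up to 1% random jitter, modelled on ConnectionUtils.getPauseTime.
- The retry index is treated as the attempt number. This is a simplification, because the 1.x client may count retries per server instead.

The class RetryBackoffEstimate and its methods are hypothetical names used only for illustration.

```java
import java.util.Random;

// Hypothetical helper modelled on the HBase 1.x client backoff; not part of HBase.
class RetryBackoffEstimate {
    // Assumed copy of HConstants.RETRY_BACKOFF from HBase 1.x.
    static final int[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    // Base pause for a retry index; indices past the end of the table reuse the last multiplier.
    static long basePauseMs(long pauseMs, int tries) {
        int idx = Math.min(tries, RETRY_BACKOFF.length - 1);
        return pauseMs * RETRY_BACKOFF[idx];
    }

    // Base pause plus up to 1% random jitter.
    static long pauseWithJitterMs(long pauseMs, int tries, Random random) {
        long normal = basePauseMs(pauseMs, tries);
        long jitter = (long) (normal * random.nextFloat() * 0.01f);
        return normal + jitter;
    }

    // Sum of base pauses for retry indices 0..attempts-1, ignoring jitter and RPC time.
    static long totalBasePauseMs(long pauseMs, int attempts) {
        long total = 0;
        for (int i = 0; i < attempts; i++) {
            total += basePauseMs(pauseMs, i);
        }
        return total;
    }

    public static void main(String[] args) {
        long pauseMs = 100;    // assumed hbase.client.pause (HBase 1.x default)
        int maxAttempts = 350; // taken from "attempt=10/350" in the quoted log

        // Indices 7 through 10 all map to 100 x 100 ms = 10000 ms, so 10086 ms fits 10 s plus jitter.
        System.out.println("base pause at index 9: " + basePauseMs(pauseMs, 9) + " ms");
        System.out.println("with jitter: " + pauseWithJitterMs(pauseMs, 9, new Random()) + " ms");

        long totalMs = totalBasePauseMs(pauseMs, maxAttempts);
        System.out.printf("cumulative base backoff over %d retries: %d ms (~%.1f h)%n",
                maxAttempts, totalMs, totalMs / 3_600_000.0);
    }
}
```

Under these assumptions, the summed base backoff over 350 retries is 6,828,100 ms, or roughly 1.9 hours. That is of the same order as the roughly two hours of NotServingRegionExceptions described above. It is only a back-of-the-envelope estimate, not a diagnosis of what actually happened on this cluster.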