From: Vladimir Rodionov
To: "dev@hbase.apache.org"
Date: Sat, 27 Jul 2013 18:33:23 -0700
Subject: RE: Master aborts on start up - URGENT

OK, that was my issue.

All RS failed to create table because we do not have SNAPPY support.
RS fail to create table, but Master should not abort in this case.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Vladimir Rodionov
Sent: Saturday, July 27, 2013 5:47 PM
To: dev@hbase.apache.org
Subject: RE: Master aborts on start up - URGENT

Nope, this seems to be a very serious issue.

When I tried to recreate 'usertable' I got the same issue again:

2013-07-28 00:35:40,747 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x54022944d180000 Creating (or updating) unassigned node for a386becc8860c810e33bb9c9d81482bc with OFFLINE state
2013-07-28 00:35:40,747 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2013-07-28 00:35:40,747 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67. destination server is sjc1-eng-perf-g1-grid04.carrieriq.com,60020,1374969681440
2013-07-28 00:35:40,748 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67. so generated a random one; hri=usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67., src=, dest=sjc1-eng-perf-g1-grid19.carrieriq.com,60020,1374969681450; 20 (online=20, available=19) available servers
2013-07-28 00:35:40,748 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 16938dcb9c3bb52a46ffb7b10fab3c57
2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=usertable,user7,1374971740436.16938dcb9c3bb52a46ffb7b10fab3c57. state=CLOSED, ts=1374971740713, server=sjc1-eng-perf-g1-grid01.carrieriq.com,60020,1374969681434
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x54022944d180000 Creating (or updating) unassigned node for 16938dcb9c3bb52a46ffb7b10fab3c57 with OFFLINE state
2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48. state=PENDING_OPEN, ts=1374971740749, server=sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445 .. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48. state=PENDING_OPEN, ts=1374971740749, server=sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445 .. Cannot transit it to OFFLINE.
        at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1820)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1659)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2013-07-28 00:35:40,749 INFO org.apache.hadoop.hbase.master.HMaster: Aborting

Master aborted. This is what I ran:

create 'usertable', { NAME=>'cf', VERSIONS=> 1, COMPRESSION => 'SNAPPY', BLOCKCACHE => true}, { SPLITS => ['user', 'user05', 'user1','user15','user2','user25','user3','user35','user4','user45','user5','user55','user6','user65','user7','user75','user8','user85','user9','user95' ]}

________________________________________
From: Vladimir Rodionov
Sent: Saturday, July 27, 2013 5:08 PM
To: dev@hbase.apache.org
Subject: RE: Master aborts on start up - URGENT

OK, I managed to fix the issue and minimize the damage.

OfflineMetaRepair failed to fix .META. because there were inconsistencies in one of the tables and the tool refused to do the META repair. I had to physically remove this table in HDFS and then I re-ran the tool and successfully repaired META.
table and

________________________________________
From: Vladimir Rodionov
Sent: Saturday, July 27, 2013 4:21 PM
To: dev@hbase.apache.org
Subject: Master aborts on start up - URGENT

This may be related to: https://issues.apache.org/jira/browse/HBASE-8912

It started when I tried to install and run YCSB. I created 'usertable' and then tried to modify it a couple of times (added COMPRESSION). HBase (0.94.6) stopped working (Master could not finish initialization).

I stopped the cluster and physically removed the /hbase/usertable directory as well as all ZK local stores. Restarted. No success.

I manually ran OfflineMetaRepair. Restarted. No success. This is the FATAL error in the Master's log file. For some reason, OfflineMetaRepair did not fix the missing 'usertable'.

Please advise. This is a development cluster with a large volume of data.

2013-07-27 23:08:56,504 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region TMO_NOV_INDEX-UPLOADS,38,1360181215845.2553b53773e3cb9030c3248768a3b0ca. has been deleted.
2013-07-27 23:08:56,504 INFO org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the region TMO_NOV_INDEX-UPLOADS,38,1360181215845.2553b53773e3cb9030c3248768a3b0ca. that was online on sjc1-eng-perf-g1-grid06.carrieriq.com,60020,1374966494222
2013-07-27 23:08:56,504 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : usertable,,1374962208806.249881162b6ad6d084b30507283f98b8. state=PENDING_OPEN, ts=1374966536502, server=sjc1-eng-perf-g1-grid14.carrieriq.com,60020,1374966494232 .. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : usertable,,1374962208806.249881162b6ad6d084b30507283f98b8. state=PENDING_OPEN, ts=1374966536502, server=sjc1-eng-perf-g1-grid14.carrieriq.com,60020,1374966494232 .. Cannot transit it to OFFLINE.
        at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1820)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1659)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2013-07-27 23:08:56,504 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region TMO_NOV_INDEX-UPLOADS,46,1360181215846.6f2a2eb3924ba5cb6ed22f966e6356e8. has been deleted.
2013-07-27 23:08:56,505 INFO org.apache.hadoop.hbase.master.HMaster: Aborting

________________________________________
From: stack (JIRA) [jira@apache.org]
Sent: Saturday, July 27, 2013 3:21 PM
To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-9063) TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState fails

stack created HBASE-9063:
----------------------------

             Summary: TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState fails
                 Key: HBASE-9063
                 URL: https://issues.apache.org/jira/browse/HBASE-9063
             Project: HBase
          Issue Type: Bug
          Components: test
            Reporter: stack
            Assignee: Jimmy Xiang

https://builds.apache.org/job/hbase-0.95-on-hadoop2/200/testReport/org.apache.hadoop.hbase.master/TestAssignmentManagerOnCluster/testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState/

{code}
java.lang.NullPointerException
        at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:1314)
        at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState(TestAssignmentManagerOnCluster.java:482)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

Hope you don't mind my assigning it to you Jimmy. Thought you might be interested.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
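
Both Master aborts reported in this thread log a FATAL "Unexpected state" entry of the same shape, naming the region, its state, and the region server it was assigned to. As a rough aid for anyone triaging a similar log, the sketch below pulls those fields out of such lines. It is a minimal sketch in Python, not an HBase tool: the function name `find_stuck_regions` and the regular expression are illustrative assumptions derived only from the log lines quoted above, and the table name is taken to be everything before the first comma of the region name.

```python
import re

# Pattern inferred from the FATAL lines quoted in this thread (an assumption,
# not an official HBase log format definition).
UNEXPECTED_STATE = re.compile(
    r"FATAL \S+: Unexpected state : "
    r"(?P<region>\S+)\. state=(?P<state>\w+), ts=(?P<ts>\d+), "
    r"server=(?P<server>\S+) \.\. Cannot transit it to OFFLINE\."
)


def find_stuck_regions(log_lines):
    """Return one dict per FATAL 'Unexpected state' line found in log_lines."""
    hits = []
    for line in log_lines:
        match = UNEXPECTED_STATE.search(line)
        if match is None:
            continue  # not a FATAL 'Unexpected state' entry
        region = match.group("region")
        hits.append({
            # Assumption: region names look like <table>,<start key>,<id>.<hash>
            "table": region.split(",", 1)[0],
            "region": region,
            "state": match.group("state"),
            "ts": int(match.group("ts")),
            "server": match.group("server"),
        })
    return hits


# Sample lines copied from the logs quoted in this thread.
SAMPLE = [
    "2013-07-27 23:08:56,504 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: "
    "The znode of region TMO_NOV_INDEX-UPLOADS,38,1360181215845."
    "2553b53773e3cb9030c3248768a3b0ca. has been deleted.",
    "2013-07-27 23:08:56,504 FATAL org.apache.hadoop.hbase.master.HMaster: "
    "Unexpected state : usertable,,1374962208806.249881162b6ad6d084b30507283f98b8. "
    "state=PENDING_OPEN, ts=1374966536502, "
    "server=sjc1-eng-perf-g1-grid14.carrieriq.com,60020,1374966494232 "
    ".. Cannot transit it to OFFLINE.",
    # The exception line repeats the message without the FATAL prefix and is skipped.
    "java.lang.IllegalStateException: Unexpected state : "
    "usertable,,1374962208806.249881162b6ad6d084b30507283f98b8. "
    "state=PENDING_OPEN, ts=1374966536502, "
    "server=sjc1-eng-perf-g1-grid14.carrieriq.com,60020,1374966494232 "
    ".. Cannot transit it to OFFLINE.",
    "2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster: "
    "Unexpected state : usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48. "
    "state=PENDING_OPEN, ts=1374971740749, "
    "server=sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445 "
    ".. Cannot transit it to OFFLINE.",
]

hits = find_stuck_regions(SAMPLE)
for hit in hits:
    print(hit["table"], hit["state"], hit["server"])
```

To run it against a real log, pass an open file object instead of `SAMPLE`, since the function only iterates over lines. Matching on the FATAL prefix is a deliberate choice so that the accompanying `IllegalStateException` line, which repeats the same message, is not counted twice.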