hbase-dev mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: Master aborts on start up - URGENT
Date Sun, 28 Jul 2013 00:47:09 GMT
Nope. This seems to be a very serious issue.

When I tried to recreate 'usertable' I got the same issue again:


2013-07-28 00:35:40,747 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x54022944d180000
Creating (or updating) unassigned node for a386becc8860c810e33bb9c9d81482bc with OFFLINE state
2013-07-28 00:35:40,747 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2013-07-28 00:35:40,747 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing
plan for usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67. destination server
is sjc1-eng-perf-g1-grid04.carrieriq.com,60020,1374969681440
2013-07-28 00:35:40,748 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous
transition plan was found (or we are ignoring an existing plan) for usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67.
so generated a random one; hri=usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67.,
src=, dest=sjc1-eng-perf-g1-grid19.carrieriq.com,60020,1374969681450; 20 (online=20, available=19)
available servers
2013-07-28 00:35:40,748 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler:
Handling CLOSED event for 16938dcb9c3bb52a46ffb7b10fab3c57
2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort:
loaded coprocessors are: []
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
was=usertable,user7,1374971740436.16938dcb9c3bb52a46ffb7b10fab3c57. state=CLOSED, ts=1374971740713,
server=sjc1-eng-perf-g1-grid01.carrieriq.com,60020,1374969681434
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x54022944d180000
Creating (or updating) unassigned node for 16938dcb9c3bb52a46ffb7b10fab3c57 with OFFLINE state
2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48.
state=PENDING_OPEN, ts=1374971740749, server=sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445
.. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48.
state=PENDING_OPEN, ts=1374971740749, server=sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445
.. Cannot transit it to OFFLINE.
        at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1820)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1659)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2013-07-28 00:35:40,749 INFO org.apache.hadoop.hbase.master.HMaster: Aborting


Master aborted.
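For readers following the trace: the FATAL line is raised when the master tries to force a region's unassigned znode to OFFLINE while the region's in-memory state is something else (here PENDING_OPEN). The following is a simplified, hypothetical model of that guard. It assumes that only CLOSED and OFFLINE are accepted. OfflineGuardModel and setOffline are illustrative names, not HBase code, and the real AssignmentManager in 0.94.6 has more conditions than this.

```java
/**
 * Hypothetical, simplified model of the guard behind the FATAL line above.
 * Assumption: the master only forces a region to OFFLINE when its in-memory
 * state is CLOSED or OFFLINE; any other state is treated as fatal.
 */
final class OfflineGuardModel {

    // A subset of the region states that appear in 0.94-era logs.
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, PENDING_CLOSE, CLOSING, CLOSED }

    static void setOffline(String regionName, State current) {
        if (current != State.CLOSED && current != State.OFFLINE) {
            String msg = "Unexpected state : " + regionName + " state=" + current
                    + " .. Cannot transit it to OFFLINE.";
            // In the log the master aborts right after this message; the model just throws.
            throw new IllegalStateException(msg);
        }
        // Otherwise a real implementation would create or update the region's
        // unassigned znode with OFFLINE state in ZooKeeper (not modeled here).
    }

    public static void main(String[] args) {
        // Accepted: the user7 region was in state CLOSED when it was forced OFFLINE.
        setOffline("usertable,user7,1374971740436.16938dcb9c3bb52a46ffb7b10fab3c57.", State.CLOSED);
        try {
            // Rejected: the user6 region reported in the FATAL line was PENDING_OPEN.
            setOffline("usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48.",
                    State.PENDING_OPEN);
        } catch (IllegalStateException e) {
            System.out.println("master would abort: " + e.getMessage());
        }
    }
}
```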

This is what I ran:

create 'usertable', {NAME => 'cf', VERSIONS => 1, COMPRESSION => 'SNAPPY', BLOCKCACHE => true},
  {SPLITS => ['user', 'user05', 'user1', 'user15', 'user2', 'user25', 'user3', 'user35',
              'user4', 'user45', 'user5', 'user55', 'user6', 'user65', 'user7', 'user75',
              'user8', 'user85', 'user9', 'user95']}
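With 20 split keys, this command should produce 21 regions. That matches region names in the log above such as usertable,user,... and usertable,user6,.... The following is a minimal sketch of how a SPLITS list maps to region key ranges. It assumes plain ASCII keys, so that Java's String ordering matches HBase's unsigned byte ordering. PreSplitRegions is a hypothetical illustration, not HBase code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration: key ranges produced by pre-splitting a table.
 * n split keys give n + 1 regions; "" stands for an unbounded start or end key.
 */
final class PreSplitRegions {

    static List<String[]> regions(List<String> splitKeys) {
        List<String> sorted = new ArrayList<>(splitKeys);
        // Defensive sort. For ASCII keys, String order equals unsigned byte order.
        sorted.sort(null);
        List<String[]> out = new ArrayList<>();
        String start = ""; // the first region starts at the beginning of the table
        for (String key : sorted) {
            out.add(new String[] {start, key}); // region covers [start, key)
            start = key;
        }
        out.add(new String[] {start, ""}); // the last region runs to the end of the table
        return out;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList(
                "user", "user05", "user1", "user15", "user2", "user25", "user3", "user35",
                "user4", "user45", "user5", "user55", "user6", "user65", "user7", "user75",
                "user8", "user85", "user9", "user95");
        List<String[]> rs = regions(splits);
        System.out.println(rs.size() + " regions");
        for (String[] r : rs) {
            System.out.println("['" + r[0] + "', '" + r[1] + "')");
        }
    }
}
```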

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Vladimir Rodionov
Sent: Saturday, July 27, 2013 5:08 PM
To: dev@hbase.apache.org
Subject: RE: Master aborts on start up - URGENT

OK, I managed to fix the issue and minimize the damage.

OfflineMetaRepair failed to fix .META. because one of the tables had inconsistencies,
and the tool refused to repair META while they were present. I had to physically remove
that table from HDFS; after that, I re-ran the tool and it successfully repaired META.
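For context, hbck-style checks look for holes and overlaps in each table's region chain, which is one kind of table inconsistency. The following is a simplified, hypothetical illustration of such a check. It is not OfflineMetaRepair's code, and it treats keys as ASCII strings. When the regions are sorted by start key, the first must start at the empty key, the last must end at the empty key, and each end key must equal the next region's start key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical, simplified region-chain check: the regions of one table must
 * tile the whole key space with no holes and no overlaps. "" means unbounded.
 */
final class RegionChainCheck {

    static final class Region {
        final String start;
        final String end; // "" means "to the end of the table"
        Region(String start, String end) { this.start = start; this.end = end; }
    }

    static List<String> problems(List<Region> regions) {
        List<String> out = new ArrayList<>();
        if (regions.isEmpty()) {
            out.add("no regions");
            return out;
        }
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Region r) -> r.start));
        if (!sorted.get(0).start.isEmpty()) {
            out.add("hole before first region: ['', " + sorted.get(0).start + ")");
        }
        for (int i = 0; i + 1 < sorted.size(); i++) {
            String end = sorted.get(i).end;
            String nextStart = sorted.get(i + 1).start;
            // An unbounded end key on a non-last region also overlaps its successor.
            if (end.isEmpty() || end.compareTo(nextStart) > 0) {
                out.add("overlap at " + nextStart);
            } else if (end.compareTo(nextStart) < 0) {
                out.add("hole: [" + end + ", " + nextStart + ")");
            }
        }
        String lastEnd = sorted.get(sorted.size() - 1).end;
        if (!lastEnd.isEmpty()) {
            out.add("hole after last region: [" + lastEnd + ", '')");
        }
        return out;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("", "user"));
        regions.add(new Region("user", "user5"));
        regions.add(new Region("user6", "")); // [user5, user6) is missing
        System.out.println(problems(regions));
    }
}
```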




________________________________________
From: Vladimir Rodionov
Sent: Saturday, July 27, 2013 4:21 PM
To: dev@hbase.apache.org
Subject: Master aborts on start up - URGENT

This may be related to:

https://issues.apache.org/jira/browse/HBASE-8912


It started when I tried to install and run YCSB. I created 'usertable' and then modified
it a couple of times (adding COMPRESSION). After that, HBase (0.94.6) stopped working: the
Master could not finish initialization.

I stopped the cluster and physically removed the /hbase/usertable directory, as well as all
local ZooKeeper data stores. Restarted. No success.

I manually ran OfflineMetaRepair. Restarted. No success. The FATAL error from the Master's
log file is below.

For some reason, OfflineMetaRepair did not fix the missing 'usertable'.

Please advise. This is a development cluster with a large volume of data.



2013-07-27 23:08:56,504 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode
of region TMO_NOV_INDEX-UPLOADS,38,1360181215845.2553b53773e3cb9030c3248768a3b0ca. has been
deleted.
2013-07-27 23:08:56,504 INFO org.apache.hadoop.hbase.master.AssignmentManager: The master
has opened the region TMO_NOV_INDEX-UPLOADS,38,1360181215845.2553b53773e3cb9030c3248768a3b0ca.
that was online on sjc1-eng-perf-g1-grid06.carrieriq.com,60020,1374966494222
2013-07-27 23:08:56,504 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : usertable,,1374962208806.249881162b6ad6d084b30507283f98b8.
state=PENDING_OPEN, ts=1374966536502, server=sjc1-eng-perf-g1-grid14.carrieriq.com,60020,1374966494232
.. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : usertable,,1374962208806.249881162b6ad6d084b30507283f98b8.
state=PENDING_OPEN, ts=1374966536502, server=sjc1-eng-perf-g1-grid14.carrieriq.com,60020,1374966494232
.. Cannot transit it to OFFLINE.
        at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1820)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1659)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2013-07-27 23:08:56,504 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode
of region TMO_NOV_INDEX-UPLOADS,46,1360181215846.6f2a2eb3924ba5cb6ed22f966e6356e8. has been
deleted.
2013-07-27 23:08:56,505 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
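The logs in this thread identify regions in two ways. Most lines use the full region name (table,startKey,regionId.encodedName.), while the ZKAssign lines use only the encoded name. A small parser like the one below can help correlate the two forms. It is a hypothetical helper, not HBase code. It assumes that the table name contains no commas and that the encoded name is the 32-hex-character suffix, as in every region name quoted here. It also allows for an empty start key, as in usertable,,1374962208806.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a region name as it is printed in these logs. */
final class RegionNameParser {

    /** Components of "table,startKey,regionId.encodedName." */
    static final class ParsedRegion {
        final String table;
        final String startKey;   // may be empty for a table's first region
        final long regionId;     // creation timestamp in the names above
        final String encodedName;

        ParsedRegion(String table, String startKey, long regionId, String encodedName) {
            this.table = table;
            this.startKey = startKey;
            this.regionId = regionId;
            this.encodedName = encodedName;
        }
    }

    // Trailing ".<32 lowercase hex chars>." suffix that carries the encoded name.
    private static final Pattern SUFFIX = Pattern.compile("\\.([0-9a-f]{32})\\.$");

    static ParsedRegion parse(String regionName) {
        Matcher m = SUFFIX.matcher(regionName);
        if (!m.find()) {
            throw new IllegalArgumentException("No encoded-name suffix: " + regionName);
        }
        String head = regionName.substring(0, m.start()); // "table,startKey,regionId"
        int first = head.indexOf(',');
        int last = head.lastIndexOf(',');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException("Expected table,startKey,regionId: " + head);
        }
        // The start key sits between the first and last commas, so it may itself contain commas.
        return new ParsedRegion(
                head.substring(0, first),
                head.substring(first + 1, last),
                Long.parseLong(head.substring(last + 1)),
                m.group(1));
    }

    public static void main(String[] args) {
        ParsedRegion r = parse("usertable,,1374962208806.249881162b6ad6d084b30507283f98b8.");
        System.out.println(r.table + " | start='" + r.startKey + "' | id=" + r.regionId
                + " | encoded=" + r.encodedName);
    }
}
```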



________________________________________
From: stack (JIRA) [jira@apache.org]
Sent: Saturday, July 27, 2013 3:21 PM
To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-9063) TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState fails

stack created HBASE-9063:
----------------------------

             Summary: TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState fails
                 Key: HBASE-9063
                 URL: https://issues.apache.org/jira/browse/HBASE-9063
             Project: HBase
          Issue Type: Bug
          Components: test
            Reporter: stack
            Assignee: Jimmy Xiang


https://builds.apache.org/job/hbase-0.95-on-hadoop2/200/testReport/org.apache.hadoop.hbase.master/TestAssignmentManagerOnCluster/testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState/

{code}java.lang.NullPointerException
        at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:1314)
        at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState(TestAssignmentManagerOnCluster.java:482)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74){code}

Hope you don't mind my assigning it to you, Jimmy. Thought you might be interested.


