Subject: Corrupted META?
From: MiMills
To: user@hbase.apache.org
Date: Wed, 1 Jul 2015 06:06:03 -0700 (MST)

Our small production cluster went down last night and we need help recovering it. Any recommendations on what to try next?

HBase: 0.96.1.1
Hadoop: 2.2 (HA)

We have only three servers (yes, we know that is too few and are about to ramp up): server1, server2, server3. Replication is set to 3. Not ideal, but we have taken one of the three servers offline and back online before and HBase handled it well.

We took server2 offline last night to reinstall its OS. During that time the cluster went down. When we brought it back up, the master-status web UI showed:

* server1 and server3 listed as Dead Region Servers. We need to recover the data in the regions on those boxes.
* server1 listed under "Regions in Transition": 1588230740 hbase:meta,,1.1588230740 state=FAILED_OPEN, ts=Wed Jul 01 04:39:43 CDT 2015 (7507s ago), server=server1.corp.gs.com,60020,1435719306878
* None of our tables listed on the main page, although they do appear on the Table Details page.

We finished updating server2 and started it, and also started server3 via "./hbase-daemon.sh start regionserver". It didn't help: both appear in the UI, but their Num. Regions is 0.

Running "hadoop fsck /" shows no HDFS issues, and we can still see our tables and regions in HDFS.
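In case it is useful, this is roughly how we have been checking the layout in HDFS (paths are relative to our root hdfs://gs1/hbase; the .tabledesc/.tableinfo.* location is only our understanding of the 0.96 on-disk format, so please correct us if that is wrong):

    # list the system and user table directories under the HBase root
    hdfs dfs -ls /hbase/data/hbase
    hdfs dfs -ls /hbase/data/default

    # look for the table descriptor files that FSTableDescriptors/hbck read;
    # this is what seems to be missing for hbase:meta below
    hdfs dfs -ls -R /hbase/data | grep tableinfo

    # confirm the meta region directory and our user regions are still present
    hdfs dfs -ls /hbase/data/hbase/meta
    hdfs dfs -ls /hbase/data/default/*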
Running "hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair" made no difference and reported: 2015-07-01 08:01:02,272 WARN [main] util.HBaseFsck: Unable to read .tableinfo from hdfs://gs1/hbase org.apache.hadoop.hbase.TableInfoMissingException: No table descriptor file under hdfs://gs1/hbase/data/hbase/meta at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorFromFs(FSTableDescriptors.java:481) at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableDescriptorFromFs(FSTableDescriptors.java:469) at org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegionInfos(HBaseFsck.java:836) at org.apache.hadoop.hbase.util.HBaseFsck.rebuildMeta(HBaseFsck.java:1059) at org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair.main(OfflineMetaRepair.java:118) ERROR: Unable to read .tableinfo from hdfs://gs1/hbase/hbase:meta 2015-07-01 08:01:02,589 INFO [main] util.HBaseFsck: Checking HBase region split map from HDFS data... 2015-07-01 08:01:03,130 INFO [main] util.HBaseFsck: Checking HBase region split map from HDFS data... 2015-07-01 08:01:03,141 INFO [main] util.HBaseFsck: HDFS regioninfo's seems good. Sidelining old hbase:meta 2015-07-01 08:01:03,227 INFO [main] util.HBaseFsck: Creating new hbase:meta 2015-07-01 08:01:03,248 INFO [main] regionserver.HRegion: creating HRegion hbase:meta HTD == 'hbase:meta', {TABLE_ATTRIBUTES => {IS_META => 'true', coprocessor$1 => '|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|536870911|'}, {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '10', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192', IN_MEMORY => 'false', BLOCKCACHE => 'false'} RootDir = hdfs://gs1/hbase Table name == hbase:meta 2015-07-01 08:01:03,357 WARN [Thread-6] conf.Configuration: hdfs-default.xml:an attempt to override final parameter: dfs.namenode.name.dir; Ignoring. 2015-07-01 08:01:03,357 WARN [Thread-6] conf.Configuration: hdfs-default.xml:an attempt to override final parameter: dfs.permissions.superusergroup; Ignoring. 2015-07-01 08:01:03,357 WARN [Thread-6] conf.Configuration: hdfs-default.xml:an attempt to override final parameter: dfs.datanode.data.dir; Ignoring. 2015-07-01 08:01:03,513 INFO [main] wal.FSHLog: WAL/HLog configuration: blocksize=128 MB, rollsize=121.60 MB, enabled=true, optionallogflushinternal=1000ms 2015-07-01 08:01:03,652 INFO [main] wal.FSHLog: New WAL /hbase/data/hbase/meta/1588230740/WALs/hlog.1435755663542 2015-07-01 08:01:03,695 INFO [main] impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties 2015-07-01 08:01:08,722 INFO [main] impl.MetricsSinkAdapter: Sink ganglia started 2015-07-01 08:01:08,751 INFO [main] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 
2015-07-01 08:01:08,751 INFO [main] impl.MetricsSystemImpl: HBase metrics system started
2015-07-01 08:01:08,937 INFO [StoreOpener-1588230740-1] hfile.CacheConfig: Allocating LruBlockCache with maximum size 355.6 M
2015-07-01 08:01:08,947 INFO [StoreOpener-1588230740-1] compactions.CompactionConfiguration: size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; don't delete expired; major period 86400000, major jitter 0.500000
2015-07-01 08:01:08,954 INFO [StoreOpener-1588230740-1] util.ChecksumType: Checksum using org.apache.hadoop.util.PureJavaCrc32
2015-07-01 08:01:08,954 INFO [StoreOpener-1588230740-1] util.ChecksumType: Checksum can use org.apache.hadoop.util.PureJavaCrc32C
2015-07-01 08:01:08,964 INFO [main] regionserver.HRegion: Onlined 1588230740; next sequenceid=1
2015-07-01 08:01:09,209 INFO [main] regionserver.DefaultStoreFlusher: Flushed, sequenceid=2, memsize=76.9 K, hasBloomFilter=false, into tmp file hdfs://gs1/hbase/data/hbase/meta/1588230740/.tmp/77916d5729c74be9bbd9225da0f92166
2015-07-01 08:01:09,269 INFO [main] regionserver.HStore: Added hdfs://gs1/hbase/data/hbase/meta/1588230740/info/77916d5729c74be9bbd9225da0f92166, entries=262, sequenceid=2, filesize=45.0 K
2015-07-01 08:01:09,270 INFO [main] regionserver.HRegion: Finished memstore flush of ~76.9 K/78704, currentsize=0/0 for region hbase:meta,,1.1588230740 in 251ms, sequenceid=2, compaction requested=false
2015-07-01 08:01:09,273 INFO [StoreCloserThread-hbase:meta,,1.1588230740-1] regionserver.HStore: Closed info
2015-07-01 08:01:09,278 INFO [main] regionserver.HRegion: Closed hbase:meta,,1.1588230740
2015-07-01 08:01:09,278 INFO [main.logSyncer] wal.FSHLog: main.logSyncer exiting
2015-07-01 08:01:09,550 INFO [main] util.HBaseFsck: Success! hbase:meta table rebuilt.
2015-07-01 08:01:09,550 INFO [main] util.HBaseFsck: Old hbase:meta is moved into hdfs://gs1/hbase/.hbck/hbase-1435755661127

Running "./hbase hbck -details" returns:

Version: 0.96.1.1-hadoop2
Number of live region servers: 2
  server2.corp.gs.com,60020,1435743939507
  server3.corp.gs.com,60020,1435744393462
Number of dead region servers: 6
  server3.corp.gs.com,60020,1435744393462
  server2.corp.gs.com,60020,1435743503791
  server1.corp.gs.com,60020,1435719306878
  server2.corp.gs.com,60020,1435743939507
  server3.corp.gs.com,60020,1435743483810
  server1.corp.gs.com,60020,1435743483790
Master: master.corp.gs.com,60000,1435743457234
Number of backup masters: 2
  master2.corp.gs.com,60000,1435743535091
  master3.corp.gs.com,60000,1435743556550
Average load: 0.0
Number of requests: 0
Number of regions: 0
Number of regions in transition: 1
  hbase:meta,,1.1588230740 state=OFFLINE, ts=Wed Jul 01 04:39:43 CDT 2015 (11106s ago), server=null
ERROR: RegionServer: server3.corp.gs.com,60020,1435744393462 Unable to fetch region information. java.net.ConnectException: Connection refused
ERROR: RegionServer: server2.corp.gs.com,60020,1435743939507 Unable to fetch region information.
java.net.ConnectException: Connection refused
2015-07-01 07:44:55,232 WARN [main] util.HBaseFsck: Could not process regionserver server2.corp.gs.com:60020
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:575)
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:860)
    at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1535)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1424)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
    at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.getOnlineRegion(AdminProtos.java:20583)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getOnlineRegions(ProtobufUtil.java:1575)
    at org.apache.hadoop.hbase.util.HBaseFsck$WorkItemRegion.call(HBaseFsck.java:3141)
    at org.apache.hadoop.hbase.util.HBaseFsck$WorkItemRegion.call(HBaseFsck.java:3120)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2015-07-01 07:44:55,234 WARN [main] util.HBaseFsck: Could not process regionserver server3.corp.gs.com:60020
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:575)
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:860)
    at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1535)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1424)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
    at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.getOnlineRegion(AdminProtos.java:20583)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getOnlineRegions(ProtobufUtil.java:1575)
    at org.apache.hadoop.hbase.util.HBaseFsck$WorkItemRegion.call(HBaseFsck.java:3141)
    at org.apache.hadoop.hbase.util.HBaseFsck$WorkItemRegion.call(HBaseFsck.java:3120)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
ERROR: META region or some of its attributes are null.
ERROR: Fatal error: unable to get hbase:meta region location. Exiting...
2015-07-01 07:46:07,884 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=master3.corp.gs.com:2181,master2.corp.gs.com:2181,master.corp.gs.com:2181 sessionTimeout=30000 watcher=hbase Fsck, quorum=master3.corp.gs.com:2181,master2.corp.gs.com:2181,master.corp.gs.com:2181, baseZNode=/hbase
2015-07-01 07:46:07,885 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hbase Fsck connecting to ZooKeeper ensemble=master3.corp.gs.com:2181,master2.corp.gs.com:2181,master.corp.gs.com:2181
2015-07-01 07:46:12,892 INFO [main-SendThread(master3.corp.gs.com:2181)] zookeeper.ClientCnxn: Opening socket connection to server master3.corp.gs.com/184.154.49.50:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2015-07-01 07:46:12,893 INFO [main-SendThread(master3.corp.gs.com:2181)] zookeeper.ClientCnxn: Socket connection established to master3.corp.gs.com/184.154.49.50:2181, initiating session
2015-07-01 07:46:12,919 INFO [main-SendThread(master3.corp.gs.com:2181)] zookeeper.ClientCnxn: Session establishment complete on server master3.corp.gs.com/184.154.49.50:2181, sessionid = 0x24e48f6f6d0000c, negotiated timeout = 30000
Summary:
4 inconsistencies detected.
Status: INCONSISTENT

Within Master Log (same error for server3 within the log):

2015-07-01 04:46:05,141 INFO [RpcServer.handler=20,port=60000] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
2015-07-01 04:46:05,141 WARN [RpcServer.handler=20,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":10008,"call":"RegionServerStartup(org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStartupRequest)","client":"184.154.124.202:35164","starttimems":1435743955133,"queuetimems":0,"class":"HMaster","responsesize":143,"method":"RegionServerStartup"}
2015-07-01 04:46:05,144 WARN [RpcServer.reader=6,port=60000] ipc.RpcServer: RpcServer.listener,port=60000: count of bytes read: 0
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
    at sun.nio.ch.IOUtil.read(IOUtil.java:175)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
    at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2393)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1425)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:780)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:568)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:543)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2015-07-01 04:46:13,155 INFO [RpcServer.handler=28,port=60000] master.ServerManager: Triggering server recovery; existingServer server2.corp.gs.com,60020,1435743939507 looks stale, new server:server2.corp.gs.com,60020,1435743939507
2015-07-01 04:46:13,156 INFO [RpcServer.handler=28,port=60000] master.ServerManager: Registering server=server2.corp.gs.com,60020,1435743939507
2015-07-01 04:46:34,010 ERROR [RpcServer.handler=29,port=60000] master.HMaster: Region server server2.corp.gs.com,60020,1435743939507 reported a fatal error:
ABORTING region server server2.corp.gs.com,60020,1435743939507: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing server2.corp.gs.com,60020,1435743939507 as dead server
    at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
    at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1343)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
Cause:
org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing server2.corp.gs.com,60020,1435743939507 as dead server
    at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
    at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1343)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:277)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:985)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:832)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.YouAreDeadException): org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing server2.corp.gs.com,60020,1435743939507 as dead server
    at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
    at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1343)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)

Server 3 Region Server Log:

2015-07-01 04:54:07,960 INFO [regionserver60020] Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
2015-07-01 04:54:07,965 INFO [regionserver60020] regionserver.HRegionServer: Serving as server3.corp.gs.com,60020,1435744393462, RpcServer on server3.corp.gs.com/184.154.105.138:60020, sessionid=0x24e48f6f6d00005
2015-07-01 04:54:07,965 INFO [SplitLogWorker-server3.corp.gs.com,60020,1435744393462] regionserver.SplitLogWorker: SplitLogWorker server3.corp.gs.com,60020,1435744393462 starting
2015-07-01 04:54:07,990 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server server3.corp.gs.com,60020,1435744393462: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing server3.corp.gs.com,60020,1435744393462 as dead server
    at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
    at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1343)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing server3.corp.gs.com,60020,1435744393462 as dead server
    at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
    at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1343)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:277)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:985)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:832)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.YouAreDeadException): org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing server3.corp.gs.com,60020,1435744393462 as dead server
    at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
    at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1343)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1449)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerReport(RegionServerStatusProtos.java:5414)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:983)
    ... 2 more
2015-07-01 04:54:07,991 FATAL [regionserver60020] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
2015-07-01 04:54:08,001 INFO [regionserver60020] regionserver.HRegionServer: STOPPED: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing server3.corp.gs.com,60020,1435744393462 as dead server
    at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
    at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
    at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1343)
    at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5087)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)

Please Help!
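P.S. For completeness, these are the follow-up checks we are considering but have not run yet. The hbck fix flags are from "hbase hbck -h" on our 0.96.1.1 install, and the znode names are what we believe 0.96 uses, so please correct us before we run anything destructive:

    # inspect the old meta that OfflineMetaRepair sidelined, in case we need to roll back
    hdfs dfs -ls -R /hbase/.hbck/hbase-1435755661127

    # see what ZooKeeper still has registered; stale entries here would seem
    # consistent with the YouAreDeadException the region servers hit on restart
    hbase zkcli ls /hbase/rs
    hbase zkcli get /hbase/meta-region-server

    # once the region servers stay up, re-check and (maybe) repair assignments
    hbase hbck -details
    hbase hbck -fixMeta -fixAssignments

Does that ordering look sane, or should we clean up the ZooKeeper state and restart the masters first?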