hbase-user mailing list archives

From Jimmy Xiang <jxi...@cloudera.com>
Subject Re: AssignmentManager looping?
Date Thu, 01 Aug 2013 16:40:55 GMT
Something went wrong with the split. It should be easy to fix your cluster.
However, it would be more interesting to find out how it happened. Do you
remember what happened after the cluster was last healthy? Do you have all
the logs?


On Thu, Aug 1, 2013 at 7:08 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
wrote:

> I tried to remove the znodes but got the same result. So I shut down all
> the RS and restarted HBase, and now I have 0 regions for this table.
> Running HBCK. It seems to have a lot to do...
>
> 2013/8/1 Kevin O'dell <kevin.odell@cloudera.com>
>
> > Yes, you can if HBase is down. First I would copy .META out of HDFS to
> > local disk, and then you can search it for split issues. Deleting those
> > znodes should clear this up, though.
> > On Aug 1, 2013 8:52 AM, "Jean-Marc Spaggiari" <jean-marc@spaggiari.org>
> > wrote:
> >
> > > I can't check the meta since HBase is down.
> > >
> > > Regarding HDFS, I took a few random lines like:
> > > 2013-08-01 08:45:57,260 WARN
> > > org.apache.hadoop.hbase.master.AssignmentManager: Region
> > > 28328fdb7181cbd9cc4d6814775e8895 not found on server
> > > node4,60020,1375319042033; failed processing
> > > 2013-08-01 08:45:57,260 WARN
> > > org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for
> > > region 28328fdb7181cbd9cc4d6814775e8895 from server
> > > node4,60020,1375319042033 but it doesn't exist anymore, probably
> > > already processed its split
> > >
> > > And each time, HDFS has nothing matching that region:
> > > hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -lsr / | grep
> > > 28328fdb7181cbd9cc4d6814775e8895
> > >
> > > On ZK side:
> > > [zk: localhost:2181(CONNECTED) 3] ls /hbase/splitlog
> > >
> > > [zk: localhost:2181(CONNECTED) 10] ls /hbase/unassigned
> > > [28328fdb7181cbd9cc4d6814775e8895, a8781a598c46f19723a2405345b58470,
> > > b7ebfeb63b10997736fd12920fde2bb8, d95bb27cc026511c2a8c8ad155e79bf6,
> > > 270a9c371fcbe9cd9a04986e0b77d16b, aff4d1d8bf470458bb19525e8aef0759]
> > >
> > > Can I just delete those znodes? Worst case, hbck will recover them from
> > > HDFS if required?
> > >
> > > JM
> > >
> > > 2013/8/1 Kevin O'dell <kevin.odell@cloudera.com>
> > >
> > > > Does it exist in meta or hdfs?
> > > > On Aug 1, 2013 8:24 AM, "Jean-Marc Spaggiari" <jean-marc@spaggiari.org>
> > > > wrote:
> > > >
> > > > > My master keeps logging this:
> > > > >
> > > > > 2013-07-31 21:52:59,201 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region
> > > > > 270a9c371fcbe9cd9a04986e0b77d16b not found on server
> > > > > node7,60020,1375319044055; failed processing
> > > > > 2013-07-31 21:52:59,201 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for
> > > > > region 270a9c371fcbe9cd9a04986e0b77d16b from server
> > > > > node7,60020,1375319044055 but it doesn't exist anymore, probably
> > > > > already processed its split
> > > > > 2013-07-31 21:52:59,339 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region
> > > > > 270a9c371fcbe9cd9a04986e0b77d16b not found on server
> > > > > node7,60020,1375319044055; failed processing
> > > > > 2013-07-31 21:52:59,339 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for
> > > > > region 270a9c371fcbe9cd9a04986e0b77d16b from server
> > > > > node7,60020,1375319044055 but it doesn't exist anymore, probably
> > > > > already processed its split
> > > > > 2013-07-31 21:52:59,461 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region
> > > > > 270a9c371fcbe9cd9a04986e0b77d16b not found on server
> > > > > node7,60020,1375319044055; failed processing
> > > > > 2013-07-31 21:52:59,461 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for
> > > > > region 270a9c371fcbe9cd9a04986e0b77d16b from server
> > > > > node7,60020,1375319044055 but it doesn't exist anymore, probably
> > > > > already processed its split
> > > > > 2013-07-31 21:52:59,636 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region
> > > > > 270a9c371fcbe9cd9a04986e0b77d16b not found on server
> > > > > node7,60020,1375319044055; failed processing
> > > > > 2013-07-31 21:52:59,636 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for
> > > > > region 270a9c371fcbe9cd9a04986e0b77d16b from server
> > > > > node7,60020,1375319044055 but it doesn't exist anymore, probably
> > > > > already processed its split
> > > > > 2013-07-31 21:53:00,074 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region
> > > > > 270a9c371fcbe9cd9a04986e0b77d16b not found on server
> > > > > node7,60020,1375319044055; failed processing
> > > > > 2013-07-31 21:53:00,074 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for
> > > > > region 270a9c371fcbe9cd9a04986e0b77d16b from server
> > > > > node7,60020,1375319044055 but it doesn't exist anymore, probably
> > > > > already processed its split
> > > > > 2013-07-31 21:53:00,261 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region
> > > > > 270a9c371fcbe9cd9a04986e0b77d16b not found on server
> > > > > node7,60020,1375319044055; failed processing
> > > > > 2013-07-31 21:53:00,261 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for
> > > > > region 270a9c371fcbe9cd9a04986e0b77d16b from server
> > > > > node7,60020,1375319044055 but it doesn't exist anymore, probably
> > > > > already processed its split
> > > > > 2013-07-31 21:53:00,417 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region
> > > > > 270a9c371fcbe9cd9a04986e0b77d16b not found on server
> > > > > node7,60020,1375319044055; failed processing
> > > > > 2013-07-31 21:53:00,417 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for
> > > > > region 270a9c371fcbe9cd9a04986e0b77d16b from server
> > > > > node7,60020,1375319044055 but it doesn't exist anymore, probably
> > > > > already processed its split
> > > > >
> > > > > hbase@node3:~/hbase-0.94.3$ cat logs/hbase-hbase-master-node3.log* |
> > > > > grep "Region 270a9c371fcbe9cd9a04986e0b77d16b not found " | wc
> > > > >    5042   65546  927728
> > > > >
> > > > >
> > > > > Then the master crashed:
> > > > > 2013-07-31 22:22:46,072 FATAL org.apache.hadoop.hbase.master.HMaster:
> > > > > Master server abort: loaded coprocessors are: []
> > > > > 2013-07-31 22:22:46,073 FATAL org.apache.hadoop.hbase.master.HMaster:
> > > > > Unexpected state : work_proposed,\x02\xE8\x92'\x00\x00\x00\x00
> > > > > http://video.inportnews.ca/search/all/source/sun-news-network/harry-potter-in-translation/68463493001/page/1526,1375307272709.d95bb27cc026511c2a8c8ad155e79bf6.
> > > > > state=OPENING, ts=1375323766008, server=node7,60020,1375319044055 ..
> > > > > Cannot transit it to OFFLINE.
> > > > > java.lang.IllegalStateException: Unexpected state :
> > > > > work_proposed,\x02\xE8\x92'\x00\x00\x00\x00
> > > > > http://video.inportnews.ca/search/all/source/sun-news-network/harry-potter-in-translation/68463493001/page/1526,1375307272709.d95bb27cc026511c2a8c8ad155e79bf6.
> > > > > state=OPENING, ts=1375323766008, server=node7,60020,1375319044055 ..
> > > > > Cannot transit it to OFFLINE.
> > > > >     at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
> > > > >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
> > > > >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
> > > > >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
> > > > >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
> > > > >     at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
> > > > >     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
> > > > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> > > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > > > >     at java.lang.Thread.run(Thread.java:722)
> > > > > 2013-07-31 22:22:46,075 INFO org.apache.hadoop.hbase.master.HMaster:
> > > > > Aborting
> > > > > 2013-07-31 22:22:46,075 INFO org.apache.hadoop.ipc.HBaseServer:
> > > > > Stopping server on 60000
> > > > > 2013-07-31 22:22:46,075 INFO org.apache.hadoop.hbase.master.HMaster$2:
> > > > > node3,60000,1375322220614-BalancerChore exiting
> > > > > 2013-07-31 22:22:46,075 INFO
> > > > > org.apache.hadoop.hbase.master.CatalogJanitor:
> > > > > node3,60000,1375322220614-CatalogJanitor exiting
> > > > > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer:
> > > > > Stopping IPC Server listener on 60000
> > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > Server handler 9 on 60000: exiting
> > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > Server handler 2 on 60000: exiting
> > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > Server handler 4 on 60000: exiting
> > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > Server handler 8 on 60000: exiting
> > > > > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > Server handler 6 on 60000: exiting
> > > > > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: REPL
> > > > > IPC Server handler 2 on 60000: exiting
> > > > > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: REPL
> > > > > IPC Server handler 1 on 60000: exiting
> > > > > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: REPL
> > > > > IPC Server handler 0 on 60000: exiting
> > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > Server handler 3 on 60000: exiting
> > > > > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > Server handler 0 on 60000: exiting
> > > > > 2013-07-31 22:22:46,077 INFO
> > > > > org.apache.hadoop.hbase.master.cleaner.HFileCleaner:
> > > > > master-node3,60000,1375322220614.archivedHFileCleaner exiting
> > > > > 2013-07-31 22:22:46,077 INFO
> > > > > org.apache.hadoop.hbase.master.cleaner.LogCleaner:
> > > > > master-node3,60000,1375322220614.oldLogCleaner exiting
> > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.hbase.master.HMaster:
> > > > > Stopping infoServer
> > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer:
> > > > > Stopping IPC Server Responder
> > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > Server handler 5 on 60000: exiting
> > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > Server handler 7 on 60000: exiting
> > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > > > > Server handler 1 on 60000: exiting
> > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer:
> > > > > Stopping IPC Server Responder
> > > > > 2013-07-31 22:22:46,078 INFO org.mortbay.log: Stopped
> > > > > SelectChannelConnector@0.0.0.0:60010
> > > > > 2013-07-31 22:22:46,127 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region
> > > > > 270a9c371fcbe9cd9a04986e0b77d16b not found on server
> > > > > node7,60020,1375319044055; failed processing
> > > > > 2013-07-31 22:22:46,127 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for
> > > > > region 270a9c371fcbe9cd9a04986e0b77d16b from server
> > > > > node7,60020,1375319044055 but it doesn't exist anymore, probably
> > > > > already processed its split
> > > > > 2013-07-31 22:22:46,181 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region
> > > > > aff4d1d8bf470458bb19525e8aef0759 not found on server
> > > > > node2,60020,1375319046072; failed processing
> > > > > 2013-07-31 22:22:46,181 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for
> > > > > region aff4d1d8bf470458bb19525e8aef0759 from server
> > > > > node2,60020,1375319046072 but it doesn't exist anymore, probably
> > > > > already processed its split
> > > > > 2013-07-31 22:22:46,193 ERROR
> > > > > org.apache.hadoop.hbase.executor.ExecutorService: Cannot submit
> > > > > [ClosedRegionHandler-node3,60000,1375322220614-179] because the
> > > > > executor is missing. Is this process shutting down?
> > > > > 2013-07-31 22:22:46,250 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region
> > > > > 28328fdb7181cbd9cc4d6814775e8895 not found on server
> > > > > node4,60020,1375319042033; failed processing
> > > > > 2013-07-31 22:22:46,250 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for
> > > > > region 28328fdb7181cbd9cc4d6814775e8895 from server
> > > > > node4,60020,1375319042033 but it doesn't exist anymore, probably
> > > > > already processed its split
> > > > > 2013-07-31 22:22:46,262 INFO
> > > > > org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor:
> > > > > node3,60000,1375322220614.splitLogManagerTimeoutMonitor exiting
> > > > > 2013-07-31 22:22:46,293 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region
> > > > > 270a9c371fcbe9cd9a04986e0b77d16b not found on server
> > > > > node7,60020,1375319044055; failed processing
> > > > > 2013-07-31 22:22:46,293 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for
> > > > > region 270a9c371fcbe9cd9a04986e0b77d16b from server
> > > > > node7,60020,1375319044055 but it doesn't exist anymore, probably
> > > > > already processed its split
> > > > > 2013-07-31 22:22:46,294 INFO
> > > > > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> > > > > Closed zookeeper sessionid=0x240024f5666144b
> > > > > 2013-07-31 22:22:46,361 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region
> > > > > aff4d1d8bf470458bb19525e8aef0759 not found on server
> > > > > node2,60020,1375319046072; failed processing
> > > > > 2013-07-31 22:22:46,362 WARN
> > > > > org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for
> > > > > region aff4d1d8bf470458bb19525e8aef0759 from server
> > > > > node2,60020,1375319046072 but it doesn't exist anymore, probably
> > > > > already processed its split
> > > > > 2013-07-31 22:22:46,388 INFO
> > > > > org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor:
> > > > > node3,60000,1375322220614.timeoutMonitor exiting
> > > > > 2013-07-31 22:22:46,388 INFO
> > > > > org.apache.hadoop.hbase.master.AssignmentManager$TimerUpdater:
> > > > > node3,60000,1375322220614.timerUpdater exiting
> > > > > 2013-07-31 22:22:46,402 INFO org.apache.hadoop.hbase.master.HMaster:
> > > > > HMaster main thread exiting
> > > > > 2013-07-31 22:22:46,402 ERROR
> > > > > org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start
> > > > > master
> > > > > java.lang.RuntimeException: HMaster Aborted
> > > > >     at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
> > > > >     at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
> > > > >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > > > >     at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
> > > > >     at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2100)
> > > > >
> > > > > It seems that HBCK can't do anything. I will start looking at the
> > > > > files in HDFS, but suggestions are welcome.
> > > > >
> > > > > JM
> > > > >
> > > >
> > >
> >
>
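
For anyone hitting the same loop: the quoted `grep ... | wc` counts the warnings for one region only. Below is a minimal sketch, not something posted in this thread, of tallying the "not found on server" warnings for every region at once. It assumes each warning sits on a single line in the real master log (the line breaks above come from mail wrapping) and that encoded region names are 32 hex characters, as in the logs quoted here. The function name `count_not_found` is made up for illustration.

```python
import re
from collections import Counter

# Matches the AssignmentManager warning quoted in this thread, e.g.
#   "... Region 270a9c37... not found on server node7,60020,1375319044055; failed processing"
# Assumption: the encoded region name is 32 lowercase hex characters.
NOT_FOUND = re.compile(
    r"Region (?P<region>[0-9a-f]{32}) not found on server (?P<server>[^;\s]+);"
)


def count_not_found(lines):
    """Count 'Region ... not found on server ...' warnings.

    Returns a Counter keyed by (encoded region name, server name).
    Lines that do not contain the warning are ignored.
    """
    counts = Counter()
    for line in lines:
        match = NOT_FOUND.search(line)
        if match:
            counts[(match.group("region"), match.group("server"))] += 1
    return counts
```

To use it, pass any iterable of log lines, for example an open file handle on `logs/hbase-hbase-master-node3.log`, and print `counts.most_common()` to see which regions the master is looping on.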
