hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Never-ending transitioning regions.
Date Sat, 23 Feb 2013 23:14:17 GMT
Hi Kevin,

Thanks for taking the time to reply.

Here is a bigger extract of the logs. I don't see another path in the logs.

http://pastebin.com/uMxGyjKm

I can send you the entire log if you want (42 MB).

What I did is: I merged many regions together, then altered the table to set
MAX_FILESIZE and ran a major compaction to get the table split again.

To fix the issue I had to drop one working table and run hbck -repair multiple
times. Now it's fixed, but I still have the logs.
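
For reference, the sequence described above roughly corresponds to these
commands (a hedged sketch for an 0.94-era cluster; the table name 'entry' is
taken from the logs below, the region names are placeholders, and the
MAX_FILESIZE value is only illustrative):

```shell
# Offline region merge (0.94 has no online merge; the Merge tool expects the
# cluster to be down). Region names here are placeholders, not real values.
hbase org.apache.hadoop.hbase.util.Merge entry <region-name-1> <region-name-2>

# Raise the split threshold, then trigger a major compaction so the table
# re-splits under the new setting:
echo "disable 'entry'
alter 'entry', MAX_FILESIZE => '1073741824'
enable 'entry'
major_compact 'entry'" | hbase shell

# The repair attempts mentioned above:
hbase hbck -repair
```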

I'm redoing all the steps I did. Maybe I will face the issue again. If I'm
able to reproduce it, we might be able to figure out where the issue is...

JM

2013/2/23 Kevin O'dell <kevin.odell@cloudera.com>

> JM,
>
>   How are you doing today?  Right before the "file does not exist" line there
> should be another path.  Can you let me know if in that path there are pointers
> from a split to 2ebfef593a3d715b59b85670909182c9?  The directory may
> already exist.  I have seen this a couple of times now and am trying to ferret
> out a root cause so I can open a JIRA.  I suspect we have a split code bug
> in .92+.
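
One way to look for those split pointers: a split leaves reference files in
the daughter regions' store directories, named with the parent region's
encoded name as a suffix (`<hfile-id>.<parent-encoded-name>`). A hedged
sketch, using the table name and parent hash from the logs; `-lsr` is the
Hadoop 1.x recursive listing form:

```shell
# Recursively list the table's region directories and grep for reference
# files pointing at the parent region from the stack trace.
hadoop fs -lsr /hbase/entry | grep 2ebfef593a3d715b59b85670909182c9
```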
>
> On Sat, Feb 23, 2013 at 4:10 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > Hi,
> >
> > I have 2 regions transitioning from server to server for 15 minutes now.
> >
> > I have nothing in the master logs about those 2 regions, but on the region
> > server logs I have some files not found:
> >
> > 2013-02-23 16:02:07,347 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=entry,theykey,1361651769136.6dd77bc9ff91e0e6d413f74e670ab435., starting to roll back the global memstore size.
> > java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/entry/2ebfef593a3d715b59b85670909182c9/a/62b0aae45d59408dbcfc513954efabc7
> >     at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:597)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:510)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4177)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4125)
> >     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:328)
> >     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:100)
> >     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >     at java.lang.Thread.run(Thread.java:722)
> > Caused by: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/entry/2ebfef593a3d715b59b85670909182c9/a/62b0aae45d59408dbcfc513954efabc7
> >     at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:433)
> >     at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:240)
> >     at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:3141)
> >     at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:572)
> >     at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:570)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >     ... 3 more
> > Caused by: java.io.FileNotFoundException: File does not exist: /hbase/entry/2ebfef593a3d715b59b85670909182c9/a/62b0aae45d59408dbcfc513954efabc7
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
> >     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
> >     at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108)
> >     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
> >     at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:573)
> >     at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1261)
> >     at org.apache.hadoop.hbase.io.HalfStoreFileReader.<init>(HalfStoreFileReader.java:70)
> >     at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:508)
> >     at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:603)
> >     at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:409)
> >     at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:404)
> >     ... 8 more
> > 2013-02-23 16:02:07,370 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x13d07ec012501fc Attempt to transition the unassigned node for 6dd77bc9ff91e0e6d413f74e670ab435 from RS_ZK_REGION_OPENING to RS_ZK_REGION_FAILED_OPEN failed, the node existed but was version 6586 not the expected version 6585
> >
> >
> > If I try hbck -fix, it brings the master down:
> >
> > 2013-02-23 16:03:01,419 INFO org.apache.hadoop.hbase.master.HMaster: BalanceSwitch=false
> > 2013-02-23 16:03:03,067 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
> > 2013-02-23 16:03:03,068 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : entry,thekey,1361651769136.6dd77bc9ff91e0e6d413f74e670ab435. state=PENDING_OPEN, ts=1361653383067, server=node2,60020,1361653023303 .. Cannot transit it to OFFLINE.
> > java.lang.IllegalStateException: Unexpected state : entry,thekey,1361651769136.6dd77bc9ff91e0e6d413f74e670ab435. state=PENDING_OPEN, ts=1361653383067, server=node2,60020,1361653023303 .. Cannot transit it to OFFLINE.
> >     at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1813)
> >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1658)
> >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423)
> >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1398)
> >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1393)
> >     at org.apache.hadoop.hbase.master.HMaster.assignRegion(HMaster.java:1740)
> >     at org.apache.hadoop.hbase.master.HMaster.assign(HMaster.java:1731)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:601)
> >     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> > 2013-02-23 16:03:03,069 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> > 2013-02-23 16:03:03,069 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
> > 2013-02-23 16:03:03,069 INFO org.apache.hadoop.hbase.master.CatalogJanitor: node3,60000,1361653064588-CatalogJanitor exiting
> > 2013-02-23 16:03:03,069 INFO org.apache.hadoop.hbase.master.HMaster$2: node3,60000,1361653064588-BalancerChore exiting
> > 2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60000: exiting
> > 2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000: exiting
> > 2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: exiting
> > 2013-02-23 16:03:03,070 INFO org.apache.hadoop.hbase.master.cleaner.HFileCleaner: master-node3,60000,1361653064588.archivedHFileCleaner exiting
> > 2013-02-23 16:03:03,070 INFO org.apache.hadoop.hbase.master.cleaner.LogCleaner: master-node3,60000,1361653064588.oldLogCleaner exiting
> > 2013-02-23 16:03:03,070 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
> > 2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
> > 2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 1 on 60000: exiting
> > 2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 2 on 60000: exiting
> > 2013-02-23 16:03:03,071 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call isMasterRunning(), rpc version=1, client version=29, methodsFingerPrint=891823089 from 192.168.23.7:43381: output error
> > 2013-02-23 16:03:03,071 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000 caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
> > 2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000: exiting
> > 2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60000: exiting
> > 2013-02-23 16:03:03,071 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
> > 2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
> > 2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60000: exiting
> > 2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000: exiting
> > 2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
> > 2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000: exiting
> > 2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
> > 2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: exiting
> > 2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 60000: exiting
> > 2013-02-23 16:03:03,287 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x33d07f1130301fe
> > 2013-02-23 16:03:03,453 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimerUpdater: node3,60000,1361653064588.timerUpdater exiting
> > 2013-02-23 16:03:03,453 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: node3,60000,1361653064588.timeoutMonitor exiting
> > 2013-02-23 16:03:03,453 INFO org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: node3,60000,1361653064588.splitLogManagerTimeoutMonitor exiting
> > 2013-02-23 16:03:03,468 INFO org.apache.hadoop.hbase.master.HMaster: HMaster main thread exiting
> > 2013-02-23 16:03:03,469 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
> > java.lang.RuntimeException: HMaster Aborted
> >     at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
> >     at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
> >     at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1927)
> >
> > I'm running 0.94.5 plus HBASE-7824
> > <https://issues.apache.org/jira/browse/HBASE-7824> and HBASE-7865
> > <https://issues.apache.org/jira/browse/HBASE-7865>. I don't think those
> > two patches are related to this issue.
> >
> > Hadoop fsck reports "The filesystem under path '/' is HEALTHY" without
> > any issue.
> >
> > /hbase/entry/2ebfef593a3d715b59b85670909182c9/a/62b0aae45d59408dbcfc513954efabc7
> > does exist in the FS.
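
A quick way to double-check that from HDFS's side (a hedged sketch; the file
and region paths are the ones from the stack trace above):

```shell
# Stat the exact store file the region server says is missing:
hadoop fs -ls /hbase/entry/2ebfef593a3d715b59b85670909182c9/a/62b0aae45d59408dbcfc513954efabc7

# And check block health for just that region directory, not the whole FS:
hadoop fsck /hbase/entry/2ebfef593a3d715b59b85670909182c9 -files -blocks
```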
> >
> > What I don't understand is why the master is going down, and how I can
> > fix that.
> >
> > I will try to create the missing directory and see the results...
> >
> > Thanks,
> >
> > JM
> >
>
>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
>
