hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Stuck closing region / region is flushing
Date Sun, 15 Mar 2015 01:00:04 GMT
We're hitting this type of HDFS issue in production too. Your best option
is to kill the regionserver process forcefully, start a replacement, and
let the region(s) affected recover. All edits should be persisted to the
WAL regardless of what Ted said about flushing.

We are working on the problem; please see HBASE-13238.
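
Before killing the process, it can help to confirm from a saved jstack dump which close-region threads are actually parked in the HDFS write path. The following is a minimal sketch, not an HBase tool: the function name `find_stuck_close_threads`, the default marker string, and the assumption that thread entries in the dump are separated by blank lines are illustrative choices based on the stack trace quoted below.

```python
import re


def find_stuck_close_threads(
    jstack_text,
    marker="DFSOutputStream.waitAndQueueCurrentPacket",
):
    """Return names of RS_CLOSE_REGION threads whose stack mentions `marker`.

    Assumes (hypothetically) that the dump was saved as plain text and that
    thread entries are separated by blank lines, each starting with a
    double-quoted thread name.
    """
    stuck = []
    for entry in re.split(r"\n\s*\n", jstack_text):
        match = re.match(r'\s*"([^"]+)"', entry)
        if not match:
            continue  # not a thread entry (e.g. dump header)
        name = match.group(1)
        if name.startswith("RS_CLOSE_REGION") and marker in entry:
            stuck.append(name)
    return stuck
```

Usage would be along the lines of saving the dump with `jstack <pid> > rs.jstack` and then calling `find_stuck_close_threads(open("rs.jstack").read())`; the file name is only an example.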


On Saturday, March 14, 2015, Kristoffer Sjögren <stoffe@gmail.com> wrote:

> I think I found the thread that is stuck. Is restarting the server harmless
> in this state?
>
> "RS_CLOSE_REGION-hdfs-ix03.se-ix.delta.prod,60020,1424687995350-1" prio=10 tid=0x00007f75a0008000 nid=0x23ee in Object.wait() [0x00007f757d30b000]
>    java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:503)
> at org.apache.hadoop.hdfs.DFSOutputStream.waitAndQueueCurrentPacket(DFSOutputStream.java:1411)
> - locked <0x00000007544573e8> (a java.util.LinkedList)
> at org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:1479)
> - locked <0x0000000756780218> (a org.apache.hadoop.hdfs.DFSOutputStream)
> at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:173)
> at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:116)
> at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:102)
> - locked <0x0000000756780218> (a org.apache.hadoop.hdfs.DFSOutputStream)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
> at java.io.DataOutputStream.write(DataOutputStream.java:107)
> - locked <0x00000007543ef268> (a org.apache.hadoop.hdfs.client.HdfsDataOutputStream)
> at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:1061)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:1047)
> at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateBlock(HFileBlockIndex.java:952)
> at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateLevel(HFileBlockIndex.java:935)
> at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:844)
> at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:403)
> at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:1272)
> at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:835)
> - locked <0x000000075d8b2110> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:746)
> at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2348)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1580)
> at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1479)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:992)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:956)
> - locked <0x000000075d97b628> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:119)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
> On Sat, Mar 14, 2015 at 9:43 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > bq. flush the region manually using shell?
> >
> > I doubt that would work - you can give it a try.
> > Please take jstack of region server in case you need to restart the
> > server.
> >
> > BTW HBASE-10499 didn't go into 0.94 (maybe it should have). Please consider
> > upgrading.
> >
> > Cheers
> >
> > On Sat, Mar 14, 2015 at 1:30 PM, Kristoffer Sjögren <stoffe@gmail.com>
> > wrote:
> >
> > > Hi Ted
> > >
> > > Sorry I forgot to mention, hbase-0.94.6 cdh 4.4.
> > >
> > > Yeah, it was a pretty write intensive scenario that I think triggered it
> > > (importing a lot of datapoints into opentsdb).
> > >
> > > Do I flush the region manually using shell?
> > >
> > > Cheers,
> > > -Kristoffer
> > >
> > > On Sat, Mar 14, 2015 at 9:22 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > Which release of HBase are you using ?
> > > >
> > > > I wonder if your cluster was hit with HBASE-10499.
> > > >
> > > > Cheers
> > > >
> > > > On Sat, Mar 14, 2015 at 1:13 PM, Kristoffer Sjögren <stoffe@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > It seems one of our region servers has been stuck closing a region for
> > > > > almost 22 hours. Puts or gets eventually fail with an exception [1].
> > > > >
> > > > > Is there any safe way to release the region like restarting the region
> > > > > server?
> > > > >
> > > > > Cheers,
> > > > > -Kristoffer
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > > > 2015-03-14 21:02:24,316 INFO org.apache.hadoop.hbase.regionserver.HRegion: Failed to unblock updates for region tsdb,\x00\x00\x9ETU\xAC@\x00\x00\x01\x00\x00\xAD\x00\x00\x05\x00\x00\xA7,1426282871862.4512f92b3d81e9142542d3b458223b63. 'IPC Server handler 9 on 60020' in 60000ms. The region is still busy.
> > > > > 2015-03-14 21:02:24,316 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> > > > > org.apache.hadoop.hbase.RegionTooBusyException: region is flushing
> > > > > at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2731)
> > > > > at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:2002)
> > > > > at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:2114)
> > > > > at sun.reflect.GeneratedMethodAccessor109.invoke(Unknown Source)
> > > > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > at java.lang.reflect.Method.invoke(Method.java:606)
> > > > > at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > > > > at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
