Date: Sun, 15 Mar 2015 11:02:59 +0100
Subject: Re: Stuck closing region / region is flushing
From: Kristoffer Sjögren
To: user@hbase.apache.org

Sounds great! Thanks for the info.

On Sun, Mar 15, 2015 at 2:00 AM, Andrew Purtell wrote:
> We're hitting this type of HDFS issue in production too. Your best option
> is to kill the regionserver process forcefully, start a replacement, and
> let the region(s) affected recover. All edits should be persisted to the
> WAL regardless of what Ted said about flushing.
>
> We are working on the problem, please see HBASE-13238
>
>
> On Saturday, March 14, 2015, Kristoffer Sjögren wrote:
>
> > I think I found the thread that is stuck. Is restarting the server harmless
> > in this state?
> >
> > "RS_CLOSE_REGION-hdfs-ix03.se-ix.delta.prod,60020,1424687995350-1" prio=10 tid=0x00007f75a0008000 nid=0x23ee in Object.wait() [0x00007f757d30b000]
> >    java.lang.Thread.State: WAITING (on object monitor)
> >         at java.lang.Object.wait(Native Method)
> >         at java.lang.Object.wait(Object.java:503)
> >         at org.apache.hadoop.hdfs.DFSOutputStream.waitAndQueueCurrentPacket(DFSOutputStream.java:1411)
> >         - locked <0x00000007544573e8> (a java.util.LinkedList)
> >         at org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:1479)
> >         - locked <0x0000000756780218> (a org.apache.hadoop.hdfs.DFSOutputStream)
> >         at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:173)
> >         at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:116)
> >         at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:102)
> >         - locked <0x0000000756780218> (a org.apache.hadoop.hdfs.DFSOutputStream)
> >         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
> >         at java.io.DataOutputStream.write(DataOutputStream.java:107)
> >         - locked <0x00000007543ef268> (a org.apache.hadoop.hdfs.client.HdfsDataOutputStream)
> >         at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:1061)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:1047)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateBlock(HFileBlockIndex.java:952)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateLevel(HFileBlockIndex.java:935)
> >         at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:844)
> >         at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:403)
> >         at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:1272)
> >         at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:835)
> >         - locked <0x000000075d8b2110> (a java.lang.Object)
> >         at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:746)
> >         at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2348)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1580)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1479)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:992)
> >         at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:956)
> >         - locked <0x000000075d97b628> (a java.lang.Object)
> >         at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:119)
> >         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:745)
> >
> >
> > On Sat, Mar 14, 2015 at 9:43 PM, Ted Yu wrote:
> >
> > > bq. flush the region manually using shell?
> > >
> > > I doubt that would work - you can give it a try.
> > > Please take a jstack of the region server in case you need to restart
> > > the server.
> > >
> > > BTW HBASE-10499 didn't go into 0.94 (maybe it should have). Please consider
> > > upgrading.
> > >
> > > Cheers
> > >
> > > On Sat, Mar 14, 2015 at 1:30 PM, Kristoffer Sjögren
> > > wrote:
> > >
> > > > Hi Ted
> > > >
> > > > Sorry I forgot to mention, hbase-0.94.6 cdh 4.4.
> > > >
> > > > Yeah, it was a pretty write-intensive scenario that I think triggered it
> > > > (importing a lot of data points into OpenTSDB).
> > > >
> > > > Do I flush the region manually using shell?
> > > >
> > > > Cheers,
> > > > -Kristoffer
> > > >
> > > > On Sat, Mar 14, 2015 at 9:22 PM, Ted Yu wrote:
> > > >
> > > > > Which release of HBase are you using?
> > > > >
> > > > > I wonder if your cluster was hit with HBASE-10499.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Sat, Mar 14, 2015 at 1:13 PM, Kristoffer Sjögren <stoffe@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi
> > > > > >
> > > > > > It seems one of our region servers has been stuck closing a region for
> > > > > > almost 22 hours. Puts or gets eventually fail with an exception [1].
> > > > > >
> > > > > > Is there any safe way to release the region, such as restarting the
> > > > > > region server?
> > > > > >
> > > > > > Cheers,
> > > > > > -Kristoffer
> > > > > >
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > > 2015-03-14 21:02:24,316 INFO org.apache.hadoop.hbase.regionserver.HRegion: Failed to unblock updates for region tsdb,\x00\x00\x9ETU\xAC@\x00\x00\x01\x00\x00\xAD\x00\x00\x05\x00\x00\xA7,1426282871862.4512f92b3d81e9142542d3b458223b63. 'IPC Server handler 9 on 60020' in 60000ms. The region is still busy.
> > > > > > 2015-03-14 21:02:24,316 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.RegionTooBusyException: region is flushing
> > > > > >         at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2731)
> > > > > >         at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:2002)
> > > > > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:2114)
> > > > > >         at sun.reflect.GeneratedMethodAccessor109.invoke(Unknown Source)
> > > > > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > > >         at java.lang.reflect.Method.invoke(Method.java:606)
> > > > > >         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > > > > >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
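For anyone hitting the same symptom: below is a minimal, hypothetical sketch (not part of HBase or Hadoop) of how a thread like the one Kristoffer found could be spotted in the jstack output Ted suggests capturing. It assumes the usual HotSpot jstack layout (a quoted thread name on each header line, one stack frame per line). It reports threads named RS_CLOSE_REGION-... whose stack contains DFSOutputStream.waitAndQueueCurrentPacket. The class name StuckCloseFinder and both matching strings are illustrative assumptions. A match only suggests that the region close is blocked on an HDFS write; it does not prove the thread will never make progress.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of HBase: flags close-region handler threads in a
// jstack dump whose stack contains DFSOutputStream.waitAndQueueCurrentPacket.
final class StuckCloseFinder {

    // Name prefix of close-region handler threads, as seen in the dump in this thread.
    static final String CLOSE_PREFIX = "\"RS_CLOSE_REGION-";
    // Frame where the HDFS client waits for space in its packet queue.
    static final String BLOCKING_FRAME = "DFSOutputStream.waitAndQueueCurrentPacket";

    // Returns the names of close-region threads whose stack contains BLOCKING_FRAME.
    static List<String> findStuckCloseThreads(String dump) {
        List<String> stuck = new ArrayList<>();
        String current = null;    // close-region thread currently being scanned, or null
        boolean reported = false; // true once `current` has been added to the result
        for (String raw : dump.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // Thread header line: track it only if it is a close-region handler.
                if (line.startsWith(CLOSE_PREFIX)) {
                    int end = line.indexOf('"', 1);
                    current = end > 0 ? line.substring(1, end) : line.substring(1);
                } else {
                    current = null;
                }
                reported = false;
            } else if (current != null && !reported && line.contains(BLOCKING_FRAME)) {
                stuck.add(current);
                reported = true;
            }
        }
        return stuck;
    }

    // Usage: java StuckCloseFinder rs.jstack   (or pipe the dump on stdin)
    public static void main(String[] args) throws IOException {
        String dump = args.length > 0
                ? String.join("\n", Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8))
                : new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8))
                        .lines().collect(Collectors.joining("\n"));
        for (String name : findStuckCloseThreads(dump)) {
            System.out.println("possibly stuck: " + name);
        }
    }
}
```

A typical run would be `jstack <regionserver-pid> > rs.jstack` followed by `java StuckCloseFinder rs.jstack`. To tell a stuck close from a slow one cheaply, take two dumps some seconds apart and check whether the same thread is reported in both.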