From: Enis Söztutar
Date: Mon, 17 Mar 2014 14:00:26 -0700
Subject: Re: Region server slowdown
To: dev@hbase.apache.org

Hi,

Agreed with Vladimir. I doubt anybody will spend the time to debug the
issue. It would be easier if you can upgrade your HBase cluster; you will
have to upgrade your Hadoop cluster as well. You should go with
0.96.x/0.98.x and either Hadoop-2.2 or Hadoop-2.3. Check out the HBase
book for the upgrade process.

Enis

On Mon, Mar 17, 2014 at 11:19 AM, Vladimir Rodionov wrote:
> I think 0.90.6 reached EOL a couple of years ago. The best you can do
> right now is start planning an upgrade to the latest stable 0.94 or 0.96.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Salabhanjika S [salabhanjika9@gmail.com]
> Sent: Monday, March 17, 2014 2:55 AM
> To: dev@hbase.apache.org
> Subject: Re: Region server slowdown
>
> @Devs, please respond if you can provide me some hints on this problem.
>
> Did some more analysis. While going through the code in the stack trace I
> noticed something sub-optimal. This may not be the root cause of our
> slowdown, but it seemed worth optimizing/fixing.
>
> HBase is making a call to the Compressor *WITHOUT* a config object. This
> is resulting in a configuration reload for every call.
> Should this be called with the existing config object as a parameter, so
> that the configuration reload (discovery & XML parsing) does not happen
> so frequently?
>
> http://svn.apache.org/viewvc/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java?view=markup
> {code}
> 309   public Compressor getCompressor() {
> 310     CompressionCodec codec = getCodec(conf);
> 311     if (codec != null) {
> 312       Compressor compressor = CodecPool.getCompressor(codec);
> 313       if (compressor != null) {
> {code}
>
> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java?view=markup
> {code}
> 162   public static Compressor getCompressor(CompressionCodec codec) {
> 163     return getCompressor(codec, null);
> 164   }
> {code}
>
> On Fri, Mar 14, 2014 at 1:47 PM, Salabhanjika S wrote:
> > Thanks for the quick response, Ted.
> >
> > - Hadoop version is 0.20.2
> > - Other previous flushes (600MB to 1.5GB) take around 60 to 300 seconds
> >
> > On Fri, Mar 14, 2014 at 1:21 PM, Ted Yu wrote:
> >> What Hadoop version are you using?
> >>
> >> Btw, the sentence about previous flushes was incomplete.
> >>
> >> Cheers
> >>
> >> On Mar 14, 2014, at 12:12 AM, Salabhanjika S wrote:
> >>
> >>> Devs,
> >>>
> >>> We are using HBase version 0.90.6 (please don't complain about the old
> >>> version; we are in the process of upgrading) in our production, and we
> >>> are noticing a strange problem arbitrarily every few weeks: a region
> >>> server goes extremely slow. We have to restart the region server once
> >>> this happens. There is no unique pattern to this problem. It happens on
> >>> different region servers, different tables/regions, and at different
> >>> times.
> >>>
> >>> Here are observations & findings from our analysis.
> >>> - We are using LZO compression (0.4.10).
> >>>
> >>> - [RS dashboard] A flush has been running for more than 6 hours. It has
> >>> been in "creating writer" status for a long time.
> >>> Other previous flushes (600MB
> >>> to 1.5GB) takes
> >>>
> >>> - [Thread dumps] No deadlocks. Flusher thread stack below. Even the
> >>> compactor thread is in the same state, inside Configuration.loadResource.
> >>>
> >>> "regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007efd016c4800
> >>> nid=0x35e9 runnable [0x00007efcad9c5000]
> >>>    java.lang.Thread.State: RUNNABLE
> >>>      at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
> >>>      at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
> >>>      - locked <0x00007f02ccc2ef78> (a sun.net.www.protocol.file.FileURLConnection)
> >>>      at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
> >>>      ... [cutting down some stack to keep the mail compact; all of this
> >>>      stack is in com.sun.org.apache.xerces...]
> >>>      at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
> >>>      at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
> >>>      at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
> >>>      at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1259)
> >>>      at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1200)
> >>>      - locked <0x00007f014f1543b8> (a org.apache.hadoop.conf.Configuration)
> >>>      at org.apache.hadoop.conf.Configuration.get(Configuration.java:501)
> >>>      at com.hadoop.compression.lzo.LzoCodec.getCompressionStrategy(LzoCodec.java:205)
> >>>      at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:204)
> >>>      at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
> >>>      at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
> >>>      at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
> >>>      at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
> >>>      at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
> >>>      at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
> >>>      at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
> >>>      at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
> >>>      at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
> >>>      at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:530)
> >>>      - locked <0x00007efe1b6e7af8> (a java.lang.Object)
> >>>      at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:496)
> >>>      at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:83)
> >>>      at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1576)
> >>>      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1046)
> >>>      at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:967)
> >>>      at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:915)
> >>>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:394)
> >>>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:368)
> >>>      at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:242)
> >>>
> >>> Any leads on this please?
> >>>
> >>> -S
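The cost path discussed in the thread (getCompressor called without a conf,
so the compressor's reinit builds a fresh Configuration, which triggers
resource discovery and XML parsing every time) can be sketched as follows.
This is a hypothetical model, not the real org.apache.hadoop classes: the
nested Configuration class and reinit method below only imitate the shape of
the calls seen in the stack trace, counting how often a "load" would occur.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ConfReloadSketch {
    // Stand-in for org.apache.hadoop.conf.Configuration: its constructor
    // counts how many times resources would be discovered and parsed
    // (the Configuration.loadResource frames in the thread dump).
    static class Configuration {
        static final AtomicInteger loads = new AtomicInteger();
        Configuration() { loads.incrementAndGet(); }
        String get(String key) { return "dummy-value"; }
    }

    // Stand-in for the reinit path: with a null conf (what
    // CodecPool.getCompressor(codec) effectively passes along), a new
    // Configuration must be built; with a shared conf, nothing is reloaded.
    static void reinit(Configuration conf) {
        Configuration effective = (conf != null) ? conf : new Configuration();
        effective.get("io.compression.codec.lzo.compression.strategy");
    }

    public static void main(String[] args) {
        Configuration shared = new Configuration(); // one load, paid once
        int before = Configuration.loads.get();

        for (int i = 0; i < 100; i++) reinit(null); // null-conf path
        int nullPath = Configuration.loads.get() - before;

        before = Configuration.loads.get();
        for (int i = 0; i < 100; i++) reinit(shared); // shared-conf path
        int sharedPath = Configuration.loads.get() - before;

        System.out.println("reloads with null conf: " + nullPath);     // 100
        System.out.println("reloads with shared conf: " + sharedPath); // 0
    }
}
```

In this model, 100 simulated flushes with a null conf pay 100 reloads, while
passing an existing conf pays none, which is the optimization the thread
suggests: have HBase call the two-argument getCompressor(codec, conf)
overload instead of the single-argument one.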