hbase-dev mailing list archives

From Salabhanjika S <salabhanji...@gmail.com>
Subject Re: Region server slowdown
Date Tue, 18 Mar 2014 06:17:55 GMT
My bad.
> - I strongly feel this issue has something to do with HBase version. I
> verified the code paths of the stack I posted.

Read this as "I DON'T feel this issue has something to do with HBase version."

On Tue, Mar 18, 2014 at 10:12 AM, Salabhanjika S
<salabhanjika9@gmail.com> wrote:
> Thanks Rodionov & Enis for responding. I agree with you that we need to upgrade.
>
> As I mentioned in my first mail, we are in the process of upgrading.
>>> >>> We are using hbase version 0.90.6 (please don't complain of old
>>> >>> version. we are in process of upgrading)
>
> - The code snippets I flagged as suboptimal in my follow-up mail apply
> to trunk as well.
>
> - I strongly feel this issue has something to do with HBase version. I
> verified the code paths of the stack I posted.
> I don't see any significant changes in current version in this code
> (Flusher - getCompressor).
>
>
> On Tue, Mar 18, 2014 at 2:30 AM, Enis Söztutar <enis.soz@gmail.com> wrote:
>> Hi
>>
>> Agreed with Vladimir. I doubt anybody will spend the time to debug the
>> issue. It would be easier if you can upgrade your HBase cluster. You
>> will have to upgrade your Hadoop cluster as well. You should go with
>> 0.96.x/0.98.x and either Hadoop-2.2 or Hadoop-2.3. Check out the HBase book
>> for the upgrade process.
>>
>> Enis
>>
>>
>> On Mon, Mar 17, 2014 at 11:19 AM, Vladimir Rodionov <vrodionov@carrieriq.com> wrote:
>>
>>> I think 0.90.6 reached EOL a couple of years ago. The best you can do
>>> right now is start planning an upgrade to the latest stable 0.94 or 0.96.
>>>
>>> Best regards,
>>> Vladimir Rodionov
>>> Principal Platform Engineer
>>> Carrier IQ, www.carrieriq.com
>>> e-mail: vrodionov@carrieriq.com
>>>
>>> ________________________________________
>>> From: Salabhanjika S [salabhanjika9@gmail.com]
>>> Sent: Monday, March 17, 2014 2:55 AM
>>> To: dev@hbase.apache.org
>>> Subject: Re: Region server slowdown
>>>
>>> @Devs, please respond if you can provide me some hints on this problem.
>>>
>>> Did some more analysis. While going through the code in the stack trace I
>>> noticed something sub-optimal.
>>> This may not be the root cause of our slowdown, but I felt it may be
>>> something worth optimizing/fixing.
>>>
>>> HBase calls CodecPool.getCompressor *WITHOUT* a config object. This
>>> results in a configuration reload on every call.
>>> Should it pass the existing config object as a parameter so
>>> that the configuration reload (discovery & XML parsing) does not happen so
>>> frequently?
>>>
>>>
>>> http://svn.apache.org/viewvc/hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/io/compress/Compression.java?view=markup
>>> {code}
>>> public Compressor getCompressor() {
>>>   CompressionCodec codec = getCodec(conf);
>>>   if (codec != null) {
>>>     Compressor compressor = CodecPool.getCompressor(codec);
>>>     if (compressor != null) {
>>> {code}
>>>
>>>
>>> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CodecPool.java?view=markup
>>> {code}
>>> public static Compressor getCompressor(CompressionCodec codec) {
>>>   return getCompressor(codec, null);
>>> }
>>> {code}
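
The concern raised in the message above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyConfig`, `ToyCompressor`, the `compression.strategy` key and its value are hypothetical stand-ins, not Hadoop or hadoop-lzo classes, and the rule "no config passed means a fresh config is built and loaded" is an assumption taken from the poster's reading of the stack trace, not a confirmed description of the library.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: hypothetical stand-ins, not Hadoop or hadoop-lzo code. */
class ConfigReuseSketch {

    /** Stand-in for a configuration object that is expensive to construct. */
    static final class ToyConfig {
        static int loads = 0;                        // counts simulated resource loads
        private final Map<String, String> props = new HashMap<>();

        ToyConfig() {
            loads++;                                 // simulates discovery + XML parsing
            props.put("compression.strategy", "LZO1X_1"); // hypothetical key and value
        }

        String get(String key) {
            return props.get(key);
        }
    }

    /** Stand-in for a pooled compressor that re-reads its settings on reuse. */
    static final class ToyCompressor {
        String strategy;

        void reinit(ToyConfig conf) {
            // Assumption: when no config is supplied, a fresh one is built and loaded.
            ToyConfig effective = (conf != null) ? conf : new ToyConfig();
            strategy = effective.get("compression.strategy");
        }
    }

    /** Mirrors the one-argument overload that forwards a null config. */
    static ToyCompressor getCompressor(ToyCompressor pooled) {
        return getCompressor(pooled, null);
    }

    /** Two-argument variant: the caller hands over an already-loaded config. */
    static ToyCompressor getCompressor(ToyCompressor pooled, ToyConfig conf) {
        pooled.reinit(conf);
        return pooled;
    }

    /** Runs the given number of checkouts and returns how many config loads they caused. */
    static int loadsFor(int calls, boolean passConfig) {
        ToyConfig shared = new ToyConfig();          // loaded once, before counting starts
        int before = ToyConfig.loads;
        ToyCompressor pooled = new ToyCompressor();
        for (int i = 0; i < calls; i++) {
            if (passConfig) {
                getCompressor(pooled, shared);
            } else {
                getCompressor(pooled);
            }
        }
        return ToyConfig.loads - before;
    }

    public static void main(String[] args) {
        System.out.println("loads without config: " + loadsFor(1000, false));
        System.out.println("loads with config:    " + loadsFor(1000, true));
    }
}
```

In this model the one-argument path triggers one simulated load per call, while the two-argument path triggers none beyond the initial load. Whether the real code paths behave this way depends on the Hadoop and hadoop-lzo versions in use.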
>>>
>>> On Fri, Mar 14, 2014 at 1:47 PM, Salabhanjika S <salabhanjika9@gmail.com> wrote:
>>> > Thanks for the quick response, Ted.
>>> >
>>> > - Hadoop version is 0.20.2
>>> > - Other previous flushes (600MB to 1.5GB) take around 60 to 300 seconds
>>> >
>>> > On Fri, Mar 14, 2014 at 1:21 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>> >> What Hadoop version are you using ?
>>> >>
>>> >> Btw, the sentence about previous flushes was incomplete.
>>> >>
>>> >> Cheers
>>> >>
>>> >> On Mar 14, 2014, at 12:12 AM, Salabhanjika S <salabhanjika9@gmail.com> wrote:
>>> >>
>>> >>> Devs,
>>> >>>
>>> >>> We are using hbase version 0.90.6 (please don't complain of old
>>> >>> version. we are in process of upgrading) in our production and we are
>>> >>> noticing a strange problem that occurs arbitrarily every few weeks: a region
>>> >>> server becomes extremely slow.
>>> >>> We have to restart the Region Server once this happens. There is no unique
>>> >>> pattern to this problem. It happens on different region servers,
>>> >>> different tables/regions and at different times.
>>> >>>
>>> >>> Here are observations & findings from our analysis.
>>> >>> - We are using LZO compression (0.4.10).
>>> >>>
>>> >>> - [RS Dashboard] A flush has been running for more than 6 hours. It has been in
>>> >>> "creating writer" status for a long time. Other previous flushes (600MB
>>> >>> to 1.5GB) takes
>>> >>>
>>> >>> - [Thread dumps] No deadlocks. The flusher thread stack is below. The compactor
>>> >>> thread is in the same state, inside Configuration.loadResource.
>>> >>> "regionserver60020.cacheFlusher" daemon prio=10 tid=0x00007efd016c4800 nid=0x35e9 runnable [0x00007efcad9c5000]
>>> >>>   java.lang.Thread.State: RUNNABLE
>>> >>>    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
>>> >>>    at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
>>> >>>    - locked <0x00007f02ccc2ef78> (a sun.net.www.protocol.file.FileURLConnection)
>>> >>>    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:653)
>>> >>>    ... [cutting down some stack to keep mail compact. all this stack is in com.sun.org.apache.xerces...]
>>> >>>    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
>>> >>>    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)
>>> >>>    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1308)
>>> >>>    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1259)
>>> >>>    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1200)
>>> >>>    - locked <0x00007f014f1543b8> (a org.apache.hadoop.conf.Configuration)
>>> >>>    at org.apache.hadoop.conf.Configuration.get(Configuration.java:501)
>>> >>>    at com.hadoop.compression.lzo.LzoCodec.getCompressionStrategy(LzoCodec.java:205)
>>> >>>    at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:204)
>>> >>>    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>> >>>    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>> >>>    at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:236)
>>> >>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
>>> >>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>>> >>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
>>> >>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>>> >>>    at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>>> >>>    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>>> >>>    at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:530)
>>> >>>    - locked <0x00007efe1b6e7af8> (a java.lang.Object)
>>> >>>    at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:496)
>>> >>>    at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:83)
>>> >>>    at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1576)
>>> >>>    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1046)
>>> >>>    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:967)
>>> >>>    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:915)
>>> >>>    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:394)
>>> >>>    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:368)
>>> >>>    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:242)
>>> >>>
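
The thread-dump checks described above (no deadlocks, and which threads sit in a given frame) can also be done programmatically. The following is a minimal sketch using the standard `java.lang.management` API. The class name `ThreadDumpScanSketch` and its helper methods are made up for illustration, and the scan only sees threads of the JVM it runs in, so for a live region server it would have to run inside that process or be adapted to a remote JMX connection.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (hypothetical name): scans the current JVM's threads. */
class ThreadDumpScanSketch {

    /** Returns the names of live threads whose stack contains the given class and method. */
    static List<String> threadsIn(String className, String methodName) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> hits = new ArrayList<>();
        // false, false: skip locked-monitor and locked-synchronizer details.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue;                            // defensive: thread may have exited
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    hits.add(info.getThreadName());
                    break;                           // one hit per thread is enough
                }
            }
        }
        return hits;
    }

    /** True if the JVM reports threads deadlocked on monitors or ownable synchronizers. */
    static boolean hasDeadlock() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids != null && ids.length > 0;        // the API returns null when none are found
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + hasDeadlock());
        System.out.println("threads in Configuration.loadResource: "
                + threadsIn("org.apache.hadoop.conf.Configuration", "loadResource"));
    }
}
```

Run periodically, a scan like this could show whether the flusher and compactor threads stay inside `Configuration.loadResource` across samples, which is the pattern reported in the dump above.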
>>> >>> Any leads on this please?
>>> >>>
>>> >>> -S
>>>
>>>
