hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Simple OOM crash?
Date Sat, 18 Dec 2010 03:23:19 GMT
On Fri, Dec 17, 2010 at 2:32 PM, Sandy Pratt <prattrs@adobe.com> wrote:
> Todd,
>
> While we're on the subject, and since you seem to know LZO well, can you answer a few questions that have been playing around in my mind lately?
>
> 1) Does GZ also use the Direct Memory Buffer like LZO does?

I don't know much about the gzip codec, but I believe that so long as
you're using the native one (i.e., you have the Hadoop native libraries
installed) it is very similar, yes.
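
If you want to verify that the native path is actually in use, here's a
minimal sketch (the class name is mine, just for illustration):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.compress.zlib.ZlibFactory;
  import org.apache.hadoop.util.NativeCodeLoader;

  public class NativeCheck {
    public static void main(String[] args) {
      // true only if libhadoop was found on java.library.path
      System.out.println("native hadoop: " + NativeCodeLoader.isNativeCodeLoaded());
      // true only if the native zlib (which backs the gzip codec) is usable
      System.out.println("native zlib:   " + ZlibFactory.isNativeZlibLoaded(new Configuration()));
    }
  }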

>
> 2) What size do you run with for that buffer?  I kicked it up to 512m the other day and I haven't seen problems, but I wonder if that's overkill.

Which buffer are you referring to? I don't do any particular tuning
for the LZO codec. I do usually set io.file.buffer.size to 128KB in
Hadoop, but that's at a different layer.
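
For reference, that setting lives in core-site.xml, and 128KB is 131072
bytes:

  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>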

>
> 3) How do you think LZO memory use compares to GZ?  The reason I ask is because ISTR reading that GZ is very light on memory.  If it's significantly lighter than LZO, it might be worth my while to use GZ instead, even though it's slower than LZO, and use the freed memory to allocate another map slot.
>

All the LZO buffers are pooled and pretty transient so long as there
isn't a leak (like the bug you hit). Without a leak it should be
responsible for <1M of memory usage, in my experience.
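
They stay transient because of the pooling pattern, which looks roughly
like this (a sketch; the codec variable stands in for whatever codec the
table uses):

  Compressor compressor = CodecPool.getCompressor(codec);
  try {
    // ... compress data with the pooled compressor ...
  } finally {
    // Not returning it is exactly the kind of leak that pins direct buffers
    CodecPool.returnCompressor(compressor);
  }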

Thanks
-Todd

>
> -----Original Message-----
> From: Sandy Pratt [mailto:prattrs@adobe.com]
> Sent: Friday, December 17, 2010 14:04
> To: user@hbase.apache.org
> Subject: RE: Simple OOM crash?
>
> That worked.  Thanks!
>
> -----Original Message-----
> From: Todd Lipcon [mailto:todd@cloudera.com]
> Sent: Friday, December 17, 2010 13:54
> To: user@hbase.apache.org
> Subject: Re: Simple OOM crash?
>
> Hi Sandy,
>
> I've seen that error on github as well. Try using the git:// URL instead of the http:// URL. The http transport in git is a bit buggy.
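>
> If re-cloning is easiest, something like:
>
>   git clone git://github.com/toddlipcon/hadoop-lzo.git
>
> ("git remote set-url origin git://github.com/toddlipcon/hadoop-lzo.git" on your existing checkout would also work, if your git is new enough to have set-url.)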
>
> Worst case there's also an option to download a tarball there.
>
> -Todd
>
> On Fri, Dec 17, 2010 at 10:59 AM, Sandy Pratt <prattrs@adobe.com> wrote:
>> Thanks all for your help.
>>
>> I set about to update the hadoop-lzo jar using Todd Lipcon's git repo (https://github.com/toddlipcon/hadoop-lzo), and I encountered an error.  I'm not a git user, so I could be doing something wrong, but I'm not sure what.  Has something changed with this repo in the last month or two?
>>
>> The error is pasted below:
>>
>>  [hadoop@ets-lax-prod-hadoop-01 hadoop-lzo]$ git pull
>>  walk 7cbf6e85ad992faac880ef54a78ce926b6c02bda
>>  walk fdbddcafd8276497d0181d40d72756336d204374
>>  Getting alternates list for http://github.com/toddlipcon/hadoop-lzo.git
>>  Also look at http://github.com/network/312869.git/
>>  error: The requested URL returned error: 502 (curl_result = 22, http_code = 502, sha1 = 552b3f9cc1c7fd08bedfe029cf76a08e42302ae4)
>>  Getting pack list for http://github.com/toddlipcon/hadoop-lzo.git
>>  Getting pack list for http://github.com/network/312869.git/
>>  error: The requested URL returned error: 502
>>  error: Unable to find 552b3f9cc1c7fd08bedfe029cf76a08e42302ae4 under http://github.com/toddlipcon/hadoop-lzo.git
>>  Cannot obtain needed commit 552b3f9cc1c7fd08bedfe029cf76a08e42302ae4 while processing commit fdbddcafd8276497d0181d40d72756336d204374.
>>  fatal: Fetch failed.
>>
>>
>> Thanks,
>>
>> Sandy
>>
>>
>> -----Original Message-----
>> From: Andrew Purtell [mailto:apurtell@yahoo.com]
>> Sent: Thursday, December 16, 2010 17:22
>> To: user@hbase.apache.org
>> Cc: Cosmin Lehene
>> Subject: RE: Simple OOM crash?
>>
>> Use hadoop-lzo-0.4.7 or higher from
>> https://github.com/toddlipcon/hadoop-lzo
>>
>>
>> Best regards,
>>
>>    - Andy
>>
>>
>> --- On Thu, 12/16/10, Sandy Pratt <prattrs@adobe.com> wrote:
>>
>>> From: Sandy Pratt <prattrs@adobe.com>
>>> Subject: RE: Simple OOM crash?
>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>> Cc: "Cosmin Lehene" <clehene@adobe.com>
>>> Date: Thursday, December 16, 2010, 4:00 PM
>>>
>>> The LZO jar installed is:
>>>
>>> hadoop-lzo-0.4.6.jar
>>>
>>> The native LZO libs are from EPEL (I think), installed on CentOS 5.5 64-bit:
>>>
>>> [hadoop@ets-lax-prod-hadoop-02 Linux-amd64-64]$ yum info lzo-devel
>>> Name       : lzo-devel
>>> Arch       : x86_64
>>> Version    : 2.02
>>> Release    : 2.el5.1
>>> Size       : 144 k
>>> Repo       : installed
>>> Summary    : Development files for the lzo library
>>> URL        : http://www.oberhumer.com/opensource/lzo/
>>> License    : GPL
>>> Description: LZO is a portable lossless data compression library written in ANSI C.
>>>            : It offers pretty fast compression and very fast decompression.
>>>            : This package contains development files needed for lzo.
>>>
>>> Is the direct buffer used only with LZO, or is it always involved
>>> with HBase read/writes?
>>>
>>> Thanks for the help,
>>> Sandy
>>>
>>>
>>> -----Original Message-----
>>> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
>>>
>>> Sent: Thursday, December 16, 2010 15:50
>>> To: user@hbase.apache.org
>>> Cc: Cosmin Lehene
>>> Subject: Re: Simple OOM crash?
>>>
>>> What LZO version are you using?  You aren't running out of regular
>>> heap, you are running out of "Direct buffer memory" which is capped
>>> to prevent mishaps.  There is a flag to increase that size:
>>>
>>> -XX:MaxDirectMemorySize=100m
>>>
>>> etc
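>>>
>>> e.g. in conf/hbase-env.sh (100m is just an example value; size it to your codec usage):
>>>
>>>   export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=100m"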
>>>
>>> enjoy,
>>> -ryan
>>>
>>> On Thu, Dec 16, 2010 at 3:07 PM, Sandy Pratt <prattrs@adobe.com> wrote:
>>> > Hello HBasers,
>>> >
>>> > I had a regionserver crash recently, and in perusing the logs it looks like it simply had a bit too little memory.  I'm running with 2200 MB heap on each regionserver.  I plan to shave a bit off the child VM allowance in favor of the regionserver to correct this, probably bringing it up to 2500 MB.  My question is if there is any more specific memory allocation I should make rather than simply giving more to the RS.  I wonder about this because of the following:
>>> >
>>> > load=(requests=0, regions=709, usedHeap=1349, maxHeap=2198)
>>> >
>>> > which suggests to me that there was heap available, but the RS couldn't use it for some reason.
>>> >
>>> > Conjecture: I do run with LZO compression, so I wonder if I could be hitting that memory leak referenced earlier on the list.  I know there's a new version of the LZO library available that I should upgrade to, but is it also possible to simply alter the table to gzip compression and do a major compaction, then uninstall LZO once that completes?
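>>> >
>>> > In the hbase shell I'm picturing something like this (table and family names are the ones from my log below):
>>> >
>>> >   disable 'ets.events'
>>> >   alter 'ets.events', {NAME => 'f1', COMPRESSION => 'GZ'}
>>> >   enable 'ets.events'
>>> >   major_compact 'ets.events'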
>>> >
>>> > Log follows:
>>> >
>>> > 2010-12-15 20:01:05,239 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region ets.events,36345112f5654a29b308014f89c108e6,1279815820311.1063152548
>>> > 2010-12-15 20:01:05,239 DEBUG org.apache.hadoop.hbase.regionserver.Store: Major compaction triggered on store f1; time since last major compaction 119928149ms
>>> > 2010-12-15 20:01:05,240 INFO org.apache.hadoop.hbase.regionserver.Store: Started compaction of 2 file(s) in f1 of ets.events,36345112f5654a29b308014f89c108e6,1279815820311.1063152548 into hdfs://ets-lax-prod-hadoop-01.corp.adobe.com:54310/hbase/ets.events/1063152548/.tmp, sequenceid=25718885315
>>> > 2010-12-15 20:01:19,403 WARN org.apache.hadoop.hbase.regionserver.Store: Not in set org.apache.hadoop.hbase.regionserver.StoreScanner@7466c84
>>> > 2010-12-15 20:01:19,572 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Aborting region server serverName=ets-lax-prod-hadoop-02.corp.adobe.com,60020,1289682554219, load=(requests=0, regions=709, usedHeap=1349, maxHeap=2198): Uncaught exception in service thread regionserver60020.compactor
>>> > java.lang.OutOfMemoryError: Direct buffer memory
>>> >         at java.nio.Bits.reserveMemory(Bits.java:656)
>>> >         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:113)
>>> >         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:305)
>>> >         at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:223)
>>> >         at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
>>> >         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>> >         at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>> >         at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:198)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:391)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:377)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:348)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:530)
>>> >         at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:495)
>>> >         at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:817)
>>> >         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:811)
>>> >         at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:670)
>>> >         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:722)
>>> >         at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:671)
>>> >         at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:84)
>>> > 2010-12-15 20:01:19,586 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=709, stores=709, storefiles=731, storefileIndexSize=418, memstoreSize=33, compactionQueueSize=15, usedHeap=856, maxHeap=2198, blockCacheSize=366779472, blockCacheFree=87883088, blockCacheCount=5494, blockCacheHitRatio=0
>>> > 2010-12-15 20:01:20,571 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
>>> >
>>> > Thanks,
>>> >
>>> > Sandy
>>> >
>>> >
>>>
>>
>>
>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera
