hbase-user mailing list archives

From Friso van Vollenhoven <fvanvollenho...@xebia.com>
Subject Re: problem with LZO compressor on write only loads
Date Mon, 10 Jan 2011 09:38:48 GMT
Hey Todd,

Just FYI, I have only tried the 0.4.8 LZO version with the G1 collector, not CMS. When I saw
the problem with earlier versions I did a run with both G1 and CMS and it looked the same.

I am not sure if it makes a difference, though. My guess is that the problem occurs because
the byte buffers created by the compressor objects are reused a couple of times, which makes
them longer lived, so they get promoted out of the young generation. That keeps them from
being finalized for a long time, which in turn means the native allocations are never
released. But this is just my hunch. I have not looked into verifying this...
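
A minimal sketch of that hunch (illustrative only, not the hadoop-lzo code): direct
buffers only release their native memory when they are finalized, so if nothing puts
pressure on the Java heap, a pattern like the one below can grow the native heap long
before a full GC ever runs:

-----------------------------------
import java.nio.ByteBuffer;

// Each iteration plays the role of a compressor allocating a fresh 64M
// direct buffer; the previous one is dropped, but its native memory is
// only reclaimed once a (rare) full GC runs the buffer's finalizer.
public class DirectBufferLeakSketch {
    public static void main(String[] args) throws InterruptedException {
        ByteBuffer[] recent = new ByteBuffer[8];
        for (int i = 0; ; i++) {
            recent[i % recent.length] = ByteBuffer.allocateDirect(64 * 1024 * 1024);
            Thread.sleep(10); // held long enough to be promoted out of young gen
        }
    }
}
-----------------------------------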


Friso



On 9 jan 2011, at 03:48, Todd Lipcon wrote:

> Hey everyone,
>
> Just wanted to let you know that I will be looking into this in the coming
> week - we've marked it as an important thing to investigate prior to our
> next beta release.
>
> Thanks
> -Todd
>
> On Sat, Jan 8, 2011 at 4:59 AM, Tatsuya Kawano <tatsuya6502@gmail.com> wrote:
>
>>
>> Hi Friso,
>>
>> So you found HBase 0.89 on CDH3b2 doesn't have the problem. I wonder what
>> would happen if you replace hadoop-core-*.jar in CDH3b3 with the one
>> contained in HBase 0.90RC distribution
>> (hadoop-core-0.20-append-r1056497.jar) and then rebuild hadoop-lzo against
>> it.
>>
>> Here is the comment on the LzoCompressor#reinit() method:
>>
>> -----------------------------------
>> // ... this method isn't in vanilla 0.20.2, but is in CDH3b3 and YDH
>> public void reinit(Configuration conf) {
>> -----------------------------------
>>
>>
>> https://github.com/kevinweil/hadoop-lzo/blob/6cbf4e232d7972c94107600567333a372ea08c0a/src/java/com/hadoop/compression/lzo/LzoCompressor.java#L196
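>>
>> The stack trace quoted further down shows the path that ends in the direct
>> buffer allocation; roughly (paraphrased from that trace, not copied from
>> the actual sources):
>>
>> -----------------------------------
>> // CodecPool hands out a pooled compressor and reinitializes it:
>> Compressor compressor = CodecPool.getCompressor(codec);
>> // -> LzoCompressor.reinit(conf) -> init(), which can allocate fresh
>> //    direct buffers, e.g.:
>> //    uncompressedDirectBuf = ByteBuffer.allocateDirect(directBufferSize);
>> -----------------------------------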
>>
>>
>> I don't know if hadoop-core-0.20-append-r1056497.jar is a vanilla 0.20.2 or
>> more like CDH3b3. Maybe I'm wrong, but if it doesn't call reinit(), you'll
>> have a good chance to get a stable HBase 0.90.
>>
>> Good luck!
>>
>> Tatsuya
>>
>> --
>> Tatsuya Kawano (Mr.)
>> Tokyo, Japan
>>
>> http://twitter.com/#!/tatsuya6502
>>
>>
>>
>>
>> On 01/08/2011, at 6:33 PM, Friso van Vollenhoven wrote:
>>
>>> Hey Ryan,
>>> I went back to the older version. The problem is that going to HBase 0.90
>>> requires an API change on the compressor side, which forces you to a
>>> version newer than 0.4.6 or so. So I also had to go back to HBase 0.89,
>>> which is in turn not compatible with CDH3b3, so I am back on CDH3b2 again.
>>> HBase 0.89 is stable for us, so this is not at all a problem. But this LZO
>>> problem is really in the way of our projected upgrade path (my client
>>> would like to end up with CDH3 for everything, because of the support
>>> options available in case things go wrong and the Cloudera administration
>>> courses available when new ops people are hired).
>>>
>>> Cheers,
>>> Friso
>>>
>>>
>>>
>>> On 7 jan 2011, at 22:28, Ryan Rawson wrote:
>>>
>>>> Hey,
>>>>
>>>> Here at SU we continue to use version 0.1.0 of hadoop-gpl-compression.
>>>> I know some of the newer versions had bugs which leaked
>>>> DirectByteBuffer space, which might be what you are running into.
>>>>
>>>> Give the older version a shot; there really hasn't been much change in
>>>> how LZO works in a while, and most of the 'extra' stuff added was to
>>>> support features HBase does not use.
>>>>
>>>> Good luck!
>>>>
>>>> -ryan
>>>>
>>>> ps: http://code.google.com/p/hadoop-gpl-compression/downloads/list
>>>>
>>>>
>>>> On Wed, Jan 5, 2011 at 10:26 PM, Friso van Vollenhoven
>>>> <fvanvollenhoven@xebia.com> wrote:
>>>>> Thanks Sandy.
>>>>>
>>>>> Does setting -XX:MaxDirectMemorySize help in triggering GC when you're
>>>>> reaching that limit? Or does it just OOME before the actual RAM is
>>>>> exhausted (then you prevent swapping, which is nicer, though)?
>>>>>
>>>>> I guess LZO is not a solution that fits all, but we do a lot of random
>>>>> reads and latency can be an issue for us, so I suppose we have to stick
>>>>> with it.
>>>>>
>>>>>
>>>>> Friso
>>>>>
>>>>>
>>>>>
>>>>> On 5 jan 2011, at 20:36, Sandy Pratt wrote:
>>>>>
>>>>>> I was in a similar situation recently, with similar symptoms, and I
>>>>>> experienced a crash very similar to yours. I don't have the specifics
>>>>>> handy at the moment, but I did post to this list about it a few weeks
>>>>>> ago. My workload is fairly write-heavy. I write about 10-20 million
>>>>>> smallish protobuf/xml blobs per day to an HBase cluster of 12 very
>>>>>> underpowered machines.
>>>>>>
>>>>>> I received two suggestions: 1) update to the latest hadoop-lzo, and
>>>>>> 2) specify a max direct memory size to the JVM (e.g.
>>>>>> -XX:MaxDirectMemorySize=256m).
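>>>>>>
>>>>>> For anyone trying option 2: the flag goes into the JVM options, e.g.
>>>>>> via hbase-env.sh (the 256m value is just an example, size it to your
>>>>>> box):
>>>>>>
>>>>>> -----------------------------------
>>>>>> # hbase-env.sh: cap the native memory used by direct byte buffers
>>>>>> export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=256m"
>>>>>> -----------------------------------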
>>>>>>
>>>>>> I took a third route: change my tables back to gz compression for the
>>>>>> time being while I figure out what to do. Since then, my memory usage
>>>>>> has been rock steady, but more importantly my tables are roughly half
>>>>>> the size on disk that they were with LZO, and there has been no
>>>>>> noticeable drop in performance (but remember this is a write-heavy
>>>>>> workload; I'm not trying to serve an online workload with low latency
>>>>>> or anything like that). At this point, I might not return to LZO.
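>>>>>>
>>>>>> For the record, switching a table back to gz can be done from the
>>>>>> HBase shell; roughly like this (table and family names are made up):
>>>>>>
>>>>>> -----------------------------------
>>>>>> disable 'mytable'
>>>>>> alter 'mytable', {NAME => 'cf', COMPRESSION => 'GZ'}
>>>>>> enable 'mytable'
>>>>>> # new store files pick up GZ as flushes and compactions rewrite data
>>>>>> -----------------------------------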
>>>>>>
>>>>>> In general, I'm not convinced that "use LZO" is universally good
>>>>>> advice for all HBase users. For one thing, I think it assumes that all
>>>>>> installations are focused on low latency, which is not always the case
>>>>>> (sometimes merely good latency is enough and great latency is not
>>>>>> needed). Secondly, it assumes some things about where the performance
>>>>>> bottleneck lives. For example, LZO performs well in micro-benchmarks,
>>>>>> but if you find yourself in an IO-bound batch processing situation,
>>>>>> you might be better served by a higher compression ratio, even if it's
>>>>>> more computationally expensive.
>>>>>>
>>>>>> Sandy
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Friso van Vollenhoven [mailto:fvanvollenhoven@xebia.com]
>>>>>>> Sent: Tuesday, January 04, 2011 08:00
>>>>>>> To: <user@hbase.apache.org>
>>>>>>> Subject: Re: problem with LZO compressor on write only loads
>>>>>>>
>>>>>>> I ran the job again, but with fewer other processes running on the
>>>>>>> same machine, so with more physical memory available to HBase. This
>>>>>>> was to see whether there was a point where it would stop allocating
>>>>>>> more buffers. When I did this, after many hours, one of the RSes
>>>>>>> crashed with an OOME. See here:
>>>>>>>
>>>>>>> 2011-01-04 11:32:01,332 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer:
>>>>>>> ABORTING region server serverName=w5r1.inrdb.ripe.net,60020,1294091507228,
>>>>>>> load=(requests=6246, regions=258, usedHeap=1790, maxHeap=16000):
>>>>>>> Uncaught exception in service thread regionserver60020.compactor
>>>>>>> java.lang.OutOfMemoryError: Direct buffer memory
>>>>>>>     at java.nio.Bits.reserveMemory(Bits.java:633)
>>>>>>>     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:98)
>>>>>>>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>>>>>>>     at com.hadoop.compression.lzo.LzoCompressor.init(LzoCompressor.java:248)
>>>>>>>     at com.hadoop.compression.lzo.LzoCompressor.reinit(LzoCompressor.java:207)
>>>>>>>     at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:105)
>>>>>>>     at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
>>>>>>>     at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:200)
>>>>>>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.getCompressingStream(HFile.java:397)
>>>>>>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.newBlock(HFile.java:383)
>>>>>>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.checkBlockBoundary(HFile.java:354)
>>>>>>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:536)
>>>>>>>     at org.apache.hadoop.hbase.io.hfile.HFile$Writer.append(HFile.java:501)
>>>>>>>     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:836)
>>>>>>>     at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
>>>>>>>     at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:764)
>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:709)
>>>>>>>     at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
>>>>>>> 2011-01-04 11:32:01,369 INFO org.apache.hadoop.hbase.regionserver.HRegionServer:
>>>>>>> Dump of metrics: request=0.0, regions=258, stores=516, storefiles=186,
>>>>>>> storefileIndexSize=179, memstoreSize=2125, compactionQueueSize=2,
>>>>>>> usedHeap=1797, maxHeap=16000, blockCacheSize=55051488,
>>>>>>> blockCacheFree=6655834912, blockCacheCount=0, blockCacheHitCount=0,
>>>>>>> blockCacheMissCount=2397107, blockCacheEvictedCount=0,
>>>>>>> blockCacheHitRatio=0, blockCacheHitCachingRatio=0
>>>>>>>
>>>>>>> I am guessing the OS won't allocate any more memory to the process.
>>>>>>> As you can see, the used heap is nowhere near the max heap.
>>>>>>>
>>>>>>> Also, this seems to happen during compaction; I had not considered
>>>>>>> compactions as a suspect yet. I could try running with a larger
>>>>>>> compaction threshold and blocking store files limit (a sketch of the
>>>>>>> settings is below). Since this is a write only load, that should not
>>>>>>> be a problem. In our normal operation, compactions and splits are
>>>>>>> quite common, though, because we do read-modify-write cycles a lot.
>>>>>>> Anyone else doing update heavy work with LZO?
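>>>>>>>
>>>>>>> If I try that, I believe these are the relevant knobs in
>>>>>>> hbase-site.xml (the values here are just an example):
>>>>>>>
>>>>>>> -----------------------------------
>>>>>>> <property>
>>>>>>>   <name>hbase.hstore.compactionThreshold</name>
>>>>>>>   <value>6</value> <!-- more store files before a compaction kicks in -->
>>>>>>> </property>
>>>>>>> <property>
>>>>>>>   <name>hbase.hstore.blockingStoreFiles</name>
>>>>>>>   <value>20</value> <!-- more store files before writes are blocked -->
>>>>>>> </property>
>>>>>>> -----------------------------------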
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Friso
>>>>>>>
>>>>>>>
>>>>>>> On 4 jan 2011, at 01:54, Todd Lipcon wrote:
>>>>>>>
>>>>>>>> Fishy. Are your cells particularly large? Or have you tuned the
>>>>>>>> HFile block size at all?
>>>>>>>>
>>>>>>>> -Todd
>>>>>>>>
>>>>>>>> On Mon, Jan 3, 2011 at 2:15 PM, Friso van Vollenhoven
>>>>>>>> <fvanvollenhoven@xebia.com> wrote:
>>>>>>>>
>>>>>>>>> I tried it, but it doesn't seem to help. The RS processes grow to
>>>>>>>>> 30Gb within minutes of the job starting.
>>>>>>>>>
>>>>>>>>> Any ideas?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Friso
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
>>>>>>>>>
>>>>>>>>>> Hi Friso,
>>>>>>>>>>
>>>>>>>>>> Which OS are you running? Particularly, which version of glibc?
>>>>>>>>>>
>>>>>>>>>> Can you try running with the environment variable
>>>>>>>>>> MALLOC_ARENA_MAX=1 set?
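>>>>>>>>>>
>>>>>>>>>> That's a glibc malloc setting, so it has to be in the region
>>>>>>>>>> server's environment before the JVM starts; assuming you launch
>>>>>>>>>> through the standard scripts, a line in hbase-env.sh would do:
>>>>>>>>>>
>>>>>>>>>> -----------------------------------
>>>>>>>>>> export MALLOC_ARENA_MAX=1
>>>>>>>>>> -----------------------------------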
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> -Todd
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 3, 2011 at 8:15 AM, Friso van Vollenhoven
>>>>>>>>>> <fvanvollenhoven@xebia.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I seem to run into a problem that occurs when using LZO
>>>>>>>>>>> compression on a heavy write only load. I am using 0.90 RC1 and,
>>>>>>>>>>> thus, the LZO compressor code that supports the reinit() method
>>>>>>>>>>> (from Kevin Weil's github, version 0.4.8). There are some more
>>>>>>>>>>> Hadoop LZO incarnations, so I am pointing my question to this
>>>>>>>>>>> list.
>>>>>>>>>>>
>>>>>>>>>>> It looks like the compressor uses direct byte buffers to store
>>>>>>>>>>> the original and compressed bytes in memory, so the native code
>>>>>>>>>>> can work with them without the JVM having to copy anything
>>>>>>>>>>> around. The direct buffers are possibly reused after a reinit()
>>>>>>>>>>> call, but will often be newly created in the init() method,
>>>>>>>>>>> because the existing buffer can be the wrong size for reuse. The
>>>>>>>>>>> latter case leaves the buffers previously used by the compressor
>>>>>>>>>>> instance eligible for garbage collection. I think the problem is
>>>>>>>>>>> that this collection never occurs (in time), because the GC does
>>>>>>>>>>> not consider it necessary yet: the GC does not know about the
>>>>>>>>>>> native heap, and based on the state of the JVM heap there is no
>>>>>>>>>>> reason to finalize these objects yet. However, direct byte
>>>>>>>>>>> buffers only free their native memory in the finalizer, so the
>>>>>>>>>>> native heap keeps growing. On write only loads, a full GC will
>>>>>>>>>>> rarely happen, because the heap will not grow far beyond the
>>>>>>>>>>> memstores (no block cache is used). So what happens is that the
>>>>>>>>>>> machine starts using swap before the GC ever cleans up the direct
>>>>>>>>>>> byte buffers. I am guessing that without the reinit() support,
>>>>>>>>>>> the buffers were collected earlier, because the referring objects
>>>>>>>>>>> would also be collected every now and then, or things would
>>>>>>>>>>> perhaps just never promote to an older generation.
>>>>>>>>>>>
>>>>>>>>>>> When I do a pmap on a running RS after it has grown to some 40Gb
>>>>>>>>>>> resident size (with a 16Gb heap), it shows a lot of near-64M anon
>>>>>>>>>>> blocks (presumably native heap). I saw this before with the 0.4.6
>>>>>>>>>>> version of Hadoop LZO, but that was under normal load. After that
>>>>>>>>>>> I went back to an HBase version that does not require the
>>>>>>>>>>> reinit(). Now I am on 0.90 with the new LZO, but never did a
>>>>>>>>>>> heavy load like this one with it, until now...
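>>>>>>>>>>>
>>>>>>>>>>> A sketch of how to spot those blocks (pmap -x reports sizes in
>>>>>>>>>>> kilobytes; the pid placeholder is yours to fill in):
>>>>>>>>>>>
>>>>>>>>>>> -----------------------------------
>>>>>>>>>>> # list anon mappings of roughly 64M for the region server process
>>>>>>>>>>> pmap -x <region-server-pid> | grep anon | awk '$2 >= 60000 && $2 <= 70000'
>>>>>>>>>>> -----------------------------------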
>>>>>>>>>>>
>>>>>>>>>>> Can anyone with a better understanding of the LZO code confirm
>>>>>>>>>>> that the above could be the case? If so, would it be possible to
>>>>>>>>>>> change the LZO compressor (and decompressor) to use maybe just
>>>>>>>>>>> one fixed size buffer (they all appear near 64M anyway), or
>>>>>>>>>>> possibly to reuse an existing buffer even when it is not the
>>>>>>>>>>> exact required size but just large enough to make do? Having
>>>>>>>>>>> short lived direct byte buffers is apparently a discouraged
>>>>>>>>>>> practice. If anyone can provide some pointers on what to look out
>>>>>>>>>>> for, I could invest some time in creating a patch.
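>>>>>>>>>>>
>>>>>>>>>>> The reuse variant could look something like this (a sketch of the
>>>>>>>>>>> idea only, not actual hadoop-lzo code; ByteBuffer is java.nio):
>>>>>>>>>>>
>>>>>>>>>>> -----------------------------------
>>>>>>>>>>> // Keep the existing direct buffer whenever it is large enough,
>>>>>>>>>>> // instead of allocating a fresh one on every size change.
>>>>>>>>>>> private ByteBuffer ensureCapacity(ByteBuffer buf, int required) {
>>>>>>>>>>>   if (buf != null && buf.capacity() >= required) {
>>>>>>>>>>>     buf.clear();         // reset position and limit for reuse
>>>>>>>>>>>     buf.limit(required); // cap the usable window to what is needed
>>>>>>>>>>>     return buf;
>>>>>>>>>>>   }
>>>>>>>>>>>   return ByteBuffer.allocateDirect(required); // fall back to a new one
>>>>>>>>>>> }
>>>>>>>>>>> -----------------------------------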
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Friso
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Todd Lipcon
>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Todd Lipcon
>>>>>>>> Software Engineer, Cloudera
>>>>>>
>>>>>
>>>>>
>>>
>>
>>
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

