hbase-user mailing list archives

From James Taylor <jtay...@salesforce.com>
Subject Re: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space
Date Tue, 12 Feb 2013 15:32:46 GMT
IMO, I don't think it's safe to change the KV in-place. We always create a new KV in our coprocessors.

James
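
A minimal sketch of why an in-place change can leak to a concurrent scan, assuming the KeyValue is only a view over a byte array that other readers may share (for example, a cached block). `Cell`, `blockWithValue`, `overwriteInPlace`, and `copyWithValue` are hypothetical stand-ins for illustration, not HBase APIs:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class KvCopyDemo {
    static final int KEY_LEN = 4; // assumed fixed key length, for illustration only

    /** Hypothetical stand-in for a KeyValue: a view over a possibly shared byte array. */
    static final class Cell {
        final byte[] buf;
        final int valueOffset;

        Cell(byte[] buf, int valueOffset) {
            this.buf = buf;
            this.valueOffset = valueOffset;
        }

        long value() {
            return ByteBuffer.wrap(buf, valueOffset, Long.BYTES).getLong();
        }
    }

    /** Builds a backing array holding a dummy key followed by an 8-byte value. */
    static byte[] blockWithValue(long v) {
        ByteBuffer b = ByteBuffer.allocate(KEY_LEN + Long.BYTES);
        b.position(KEY_LEN);
        b.putLong(v);
        return b.array();
    }

    /** Unsafe variant: writes into the backing array that other views may share. */
    static void overwriteInPlace(Cell c, long v) {
        ByteBuffer.wrap(c.buf, c.valueOffset, Long.BYTES).putLong(v);
    }

    /** Safe variant: copies the backing array first, then writes into the copy. */
    static Cell copyWithValue(Cell c, long v) {
        Cell fresh = new Cell(Arrays.copyOf(c.buf, c.buf.length), c.valueOffset);
        overwriteInPlace(fresh, v);
        return fresh;
    }

    public static void main(String[] args) {
        byte[] shared = blockWithValue(3L);
        Cell compactionView = new Cell(shared, KEY_LEN);
        Cell scanView = new Cell(shared, KEY_LEN); // a second reader of the same bytes

        Cell fresh = copyWithValue(compactionView, 9L);
        System.out.println("after copy:     scan sees " + scanView.value()
                + ", new cell holds " + fresh.value());

        overwriteInPlace(compactionView, 9L);
        System.out.println("after in-place: scan sees " + scanView.value());
    }
}
```

Under that assumption, the copy leaves the scan's view untouched, while the in-place write changes what the scan reads mid-flight.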

On Feb 12, 2013, at 6:41 AM, "Mesika, Asaf" <asaf.mesika@gmail.com> wrote:

> I'm seeing very strange behavior:
> 
> If I run a scan during major compaction, I can see both the modified Delta KeyValue (which contains the aggregated value, e.g. 9) and the other two delta columns that were used for this aggregated column (e.g. 3, 3), as if the Scan is exposed to the key values produced mid-scan.
> Could it be related to the cache somehow?
> 
> I am modifying the KeyValue object received from the InternalScanner in preCompact (modifying its value).
> 
> On Feb 12, 2013, at 11:22 AM, Anoop Sam John wrote:
> 
>>> The question is: is it "legal" to change a KV I received from the InternalScanner before adding it to the Result, i.e. returning it from my own InternalScanner?
>> 
>> You can change it as per your need, IMO.
>> 
>> -Anoop-
>> 
>> ________________________________________
>> From: Mesika, Asaf [asaf.mesika@gmail.com]
>> Sent: Tuesday, February 12, 2013 2:43 PM
>> To: user@hbase.apache.org
>> Subject: Re: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space
>> 
>> I am trying to reduce the number of KeyValues generated during preCompact, but I'm getting some weird behavior.
>> 
>> Let me describe what I am doing in short:
>> 
>> We have a counters table, with the following structure:
>> 
>> RowKey =  A combination of field values representing group by key.
>> CF = time span aggregate (Hour, Day, Month). Currently we only have Hour.
>> CQ = Round-to-Hour timestamp (long).
>> Value = The count
>> 
>> We collect raw data and update the counters table for the matching group-by key and hour.
>> We tried using Increment, but discovered it's very slow.
>> Instead, we've decided to update the counters upon compaction. We write the deltas into the same row key, but with a longer column qualifier: <RoundedToTheHourTS><Type><UniqueId>.
>> <Type> is: Delta or Aggregate.
>> Delta stands for a delta column qualifier we send from our client.
>> 
>> In preCompact, I create an InternalScanner which aggregates the delta column qualifier values and generates a new KeyValue with Type Aggregate: <TS><A><UniqueID>
>> 
>> The problem with this implementation is that it consumes more memory.
>> 
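The aggregation scheme described above can be sketched as a streaming fold that holds only one pending column at a time. This is a simplified illustration, not the poster's code: `Column` and `DeltaAggregator` are hypothetical names, the input is assumed to be one row's columns already sorted by rounded-hour timestamp, and reusing the first column's unique ID for the aggregate is an assumption.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class DeltaAggregator {
    /** Hypothetical decoded column: <RoundedToTheHourTS><Type><UniqueId> plus its count. */
    static final class Column {
        final long hourTs;
        final char type; // 'D' = Delta, 'A' = Aggregate
        final String uniqueId;
        final long value;

        Column(long hourTs, char type, String uniqueId, long value) {
            this.hourTs = hourTs;
            this.type = type;
            this.uniqueId = uniqueId;
            this.value = value;
        }
    }

    /**
     * Lazily folds each run of columns sharing an hour into a single Aggregate
     * column. Earlier Aggregate columns for the same hour are summed as well.
     * Only one input column is buffered at any time.
     */
    static Iterator<Column> aggregate(final Iterator<Column> sorted) {
        return new Iterator<Column>() {
            private Column pending = sorted.hasNext() ? sorted.next() : null;

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public Column next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                long hour = pending.hourTs;
                String id = pending.uniqueId; // assumption: reuse the first column's ID
                long sum = 0;
                while (pending != null && pending.hourTs == hour) {
                    sum += pending.value;
                    pending = sorted.hasNext() ? sorted.next() : null;
                }
                return new Column(hour, 'A', id, sum); // always a new object
            }
        };
    }
}
```

Because each output is a freshly built column, no object handed out by the underlying scanner is modified.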
>> Now, I've tried avoiding the creation of the Aggregate type KV by simply re-using the first delta column qualifier, i.e. changing its value in the KeyValue.
>> But for some reason, after a couple of minor / major compactions, I see data loss when I sum the values and compare them to the expected totals.
>> 
>> 
>> The question is: is it "legal" to change a KV I received from the InternalScanner before adding it to the Result, i.e. returning it from my own InternalScanner?
>> 
>> On Feb 12, 2013, at 8:44 AM, Anoop Sam John wrote:
>> 
>>> Asaf,
>>>         You have created a wrapper around the original InternalScanner instance created by the compaction flow?
>>> 
>>>> Where do the KVs generated during the compaction process queue up before being written to disk? Is this buffer configurable?
>>>> When I wrote the RegionObserver, my assumption was that the compaction process works in a streaming fashion, so even if I decide to generate a KV for every KV I see, it still shouldn't be a problem memory-wise.
>>> 
>>> There is no queuing. Your assumption is correct. Each KV is written to the writer as and when it is produced (just like how a memstore flush does the HFile write). As Lars said, a look at your code can tell if something is going wrong. Are you using blooms?
>>> 
>>> -Anoop-
>>> ________________________________________
>>> From: Mesika, Asaf [asaf.mesika@gmail.com]
>>> Sent: Tuesday, February 12, 2013 11:16 AM
>>> To: user@hbase.apache.org
>>> Subject: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space
>>> 
>>> Hi,
>>> 
>>> I wrote a RegionObserver which implements preCompact.
>>> I activated it in pre-production, and then the entire cluster dropped dead: one RegionServer after another crashed on OutOfMemoryError: Java heap space.
>>> 
>>> My preCompact method generates a KeyValue for each set of Column Qualifiers it sees.
>>> When I remove the coprocessor and restart the cluster, the cluster remains stable.
>>> I have 8 RS, each with a 4 GB heap. There are about 9 regions (from the specific table I'm working on) per RegionServer.
>>> Running HBase 0.94.3
>>> 
>>> The crash occurs when the major compaction fires up, apparently cluster-wide.
>>> 
>>> 
>>> My question is this: Where do the KVs generated during the compaction process queue up before being written to disk? Is this buffer configurable?
>>> When I wrote the RegionObserver, my assumption was that the compaction process works in a streaming fashion, so even if I decide to generate a KV for every KV I see, it still shouldn't be a problem memory-wise.
>>> 
>>> Of course, I'm trying to improve my code so it will generate far fewer new KVs (by simply altering the existing KVs received from the InternalScanner).
>>> 
>>> Thank you,
>>> 
>>> Asaf
> 
