Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of anoopsj@huawei.com designates
 119.145.14.64 as permitted sender)
From: Anoop Sam John <anoopsj@huawei.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Subject: RE: Custom preCompact RegionObserver crashes entire cluster on
 OOME: Heap Space
Thread-Topic: Custom preCompact RegionObserver crashes entire cluster on
 OOME: Heap Space
Thread-Index: AQHOCOR1s51AQ77Xj0eIzH9eN5tiaph1xcgr//+kroCAAIits///0v+AgAF8utg=
Date: Wed, 13 Feb 2013 05:29:19 +0000
Message-ID: 
 <0CE69E9126D0344088798A3B7F7F80863AECCCA8@szxeml553-mbs.china.huawei.com>
References: <51F68F1C-6C3A-4B29-A97C-C269387FC69E@gmail.com>
 <0CE69E9126D0344088798A3B7F7F80863AECC345@szxeml553-mbs.china.huawei.com>,<98A8F664-6AFB-44EB-970D-71ABC8D2E34E@gmail.com>
 <0CE69E9126D0344088798A3B7F7F80863AECC596@szxeml553-mbs.china.huawei.com>,<BD9D65BF-65A2-4418-9935-203358923543@gmail.com>
In-Reply-To: <BD9D65BF-65A2-4418-9935-203358923543@gmail.com>
Accept-Language: en-US, zh-CN
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Can you post the code of your new InternalScanner, in particular the next() method implementation?
I would like to see how you are making the KV change.

-Anoop-
________________________________________
From: Mesika, Asaf [asaf.mesika@gmail.com]
Sent: Tuesday, February 12, 2013 8:11 PM
To: user@hbase.apache.org
Subject: Re: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space

I'm seeing a very strange behavior:

If I run a scan during a major compaction, I can see both the modified delta KeyValue (which contains the aggregated value, e.g. 9) and the other two delta columns that were used for this aggregated column (e.g. 3, 3), as if the Scan is exposed to the KeyValues produced mid-scan.
Could it be related to the cache somehow?

I am modifying the KeyValue object received from the InternalScanner in preCompact (modifying its value).
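
To make the scheme discussed in this thread concrete (delta qualifiers of the form <RoundedToTheHourTS><Type><UniqueId> folded into a single Aggregate column), here is a minimal sketch in plain Java. It deliberately does not use the HBase API: the names DeltaAggregator, Cell, and aggregate, and the "agg" id, are hypothetical stand-ins. It assumes cells arrive sorted by hour, as a compaction scanner would return them, and it builds a new aggregate cell instead of mutating the first delta in place.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the delta aggregation; NOT the HBase API. */
final class DeltaAggregator {

    /** Hypothetical cell: hour timestamp, type ('D' = delta, 'A' = aggregate), unique id, count. */
    static final class Cell {
        final long hourTs;
        final char type;
        final String uniqueId;
        final long value;

        Cell(long hourTs, char type, String uniqueId, long value) {
            this.hourTs = hourTs;
            this.type = type;
            this.uniqueId = uniqueId;
            this.value = value;
        }
    }

    /**
     * Folds every run of cells sharing the same hour into one aggregate cell.
     * Assumes the input is sorted by hourTs, so only a running sum is held
     * per hour. An existing aggregate cell is summed together with new deltas.
     */
    static List<Cell> aggregate(List<Cell> sortedCells) {
        List<Cell> out = new ArrayList<>();
        boolean open = false;   // true once at least one cell has been seen
        long currentHour = 0;
        long sum = 0;
        for (Cell c : sortedCells) {
            if (open && c.hourTs != currentHour) {
                // Hour changed: emit the aggregate for the previous hour.
                out.add(new Cell(currentHour, 'A', "agg", sum));
                sum = 0;
            }
            currentHour = c.hourTs;
            open = true;
            sum += c.value;
        }
        if (open) {
            // Emit the aggregate for the last hour.
            out.add(new Cell(currentHour, 'A', "agg", sum));
        }
        return out;
    }
}
```

The sketch collects results in a list only for readability; a real wrapper scanner would hand out each aggregate from its next() call as soon as the hour changes, which keeps memory use bounded by one running sum.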

On Feb 12, 2013, at 11:22 AM, Anoop Sam John wrote:

>> The question is: is it "legal" to change a KV I received from the InternalScanner before adding it to the Result, i.e. before returning it from my own InternalScanner?
>
> You can change as per your need IMO
>
> -Anoop-
>
> ________________________________________
> From: Mesika, Asaf [asaf.mesika@gmail.com]
> Sent: Tuesday, February 12, 2013 2:43 PM
> To: user@hbase.apache.org
> Subject: Re: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space
>
> I am trying to reduce the number of KeyValues generated during preCompact, but I'm getting some weird behavior.
>
> Let me describe what I am doing in short:
>
> We have a counters table, with the following structure:
>
> RowKey = A combination of field values representing the group-by key.
> CF = Time span aggregate (Hour, Day, Month). Currently we have only Hour.
> CQ = Round-to-hour timestamp (long).
> Value = The count
>
> We collect raw data and update the counters table for the matching group-by key and hour.
> We tried using Increment, but discovered it is very slow.
> Instead we've decided to update the counters upon compaction. We write the deltas into the same row key, but with a longer column qualifier: <RoundedToTheHourTS><Type><UniqueId>.
> <Type> is: Delta or Aggregate.
> Delta stands for a delta column qualifier we send from our client.
>
> In preCompact, I create an InternalScanner which aggregates the delta column qualifier values and generates a new KeyValue with Type Aggregate: <TS><A><UniqueID>
>
> The problem with this implementation is that it consumes more memory.
>
> Now, I've tried avoiding the creation of the Aggregate type KV by re-using the first delta column qualifier: simply changing its value in the KeyValue.
> But for some reason, after a couple of minor / major compactions, I see data loss when I count the values and compare them to the expected totals.
>
>
> The question is: is it "legal" to change a KV I received from the InternalScanner before adding it to the Result, i.e. before returning it from my own InternalScanner?
>
> On Feb 12, 2013, at 8:44 AM, Anoop Sam John wrote:
>
>> Asaf,
>>          Have you created a wrapper around the original InternalScanner instance created by the compaction flow?
>>
>>> Where do the KVs generated during the compaction process queue up before being written to disk? Is this buffer configurable?
>> When I wrote the RegionObserver, my assumption was that the compaction process works in a streaming fashion, so even if I decide to generate a KV per KV I see, it still shouldn't be a problem memory-wise.
>>
>> There is no queuing. Your assumption is correct. Each KV is written to the writer as and when it is produced (just like how a memstore flush does the HFile write). As Lars said, a look at your code can tell if something is going wrong. Do you have blooms in use?
>>
>> -Anoop-
>> ________________________________________
>> From: Mesika, Asaf [asaf.mesika@gmail.com]
>> Sent: Tuesday, February 12, 2013 11:16 AM
>> To: user@hbase.apache.org
>> Subject: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space
>>
>> Hi,
>>
>> I wrote a RegionObserver which implements preCompact.
>> I activated it in pre-production, and then the entire cluster dropped dead: one RegionServer after another crashed with an OutOfMemoryError (Java heap space).
>>
>> My preCompact method generates a KeyValue for each set of column qualifiers it sees.
>> When I remove the coprocessor and restart the cluster, the cluster remains stable.
>> I have 8 RS, each with a 4 GB heap. There are about 9 regions (from the specific table I'm working on) per RegionServer.
>> Running HBase 0.94.3
>>
>> The crash occurs when the major compaction fires up, apparently cluster-wide.
>>
>>
>> My question is this: where do the KVs generated during the compaction process queue up before being written to disk? Is this buffer configurable?
>> When I wrote the RegionObserver, my assumption was that the compaction process works in a streaming fashion, so even if I decide to generate a KV per KV I see, it still shouldn't be a problem memory-wise.
>>
>> Of course I'm trying to improve my code so it will generate far fewer new KVs (by simply altering the existing KVs received from the InternalScanner).
>>
>> Thank you,
>>
>> Asaf