From: James Taylor <jtaylor@salesforce.com>
To: user@hbase.apache.org
Date: Tue, 12 Feb 2013 07:32:46 -0800
Subject: Re: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space

IMO, I don't think it's safe to change the KV in place. We always create a new KV in our coprocessors.

James
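For what James describes, returning a brand-new KV instead of mutating the one the compaction scanner handed back, a minimal sketch could look like the helper below. KeyValueCopies and withNewValue are illustrative names rather than an existing HBase API, and the sketch assumes the 0.94-era KeyValue constructors and accessors.

import org.apache.hadoop.hbase.KeyValue;

final class KeyValueCopies {
  // Build a fresh Put-type KeyValue that keeps the row, family, qualifier and
  // timestamp of the original but carries a new value, instead of writing into
  // the original KeyValue's backing buffer.
  static KeyValue withNewValue(KeyValue original, byte[] newValue) {
    return new KeyValue(original.getRow(), original.getFamily(),
        original.getQualifier(), original.getTimestamp(), newValue);
  }
}

A wrapper scanner's next() would then replace the entry in its result list with such a copy rather than poking the original's byte array.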
On Feb 12, 2013, at 6:41 AM, "Mesika, Asaf" wrote:

> I'm seeing some very strange behavior:
>
> If I run a scan during a major compaction, I can see both the modified delta KeyValue (which contains the aggregated value, e.g. 9) and the two delta columns that were used for this aggregated column (e.g. 3, 3) - as if the scan is exposed to the key values produced midway through the compaction scan.
> Could it be related to the cache somehow?
>
> I am modifying the KeyValue object received from the InternalScanner in preCompact (modifying its value).
>
> On Feb 12, 2013, at 11:22 AM, Anoop Sam John wrote:
>
>>> The question is: is it "legal" to change a KV I received from the InternalScanner before adding it to the result, i.e. returning it from my own InternalScanner?
>>
>> You can change it as per your need, IMO.
>>
>> -Anoop-
>>
>> ________________________________________
>> From: Mesika, Asaf [asaf.mesika@gmail.com]
>> Sent: Tuesday, February 12, 2013 2:43 PM
>> To: user@hbase.apache.org
>> Subject: Re: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space
>>
>> I am trying to reduce the number of KeyValues generated during preCompact, but I'm seeing some weird behavior.
>>
>> Let me describe, in short, what I am doing:
>>
>> We have a counters table with the following structure:
>>
>> RowKey = a combination of field values representing the group-by key.
>> CF = the time-span aggregate (Hour, Day, Month). Currently we have only Hour.
>> CQ = a round-to-hour timestamp (long).
>> Value = the count.
>>
>> We collect raw data and update the counters table for the matching group-by key and hour.
>> We tried using Increment, but discovered it is very slow.
>> Instead we've decided to update the counters upon compaction. We write the deltas into the same row key, but under a longer column qualifier: <...>. The type is either Delta or Aggregate.
>> Delta stands for a delta column qualifier we send from our client.
>>
>> In preCompact, I create an InternalScanner which aggregates the delta column qualifier values and generates a new key value of type Aggregate.
>>
>> The problem with this implementation is that it consumes more memory.
>>
>> Now, I've tried avoiding the creation of the Aggregate-type KV by simply re-using the first delta column qualifier: just changing its value in the KeyValue.
>> But for some reason, after a couple of minor/major compactions, I see data loss when I count the values and compare them to the expected totals.
>>
>> The question is: is it "legal" to change a KV I received from the InternalScanner before adding it to the result, i.e. returning it from my own InternalScanner?
>>
>> On Feb 12, 2013, at 8:44 AM, Anoop Sam John wrote:
>>
>>> Asaf,
>>> You have created a wrapper around the original InternalScanner instance created by the compaction flow?
>>>
>>>> Where do the KVs generated during the compaction process queue up before being written to disk? Is this buffer configurable?
>>>> When I wrote the RegionObserver, my assumption was that the compaction process works in a streaming fashion, so even if I decide to generate a KV per KV I see, it still shouldn't be a problem memory-wise.
>>>
>>> There is no queuing; your assumption is correct. Each KV is written to the writer as and when it is produced (just like how a memstore flush does the HFile write). As Lars said, a look at your code can tell if something is going wrong. Do you have blooms being used?
>>>
>>> -Anoop-
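To make the wrapper pattern Anoop asks about concrete, here is a rough sketch of the per-batch aggregation step such a wrapper could run inside its next() call. It is only a sketch under stated assumptions, not the code from this thread: the real qualifier layout was stripped from the mail, so the 8-byte round-to-hour prefix and all class and method names below are made up for illustration, and it emits freshly built KeyValues instead of mutating the ones it reads.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: collapses the delta cells of one row batch (what a single
// InternalScanner.next(kvs) call hands back during compaction) into one KV per
// hour bucket. The 8-byte round-to-hour qualifier prefix is an assumption made
// for illustration; the real delta-qualifier layout from the schema above
// should replace it.
final class DeltaCollapser {

  private static final int HOUR_PREFIX_LEN = Bytes.SIZEOF_LONG; // assumed prefix length

  static List<KeyValue> collapse(List<KeyValue> rowKvs) {
    List<KeyValue> out = new ArrayList<KeyValue>(rowKvs.size());
    KeyValue bucketStart = null; // first counter cell of the bucket being summed
    long sum = 0L;

    for (KeyValue kv : rowKvs) {
      if (kv.getValue().length != Bytes.SIZEOF_LONG) {
        // Not a long-encoded counter: close any open bucket and pass it through.
        if (bucketStart != null) {
          out.add(newAggregate(bucketStart, sum));
          bucketStart = null;
        }
        out.add(kv);
      } else if (bucketStart != null && sameBucket(bucketStart, kv)) {
        sum += Bytes.toLong(kv.getValue()); // fold this delta into the running sum
      } else {
        if (bucketStart != null) {
          out.add(newAggregate(bucketStart, sum)); // emit the finished bucket
        }
        bucketStart = kv;
        sum = Bytes.toLong(kv.getValue());
      }
    }
    if (bucketStart != null) {
      out.add(newAggregate(bucketStart, sum));
    }
    return out;
  }

  // A brand-new KV carrying the summed value; the original cells are left alone.
  private static KeyValue newAggregate(KeyValue first, long sum) {
    return new KeyValue(first.getRow(), first.getFamily(), first.getQualifier(),
        first.getTimestamp(), Bytes.toBytes(sum));
  }

  // Same family and same round-to-hour prefix on the qualifier (a row's cells
  // arrive sorted, so the cells of one bucket are adjacent).
  private static boolean sameBucket(KeyValue a, KeyValue b) {
    byte[] qa = a.getQualifier();
    byte[] qb = b.getQualifier();
    return Bytes.equals(a.getFamily(), b.getFamily())
        && qa.length >= HOUR_PREFIX_LEN && qb.length >= HOUR_PREFIX_LEN
        && Bytes.compareTo(qa, 0, HOUR_PREFIX_LEN, qb, 0, HOUR_PREFIX_LEN) == 0;
  }
}

A wrapper InternalScanner would call this from each of its next(...) overloads, hand the collapsed list to the compaction, and keep nothing across batches, so its memory footprint stays bounded by a single row batch.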
>>> ________________________________________
>>> From: Mesika, Asaf [asaf.mesika@gmail.com]
>>> Sent: Tuesday, February 12, 2013 11:16 AM
>>> To: user@hbase.apache.org
>>> Subject: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space
>>>
>>> Hi,
>>>
>>> I wrote a RegionObserver which implements preCompact.
>>> I activated it in pre-production, and then the entire cluster dropped dead: one RegionServer after another crashed with an OutOfMemoryError: Java heap space.
>>>
>>> My preCompact method generates a KeyValue for each set of column qualifiers it sees.
>>> When I remove the coprocessor and restart the cluster, the cluster remains stable.
>>> I have 8 RegionServers, each with a 4 GB heap.
>>> There are about 9 regions (from a specific table I'm working on) per RegionServer.
>>> Running HBase 0.94.3.
>>>
>>> The crash occurs when the major compaction fires up, apparently cluster-wide.
>>>
>>> My question is this: where do the KVs generated during the compaction process queue up before being written to disk? Is this buffer configurable?
>>> When I wrote the RegionObserver, my assumption was that the compaction process works in a streaming fashion, so even if I decide to generate a KV per KV I see, it still shouldn't be a problem memory-wise.
>>>
>>> Of course I'm trying to improve my code so it will generate far fewer new KVs (by simply altering the existing KVs received from the InternalScanner).
>>>
>>> Thank you,
>>>
>>> Asaf
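Anoop's "there is no queuing" answer can be pictured with a simplified consumption loop. This is an illustration of the pattern only, not the actual compaction code in 0.94.3, and KeyValueSink is an invented stand-in for the HFile writer:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

// Invented stand-in for the HFile writer the compaction appends to.
interface KeyValueSink {
  void append(KeyValue kv) throws IOException;
}

final class CompactionDrainSketch {
  // Simplified picture of how compaction consumes the scanner returned by
  // preCompact: each batch is appended to the output file and the list is
  // cleared before the next batch is fetched.
  static void drain(InternalScanner scanner, KeyValueSink sink) throws IOException {
    List<KeyValue> batch = new ArrayList<KeyValue>();
    boolean more;
    do {
      more = scanner.next(batch);      // one row batch at a time
      for (KeyValue kv : batch) {
        sink.append(kv);               // written out immediately, not queued
      }
      batch.clear();                   // heap usage stays bounded by one batch
    } while (more);
    scanner.close();
  }
}

Since nothing accumulates on the compaction side, if the region servers still OOM during major compaction the place to look is whatever the coprocessor's own scanner retains across next() calls, which is in line with Anoop's suggestion to review the coprocessor code.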