hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space
Date Tue, 12 Feb 2013 16:00:08 GMT
Asaf:
In another email thread, the discussion about aggregation centered on using
the prePut coprocessor hook.

See if that approach is applicable in your case.

On Tue, Feb 12, 2013 at 7:32 AM, James Taylor <jtaylor@salesforce.com> wrote:

> IMO, I don't think it's safe to change the KV in-place. We always create a
> new KV in our coprocessors.
>
> James
>
> On Feb 12, 2013, at 6:41 AM, "Mesika, Asaf" <asaf.mesika@gmail.com> wrote:
>
> > I'm seeing a very strange behavior:
> >
> > If I run a scan during major compaction, I can see both the modified
> Delta Key Value (which contains the aggregated values - e.g. 9) and the
> other two delta columns that were used for this aggregated column (e.g. 3,
> 3), as if the Scan is exposed to the key values produced mid-scan.
> > Could it be related to Cache somehow?
> >
> > I am modifying the KeyValue object received from the InternalScanner in
> preCompact (modifying its value).
> >
> > On Feb 12, 2013, at 11:22 AM, Anoop Sam John wrote:
> >
> >>> The question is: is it "legal" to change a KV I received from the
> InternalScanner before adding it to the Result, i.e. returning it from my own
> InternalScanner?
> >>
> >> You can change as per your need IMO
> >>
> >> -Anoop-
> >>
> >> ________________________________________
> >> From: Mesika, Asaf [asaf.mesika@gmail.com]
> >> Sent: Tuesday, February 12, 2013 2:43 PM
> >> To: user@hbase.apache.org
> >> Subject: Re: Custom preCompact RegionObserver crashes entire cluster on
> OOME: Heap Space
> >>
> >> I am trying to reduce the amount of KeyValue generated during the
> preCompact, but I'm getting some weird behaviors.
> >>
> >> Let me describe what I am doing in short:
> >>
> >> We have a counters table, with the following structure:
> >>
> >> RowKey =  A combination of field values representing group by key.
> >> CF = time span aggregate (Hour, Day, Month). Currently we only have
> Hour.
> >> CQ = Round-to-Hour timestamp (long).
> >> Value = The count
> >>
> >> We collect raw data, and update the counters table for the matched
> group by key, hour.
> >> We tried using Increment, but discovered it's very slow.
> >> Instead we've decided to update the counters upon compaction. We write
> the deltas into the same row-key, but a longer column qualifier:
> <RoundedToTheHourTS><Type><UniqueId>.
> >> <Type> is: Delta or Aggregate.
> >> Delta stands for a delta column qualifier we send from our client.
> >>
> >> in the preCompact, I create an InternalScanner which aggregates the
> delta column qualifier values and generates a new key value with Type
> Aggregate: <TS><A><UniqueID>
> >>
> >> The problem with this implementation is that it consumes more memory.
> >>
> >> Now, I've tried avoiding the creation of the Aggregate type KV, by
> simply re-using the 1st delta column qualifier: simply changing its value
> in the KeyValue.
> >> But for some reason, after a couple of minor / major compactions, I
> see data loss, when I count the values and compare them to the expected.
> >>
> >>
> >> The question is: is it "legal" to change a KV I received from the
> InternalScanner before adding it to the Result, i.e. returning it from my own
> InternalScanner?
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Feb 12, 2013, at 8:44 AM, Anoop Sam John wrote:
> >>
> >>> Asaf,
> >>>         You have created a wrapper around the original InternalScanner
> instance created by the compaction flow?
> >>>
> >>>> Where do the KV generated during the compaction process queue up
> before being written to the disk? Is this buffer configurable?
> >>> When I wrote the Region Observer my assumption was that the compaction
> process works in Streaming fashion, thus even if I decide to generate a KV
> per KV I see, it still shouldn't be a problem memory-wise.
> >>>
> >>> There is no queuing. Your assumption is correct. Each KV is written to
> the writer as it is produced (just like the HFile write during a memstore
> flush). As Lars said, a look at your code can tell if something is going
> wrong.  Are you using blooms?
> >>>
> >>> -Anoop-
> >>> ________________________________________
> >>> From: Mesika, Asaf [asaf.mesika@gmail.com]
> >>> Sent: Tuesday, February 12, 2013 11:16 AM
> >>> To: user@hbase.apache.org
> >>> Subject: Custom preCompact RegionObserver crashes entire cluster on
> OOME: Heap Space
> >>>
> >>> Hi,
> >>>
> >>> I wrote a RegionObserver which does preCompact.
> >>> I activated it in pre-production, and then the entire cluster dropped dead:
> One RegionServer after another crashed on OutOfMemoryError: Heap Space.
> >>>
> >>> My preCompact method generates a KeyValue for each set of Column
> Qualifiers it sees.
> >>> When I remove the coprocessor and restart the cluster, the cluster remains
> stable.
> >>> I have 8 RS, each with a 4 GB heap. There are about 9 regions (from a
> specific table I'm working on) per Region Server.
> >>> Running HBase 0.94.3
> >>>
> >>> The crash occurs when the major compaction fires up, apparently cluster
> wide.
> >>>
> >>>
> >>> My question is this: Where do the KV generated during the compaction
> process queue up before being written to the disk? Is this buffer
> configurable?
> >>> When I wrote the Region Observer my assumption was that the compaction
> process works in Streaming fashion, thus even if I decide to generate a KV
> per KV I see, it still shouldn't be a problem memory-wise.
> >>>
> >>> Of course I'm trying to improve my code so it will generate far fewer
> new KVs (by simply altering the existing KVs received from the
> InternalScanner).
> >>>
> >>> Thank you,
> >>>
> >>> Asaf
> >
>
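
For illustration, here is a minimal sketch of the pattern discussed in this thread: aggregating delta columns in a streaming fashion while emitting a new cell instead of changing a received one in place. It deliberately does not use the HBase API. `Cell`, `AggregatingIterator` and `compact` are hypothetical stand-ins invented for this example (a real coprocessor would wrap the `InternalScanner` handed to preCompact and build a new `KeyValue`), and the `'D'`/`'A'` type markers are an assumed encoding of the Delta/Aggregate qualifier described above. Input is assumed to arrive sorted by row and hour, as it would from a scanner.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class DeltaAggregationSketch {

    /** Hypothetical immutable stand-in for a KeyValue (not the HBase class). */
    public static final class Cell {
        public final String row;       // group-by key
        public final long hourTs;      // rounded-to-the-hour timestamp
        public final char type;        // 'D' = delta, 'A' = aggregate (assumed encoding)
        public final String uniqueId;
        private final byte[] value;    // 8-byte big-endian count, never exposed for mutation

        private Cell(String row, long hourTs, char type, String uniqueId, byte[] value) {
            this.row = row;
            this.hourTs = hourTs;
            this.type = type;
            this.uniqueId = uniqueId;
            this.value = value.clone();
        }

        public static Cell of(String row, long hourTs, char type, String uniqueId, long count) {
            return new Cell(row, hourTs, type, uniqueId,
                    ByteBuffer.allocate(8).putLong(count).array());
        }

        public long count() {
            return ByteBuffer.wrap(value).getLong();
        }
    }

    /**
     * Folds each run of cells sharing (row, hourTs) into one NEW aggregate cell.
     * Only one read-ahead cell and a running sum are held at any time.
     */
    public static final class AggregatingIterator implements Iterator<Cell> {
        private final Iterator<Cell> source;
        private Cell pending; // first cell of the next group, already read

        public AggregatingIterator(Iterator<Cell> source) {
            this.source = source;
            this.pending = source.hasNext() ? source.next() : null;
        }

        @Override
        public boolean hasNext() {
            return pending != null;
        }

        @Override
        public Cell next() {
            if (pending == null) {
                throw new NoSuchElementException();
            }
            Cell first = pending;
            long sum = first.count();
            pending = null;
            while (source.hasNext()) {
                Cell c = source.next();
                if (c.row.equals(first.row) && c.hourTs == first.hourTs) {
                    sum += c.count();   // same group: accumulate
                } else {
                    pending = c;        // next group starts here
                    break;
                }
            }
            // Emit a fresh cell; the input cells are left untouched.
            return Cell.of(first.row, first.hourTs, 'A', first.uniqueId, sum);
        }
    }

    /** Convenience wrapper for the example; collects the streamed output into a list. */
    public static List<Cell> compact(List<Cell> sorted) {
        List<Cell> out = new ArrayList<>();
        Iterator<Cell> it = new AggregatingIterator(sorted.iterator());
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> input = Arrays.asList(
                Cell.of("k1", 1000L, 'D', "u1", 3),
                Cell.of("k1", 1000L, 'D', "u2", 3),
                Cell.of("k1", 1000L, 'D', "u3", 3),
                Cell.of("k2", 1000L, 'D', "u4", 5));
        for (Cell c : compact(input)) {
            System.out.println(c.row + " " + c.hourTs + " " + c.type + " " + c.count());
        }
    }
}
```

In this sketch the three deltas of 3 for `k1` collapse into a single aggregate of 9, and the original delta cells still read 3 afterwards, so a concurrent reader holding a reference to them would not see a half-aggregated value. Whether the same holds inside a real region server depends on how the scanner and block cache share KeyValue buffers, which this example does not model.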
