hbase-user mailing list archives

From Daniel Połaczański <dpolaczan...@gmail.com>
Subject Re: processing in coprocessor and region splitting
Date Fri, 25 Mar 2016 17:32:23 GMT
I am testing different solutions (POC).
The region size is currently 32 MB (I know it should be >= 1 GB, but we are
testing different solutions with a smaller amount of data), so
increasing the region size is not a solution. Our problem can happen even when
a region is 1 GB. We want to process the data with coprocessors and
Hadoop MapReduce. I cannot have one big region because I want a sensible
degree of parallelism (with MapReduce and coprocessors).

Increasing the region size plus pre-splitting is not an option either, because
I know nothing about the keys (random longs).

During the processing the size of the data is doubled.

And yes, the coprocessor rewrites a lot of the data written into the table. The
whole record is serialized to Avro and stored in one column (we will try
storing each attribute in its own column in the next POC).

It is not a typical big data project where we can afford to analyze the data
beforehand :)

2016-03-25 17:38 GMT+01:00 Ted Yu <yuzhihong@gmail.com>:

> What's the current region size you use?
>
> bq. During the processing size of the data gets increased
>
> Can you give us some quantitative measure of how much increase you
> observed (w.r.t. region size)?
>
> bq. I was looking for some "global lock" in source code
>
> Using a global lock is probably not a good idea.
>
> I am curious: it looks like your coprocessor may rewrite a lot of the data
> written into the table.
> Can the client side accommodate such logic so that the rewrite is reduced?
>
> Thanks
>
> On Fri, Mar 25, 2016 at 8:55 AM, Daniel Połaczański <dpolaczanski@gmail.com>
> wrote:
>
> > Hi,
> > I have some processing in my CoprocessorService which modifies the existing
> > data in place. It iterates over every row, modifies it, and puts it back
> > into the region. The table can be modified by only one client.
> >
> > During the processing the size of the data increases -> the region's size
> > increases -> a region split happens. As a result, the processing is
> > stopped by a NotServingRegionException (because the region is closed
> > and split into two new regions, so it doesn't exist anymore).
> >
> > Is there any clean way to block region splitting?
> >
> > I was looking for some "global lock" in the source code but I haven't found
> > anything helpful.
> > Another idea is to create a custom RegionSplitPolicy and explicitly set some
> > flag which makes shouldSplit() return false, but I'm not sure yet if it
> > is safe.
> > Could you advise?
> > Regards
> >
>
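
The flag idea from the original question (a custom RegionSplitPolicy whose
shouldSplit() returns false while processing runs) could look roughly like the
sketch below. It is a plain-Java model of the logic only and does not use the
HBase API: the class name FlagGuardedSplitPolicy, the methods blockSplits()
and allowSplits(), and the fixed size threshold are all hypothetical. In HBase
the check would presumably sit in a RegionSplitPolicy subclass that overrides
shouldSplit(); how the coprocessor would reach the flag, and whether doing so
is safe, is exactly the open question in this thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Plain-Java model of a split policy guarded by a flag.
 * Hypothetical illustration only; this is not an HBase class.
 */
final class FlagGuardedSplitPolicy {
    // True while a long-running rewrite must not be interrupted by a split.
    private final AtomicBoolean splitsBlocked = new AtomicBoolean(false);
    // Assumed size threshold above which a region would normally split.
    private final long maxRegionBytes;

    FlagGuardedSplitPolicy(long maxRegionBytes) {
        this.maxRegionBytes = maxRegionBytes;
    }

    void blockSplits() {
        splitsBlocked.set(true);
    }

    void allowSplits() {
        splitsBlocked.set(false);
    }

    /** Returns false while splits are blocked, otherwise applies the size check. */
    boolean shouldSplit(long regionBytes) {
        if (splitsBlocked.get()) {
            return false;
        }
        return regionBytes > maxRegionBytes;
    }

    public static void main(String[] args) {
        FlagGuardedSplitPolicy policy = new FlagGuardedSplitPolicy(32L * 1024 * 1024);
        long grownRegion = 64L * 1024 * 1024; // data doubled during processing

        policy.blockSplits();
        try {
            // The in-place rewrite would run here; split requests are refused.
            System.out.println("during rewrite: " + policy.shouldSplit(grownRegion));
        } finally {
            // Always release the flag, even if the rewrite fails.
            policy.allowSplits();
        }
        System.out.println("after rewrite: " + policy.shouldSplit(grownRegion));
    }
}
```

Releasing the flag in a finally block matters: if the rewrite throws, a flag
left set would stop the region from ever splitting again.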
