hbase-user mailing list archives

From Daniel Połaczański <dpolaczan...@gmail.com>
Subject Re: processing in coprocessor and region splitting
Date Sat, 26 Mar 2016 15:43:09 GMT
Unfortunately, no.

Regards

2016-03-25 21:13 GMT+01:00 Ted Yu <yuzhihong@gmail.com>:

> bq. calculating another new attributes of a trade
>
> Can you put the new attributes in separate columns ?
>
> Cheers
>
> On Fri, Mar 25, 2016 at 12:38 PM, Daniel Połaczański <dpolaczanski@gmail.com> wrote:
>
> > The data is a set of trades and the processing is some kind of enrichment
> > (calculating another new attributes of a trade). All attributes are needed
> > (the original and the new).
> >
> > 2016-03-25 18:41 GMT+01:00 Ted Yu <yuzhihong@gmail.com>:
> >
> > > bq. During the processing the size of the data is doubled.
> > >
> > > This explains the frequent split :-)
> > >
> > > Is the original data needed after post-processing (maybe for auditing)?
> > >
> > > Cheers
> > >
> > > On Fri, Mar 25, 2016 at 10:32 AM, Daniel Połaczański <dpolaczanski@gmail.com> wrote:
> > >
> > > > I am testing different solutions (POC).
> > > > The region size is currently 32MB (I know it should be >= 1GB, but we are
> > > > testing different solutions with a smaller amount of data). So
> > > > increasing the region size is not a solution. Our problems can happen even
> > > > when a region is 1 GB. We want to process the data with coprocessors and
> > > > Hadoop MapReduce. I cannot have one big region because I want a sensible
> > > > degree of parallelism (with MapReduce and coprocessors).
> > > >
> > > > Increasing the region size + pre-splitting is not an option either, because I
> > > > know nothing about the keys (random long).
> > > >
> > > > During the processing the size of the data is doubled.
> > > >
> > > > And yes, the coprocessor rewrites a lot of the data written into the table.
> > > > The whole record is serialized to Avro and stored in one column (we will try
> > > > storing a single attribute per column in the next POC).
> > > >
> > > > It is not a typical big data project where we can allow prior analysis of
> > > > the data :)
> > > >
> > > > 2016-03-25 17:38 GMT+01:00 Ted Yu <yuzhihong@gmail.com>:
> > > >
> > > > > What's the current region size you use ?
> > > > >
> > > > > bq. During the processing size of the data gets increased
> > > > >
> > > > > Can you give us some quantitative measure as to how much increase you
> > > > > observed (w.r.t. region size) ?
> > > > >
> > > > > bq. I was looking for some "global lock" in source code
> > > > >
> > > > > Probably not a good idea using global lock.
> > > > >
> > > > > I am curious, looks like your coprocessor may rewrite a lot of data written
> > > > > into the table.
> > > > > Can client side accommodate such logic so that the rewrite is reduced ?
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Fri, Mar 25, 2016 at 8:55 AM, Daniel Połaczański <dpolaczanski@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > > I have some processing in my coprocessorService which modifies the existing
> > > > > > data in place. It iterates over every row, modifies it and puts it back into
> > > > > > the region. The table can be modified by only one client.
> > > > > >
> > > > > > During the processing size of the data gets increased -> region's size
> > > > > > gets increased -> region's split happens. As a result the processing is
> > > > > > stopped by a NotServingRegionException (because the region is closed
> > > > > > and split into two new regions, so it doesn't exist anymore).
> > > > > >
> > > > > > Is there any clean way to block Region's splitting?
> > > > > >
> > > > > > I was looking for some "global lock" in source code but I haven't found
> > > > > > anything helpful.
> > > > > > Another idea is to create a custom RegionSplitPolicy and explicitly set
> > > > > > some flag which makes shouldSplit() return false, but I'm not sure yet if
> > > > > > it is safe.
> > > > > > Could you advise?
> > > > > > Regards
> > > > > >
> > > > >
> > > >
> > >
> >
>
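
The flag idea from the first message in this thread (a custom RegionSplitPolicy whose shouldSplit() returns false while some flag is set) can be modelled in plain Java to make the intended behaviour concrete. This is only a sketch under assumptions: the class FlagGatedSplitPolicy and its methods blockSplits() and unblockSplits() are hypothetical names, the class does not extend or call any HBase API, and the 20 MB starting size is made up for illustration (only the 32MB region size and the doubling come from the thread). Whether such a policy is safe inside a real region server is exactly the open question raised above and is not answered here. HBase also ships a DisabledRegionSplitPolicy class, which may be worth checking against the version in use.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Plain-Java model of a flag-gated split decision; hypothetical, not an HBase class. */
class FlagGatedSplitPolicy {
    // Size threshold that would normally trigger a split.
    private final long maxRegionBytes;
    // Flag set by the processing job while it rewrites rows in place.
    private final AtomicBoolean splitsBlocked = new AtomicBoolean(false);

    FlagGatedSplitPolicy(long maxRegionBytes) {
        this.maxRegionBytes = maxRegionBytes;
    }

    void blockSplits()   { splitsBlocked.set(true); }
    void unblockSplits() { splitsBlocked.set(false); }

    /** Returns false while the flag is set, otherwise applies the size check. */
    boolean shouldSplit(long regionBytes) {
        if (splitsBlocked.get()) {
            return false;
        }
        return regionBytes > maxRegionBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        FlagGatedSplitPolicy policy = new FlagGatedSplitPolicy(32 * mb);

        long before = 20 * mb;   // assumed region size before enrichment
        long after = 2 * before; // the processing doubles the data

        policy.blockSplits();
        System.out.println(policy.shouldSplit(after)); // prints false

        policy.unblockSplits();
        System.out.println(policy.shouldSplit(after)); // prints true
    }
}
```

In this model the split is only deferred: once the flag is cleared, the oversized region qualifies for a split again, so the processing would have to finish (or checkpoint) before unblocking.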
