hbase-dev mailing list archives

From libis <libistha...@gmail.com>
Subject Re: should we split the scan range into several segments when the scan range is located in a single region?
Date Tue, 05 Sep 2017 09:24:34 GMT
OK, I am watching the JIRA issue now.

2017-09-05 15:22 GMT+08:00 Chia-Ping Tsai <chia7712@apache.org>:

> Yeah, HBASE-16894 is a similar one. Maybe Yi Liang is still working on it.
> Let's move this discussion to the JIRA.
>
> On 2017-09-05 09:53, libis <libisthanks@gmail.com> wrote:
> > Thanks, Mikhail. I would be pleased to pick up HBASE-18090 (my JIRA account
> > is xinxin fan). I noticed that HBASE-16894
> > (https://issues.apache.org/jira/browse/HBASE-16894) tries to do a
> > similar thing. Chia-Ping, could you take a look at it?
> >
> > 2017-09-04 20:41 GMT+08:00 Chia-Ping Tsai <chia7712@apache.org>:
> >
> > > Thanks for the information, Mikhail. It seems to me the issue is popular.
> > > libis, could you take over HBASE-18090? I can assign the issue to you if I
> > > get your JIRA account.
> > >
> > > On 2017-09-04 20:26, Mikhail Antonov <olorinbant@gmail.com> wrote:
> > > > I've filed https://issues.apache.org/jira/browse/HBASE-18090 some time ago
> > > > and attached a draft patch to it. It's not complete, as we need some deeper
> > > > changes in the way we open regions (see comments), but the basic stuff works (I
> > > > ended up going the other route and didn't have bandwidth to finish that;
> > > > it would be great if someone picked it up).
> > > >
> > > > Mikhail
> > > >
> > > > On Mon, Sep 4, 2017 at 11:13 AM Chia-Ping Tsai <chia7712@apache.org> wrote:
> > > >
> > > > > That sounds good. There are some related issues; see
> > > > > https://issues.apache.org/jira/browse/HBASE-4914 and
> > > > > https://issues.apache.org/jira/browse/HBASE-4063.
> > > > >
> > > > > On 2017-09-04 15:06, libis <libisthanks@gmail.com> wrote:
> > > > > > Hi
> > > > > >
> > > > > > When TableInputFormat is used to source an HBase table in a MapReduce job,
> > > > > > its splitter creates one map task for each region of the table. However, in
> > > > > > some cases the user’s scan range falls within a single region, so
> > > > > > only one mapper is created. For example, if the rowkey of the table is
> > > > > > ‘md5(userid) + timestamp’ and a client wants to scan the data of a specified
> > > > > > user for the latest month with MR, it is very likely that only one
> > > > > > mapper does the work.
> > > > > >
> > > > > > To scan data in parallel when the user's scan range is located in a
> > > > > > single region, should we split the scan range into several segments within
> > > > > > the region?
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > xinxin
> > > > > >
> > > > >
> > > > --
> > > > Thanks,
> > > > Michael Antonov
> > > >
> > >
> >
>
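
The idea raised in the original question, cutting one scan range into several
contiguous segments so that each segment can go to its own mapper, can be
sketched roughly as follows. This is only an illustration in Python, not the
HBASE-18090 patch and not HBase code: the function name split_key_range is
hypothetical, row keys are treated as big-endian integers after zero-padding
to a common width, and both keys must be non-empty (HBase's empty start/stop
keys for open-ended scans are not handled). It also assumes rows are spread
roughly evenly over the key range, which may not hold in practice.

```python
from typing import List, Tuple


def split_key_range(start: bytes, stop: bytes, n: int) -> List[Tuple[bytes, bytes]]:
    """Split the half-open key range [start, stop) into at most n contiguous
    sub-ranges by interpolating between the keys as big-endian integers.

    Hypothetical helper for illustration; not part of HBase.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if start >= stop:
        raise ValueError("start must sort strictly before stop")

    # Pad both keys with trailing zero bytes so they have the same width.
    width = max(len(start), len(stop))
    lo = int.from_bytes(start.ljust(width, b"\x00"), "big")
    hi = int.from_bytes(stop.ljust(width, b"\x00"), "big")
    if hi <= lo:
        # Keys differ only by trailing zero bytes: nothing to interpolate.
        return [(start, stop)]

    bounds = [start]
    prev = lo
    for i in range(1, n):
        cut = lo + (hi - lo) * i // n
        if cut > prev:  # skip duplicate cut points when the range is tiny
            bounds.append(cut.to_bytes(width, "big"))
            prev = cut
    bounds.append(stop)

    # Pair consecutive boundaries into (segment_start, segment_stop) tuples.
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # Key layout loosely modelled on 'md5(userid) + timestamp': a fixed
    # prefix followed by an 8-byte timestamp (values are made up).
    prefix = b"abc"
    first = prefix + (1000).to_bytes(8, "big")
    last = prefix + (2000).to_bytes(8, "big")
    for seg_start, seg_stop in split_key_range(first, last, 4):
        print(seg_start.hex(), "->", seg_stop.hex())
```

Each returned pair could then serve as the start and stop row of one input
split. Because the segments are half-open and share boundaries, every row in
the original range belongs to exactly one segment.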
