hbase-dev mailing list archives

From Mikhail Antonov <olorinb...@gmail.com>
Subject Re: should we split the scan range into several segments when the scan range is located in only a single region?
Date Mon, 04 Sep 2017 12:26:36 GMT
I filed https://issues.apache.org/jira/browse/HBASE-18090 some time ago
and attached a draft patch to it. It's not complete, as we need some deeper
changes in the way we open regions (see the comments), but the basic stuff
works. (I ended up going the other route and didn't have the bandwidth to
finish it; it would be great if someone picked it up.)

Mikhail
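A minimal sketch of the idea discussed in this thread (cutting one region's scan range into several contiguous segments), assuming row keys are compared as unsigned bytes, as HBase orders rows, and that the range has a non-empty stop row. `split_key_range` is a hypothetical helper written for illustration; it is not an HBase API and not the HBASE-18090 patch.

```python
def split_key_range(start: bytes, stop: bytes, n: int) -> list[tuple[bytes, bytes]]:
    """Split the row range [start, stop) into up to n contiguous [lo, hi) segments.

    Hypothetical helper: keys are treated as unsigned big-endian integers, which
    matches lexicographic byte ordering once all keys have the same length.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    if not stop or start >= stop:
        raise ValueError("need a bounded range with start < stop")
    # Right-pad with zero bytes to a common width, plus one byte of headroom so that
    # adjacent short keys such as b"a" and b"b" can still be cut (e.g. at b"a\x80").
    width = max(len(start), len(stop)) + 1
    lo = int.from_bytes(start.ljust(width, b"\x00"), "big")
    hi = int.from_bytes(stop.ljust(width, b"\x00"), "big")
    # Never produce more segments than there are distinct width-byte values in [lo, hi).
    n = min(n, hi - lo)
    if n <= 1:
        return [(start, stop)]
    cuts = [(lo + (hi - lo) * i // n).to_bytes(width, "big") for i in range(1, n)]
    bounds = [start, *cuts, stop]
    # Consecutive boundaries form [start_row, stop_row) pairs covering the whole range.
    return list(zip(bounds[:-1], bounds[1:]))
```

Each returned pair could then drive its own Scan (start row inclusive, stop row exclusive) and its own input split, so several mappers share the one region. Even cuts in key space only balance the mappers if the rows are spread evenly across that key space.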

On Mon, Sep 4, 2017 at 11:13 AM Chia-Ping Tsai <chia7712@apache.org> wrote:

> That sounds good. There are some related issues; see
> https://issues.apache.org/jira/browse/HBASE-4914 and
> https://issues.apache.org/jira/browse/HBASE-4063.
>
> On 2017-09-04 15:06, libis <libisthanks@gmail.com> wrote:
> > Hi
> >
> > When TableInputFormat is used to source an HBase table in a MapReduce job,
> > its splitter creates one map task for each region that the scan range
> > covers. However, in some cases the user's scan range may fall within a
> > single region, resulting in only one mapper. For example, if the table's
> > rowkey is ‘md5(userid) + timestamp’ and a client wants to use MR to scan a
> > specified user's data for the latest month, it is very likely that only
> > one mapper will be working.
> >
> > To scan the data in parallel when the user's scan range is located in a
> > single region, should we split the scan range into several segments within
> > that region?
> >
> > Best,
> >
> > xinxin
> >
>
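To make the example in the question concrete: with a rowkey of md5(userid) followed by a timestamp, every row for one user shares the same 16-byte MD5 prefix, so "this user's latest month" is a single contiguous key range and will usually sit inside one region. The sketch below assumes the timestamp is stored as an 8-byte big-endian count of milliseconds (so byte order matches time order for non-negative values); `row_key` and `user_time_segments` are hypothetical names. Instead of cutting raw bytes, it splits the time window into n slices, which keeps every segment inside the user's prefix.

```python
import hashlib


def row_key(user_id: str, ts_millis: int) -> bytes:
    """Hypothetical rowkey layout: md5(userid) (16 bytes) + 8-byte big-endian timestamp."""
    return hashlib.md5(user_id.encode("utf-8")).digest() + ts_millis.to_bytes(8, "big")


def user_time_segments(user_id: str, ts_from: int, ts_to: int, n: int) -> list[tuple[bytes, bytes]]:
    """Split one user's [ts_from, ts_to) window into up to n (start_row, stop_row) pairs."""
    if n < 1 or not 0 <= ts_from < ts_to:
        raise ValueError("need n >= 1 and 0 <= ts_from < ts_to")
    # A window of k milliseconds cannot be cut into more than k non-empty slices.
    n = min(n, ts_to - ts_from)
    edges = [ts_from + (ts_to - ts_from) * i // n for i in range(n + 1)]
    # Consecutive edges become [start_row, stop_row) pairs that tile the original range.
    return [(row_key(user_id, a), row_key(user_id, b)) for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    day = 24 * 60 * 60 * 1000
    end = 1_504_483_200_000  # 2017-09-04 00:00:00 UTC, in milliseconds
    for start_row, stop_row in user_time_segments("user42", end - 30 * day, end, 4):
        print(start_row.hex(), stop_row.hex())
```

Splitting on the timestamp component needs knowledge of the rowkey schema, but it guarantees each slice covers an equal stretch of time for that one user, which a schema-agnostic byte split cannot promise.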
-- 
Thanks,
Michael Antonov
