hbase-dev mailing list archives

From libis <libistha...@gmail.com>
Subject Re: should we split the scan range into several segments when the scan range is located in a single region?
Date Mon, 04 Sep 2017 12:25:39 GMT
Thanks for the prompt reply. I think it may be hard for an HBase user to
choose a proper number of mappers per region, and with that approach small
regions could spawn many small map tasks. Instead, we could apply a fixed
mapper count only when the scan range falls within a single region, which
may be a common production scenario for large regions (>30 GB). What do you
think?
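
To make the idea concrete, here is a minimal sketch of how one scan range could be cut into a fixed number of contiguous sub-ranges. It is only an illustration, not the TableInputFormat code: the class name ScanRangeSplitter and its split method are hypothetical, and it assumes both row keys are non-empty and that evenly spaced byte boundaries are acceptable even though the data may not be evenly distributed between them.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of HBase.
final class ScanRangeSplitter {

    /** Splits [startRow, stopRow) into at most n contiguous sub-ranges; each element is {start, stop}. */
    static List<byte[][]> split(byte[] startRow, byte[] stopRow, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int len = Math.max(startRow.length, stopRow.length);
        // Right-pad with 0x00 so both keys have equal length and can be treated as unsigned integers.
        BigInteger lo = new BigInteger(1, Arrays.copyOf(startRow, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(stopRow, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("startRow must sort before stopRow after zero-padding");
        }
        BigInteger width = hi.subtract(lo);

        List<byte[][]> segments = new ArrayList<>();
        byte[] prev = startRow;
        BigInteger prevPoint = lo;
        for (int i = 1; i < n; i++) {
            BigInteger point = lo.add(width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            if (point.compareTo(prevPoint) <= 0) {
                continue; // range too narrow for n distinct boundaries, so emit fewer segments
            }
            byte[] boundary = toFixedLength(point, len);
            segments.add(new byte[][] {prev, boundary});
            prev = boundary;
            prevPoint = point;
        }
        segments.add(new byte[][] {prev, stopRow});
        return segments;
    }

    /** Converts a non-negative value below 256^len into exactly len big-endian bytes. */
    private static byte[] toFixedLength(BigInteger v, int len) {
        byte[] raw = v.toByteArray(); // may carry a leading sign byte or be shorter than len
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        List<byte[][]> segments = split(new byte[] {0x00, 0x00}, new byte[] {0x01, 0x00}, 4);
        for (byte[][] s : segments) {
            System.out.println(Arrays.toString(s[0]) + " -> " + Arrays.toString(s[1]));
        }
    }
}
```

Each returned pair could serve as the start and stop row of its own scan, one per mapper. The first segment starts at the original start row and the last ends at the original stop row, so the union of the segments covers exactly the requested range.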

2017-09-04 17:13 GMT+08:00 Chia-Ping Tsai <chia7712@apache.org>:

> That sounds good. There are some related issues; see
> https://issues.apache.org/jira/browse/HBASE-4914 and
> https://issues.apache.org/jira/browse/HBASE-4063.
>
> On 2017-09-04 15:06, libis <libisthanks@gmail.com> wrote:
> > Hi
> >
> > When TableInputFormat is used to source an HBase table in a MapReduce
> > job, its splitter creates one map task for each region of the table.
> > However, in some cases the user’s scan range may fall within a single
> > region, so only one mapper runs. For example, if the table's rowkey is
> > ‘md5(userid) + timestamp’ and a client wants to scan one user's data
> > for the latest month with MR, it is quite likely that only one mapper
> > does the work.
> >
> > To scan the data in parallel when the user's scan range lies within a
> > single region, should we split the scan range into several segments
> > within that region?
> >
> > Best,
> >
> > xinxin
> >
>
