hbase-dev mailing list archives

From libis <libistha...@gmail.com>
Subject Should we split the scan range into several segments when the scan range is located in a single region?
Date Mon, 04 Sep 2017 07:06:06 GMT
Hi

When TableInputFormat is used to read an HBase table in a MapReduce job,
its splitter creates one map task for each region of the table. However, in
some cases the user’s scan range falls entirely within a single region,
so the job runs with only one mapper. For example, if the rowkey of the
table is ‘md5(userid) + timestamp’ and a client wants to scan the data of a
specific user for the latest month with MR, it is very likely that only one
mapper will do all the work.

To scan the data in parallel when the user's scan range lies within a
single region, should we split the scan range into several segments within
that region?
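
To make the idea concrete, here is a minimal sketch of how a scan range could
be cut into roughly equal segments. It is only an illustration, not existing
HBase code: the class name ScanRangeSplitter and its split method are
hypothetical, it uses only the JDK, and it assumes rowkeys are spread fairly
evenly between the start and stop keys, which may not hold for real data.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: cuts [start, stop) into up to n segments.
public class ScanRangeSplitter {

    /**
     * Returns boundary keys k0..km (m <= n) with k0 = start and km = stop.
     * Segment i covers [k(i), k(i+1)). Fewer than n segments are returned
     * when the range is too narrow to produce n distinct boundaries.
     */
    public static List<byte[]> split(byte[] start, byte[] stop, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int len = Math.max(start.length, stop.length);
        // Right-pad both keys with 0x00 to the same length, then treat them
        // as unsigned big-endian integers so we can do arithmetic on them.
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(stop, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before stop");
        }
        BigInteger width = hi.subtract(lo);
        BigInteger parts = BigInteger.valueOf(n);

        List<byte[]> keys = new ArrayList<>();
        keys.add(start);
        BigInteger prev = lo;
        for (int i = 1; i < n; i++) {
            // i-th evenly spaced point between lo and hi (integer division).
            BigInteger point = lo.add(width.multiply(BigInteger.valueOf(i)).divide(parts));
            if (point.compareTo(prev) > 0) { // skip duplicates to avoid empty segments
                keys.add(toFixedLength(point, len));
                prev = point;
            }
        }
        keys.add(stop);
        return keys;
    }

    // Converts a non-negative value back to exactly len bytes, big-endian.
    private static byte[] toFixedLength(BigInteger value, int len) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }
}
```

Each adjacent pair of boundary keys could then serve as the start and stop row
of one input split, so that a range inside a single region is handled by
several mappers instead of one.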

Best,

xinxin
