hbase-user mailing list archives

From "Ma, Ming" <min...@ebay.com>
Subject RE: Number of map jobs per region
Date Mon, 29 Aug 2011 04:58:46 GMT
Dhaval,

You might find https://issues.apache.org/jira/browse/HBASE-4063 useful once it is ready.
Of course, you can always use your own customized version of TableInputFormat:
https://issues.apache.org/jira/browse/HBASE-4039 allows you to provide your own
TableInputFormat to TableMapReduceUtil.
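
To make the "customized TableInputFormat" idea concrete, here is a minimal sketch of the one piece such a class would need: cutting a region's [startKey, endKey) range into n sub-ranges so that each can become its own map task. It uses only the JDK and is not taken from either JIRA. The class name RangeSplitter and its methods are hypothetical, and it assumes both keys are non-empty (so the first and last regions of a table, whose boundary keys are empty, need separate handling) and that rows are spread roughly evenly across the key space.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: splits one key range into n sub-ranges.
class RangeSplitter {

    /**
     * Returns n + 1 boundary keys. Sub-range i is [bounds[i], bounds[i + 1]).
     * Assumes start and end are non-empty and start sorts before end.
     * If the range is narrower than n, adjacent boundaries may be equal.
     */
    static List<byte[]> split(byte[] start, byte[] end, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Right-pad with zero bytes so both keys have the same length,
        // then treat them as unsigned big-endian integers.
        int len = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before end");
        }
        BigInteger span = hi.subtract(lo);

        List<byte[]> bounds = new ArrayList<>();
        bounds.add(start); // keep the original region start unchanged
        for (int i = 1; i < n; i++) {
            BigInteger offset = span.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(n));
            bounds.add(toFixedLength(lo.add(offset), len));
        }
        bounds.add(end); // keep the original region end unchanged
        return bounds;
    }

    /** Converts a non-negative value back to exactly len big-endian bytes. */
    static byte[] toFixedLength(BigInteger value, int len) {
        byte[] raw = value.toByteArray(); // may be shorter, or carry a leading sign byte
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        // Example: cut the one-byte range [0x00, 0x10) into 4 pieces.
        for (byte[] key : split(new byte[] {0x00}, new byte[] {0x10}, 4)) {
            System.out.println(new BigInteger(1, key).toString(16));
        }
    }
}
```

In a custom input format, the getSplits method would presumably call something like this for each region and emit one split per sub-range, all pointing at the same region server. If the row keys are skewed, evenly spaced boundaries will not give evenly sized map tasks.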

Ming

-----Original Message-----
From: Dhaval Makawana [mailto:dhaval.makawana@gmail.com] 
Sent: Sunday, August 28, 2011 2:06 AM
To: user@hbase.apache.org
Subject: Number of map jobs per region

Hi,

We have 31 regions for a table in our HBase system, so scanning the table
via TableMapper creates 31 map tasks. The following line from the
documentation explains why:

"Reading from HBase, the TableInputFormat asks HBase for the list of regions
and makes a map-per-region or mapred.map.tasks maps, whichever is smaller"
(http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html)

Each region's file size is almost 7 GB (LZO-compressed data), and the map
tasks are taking a very long time to process the data. Is there any way to
increase parallelism (allocate more maps per region)?

Regards,
Dhaval
