hbase-user mailing list archives

From "Ma, Ming" <min...@ebay.com>
Subject RE: Number of map jobs per region
Date Mon, 29 Aug 2011 04:58:46 GMT

You might find https://issues.apache.org/jira/browse/HBASE-4063 useful when it is ready.
Of course, you can always use your own customized version of TableInputFormat.
https://issues.apache.org/jira/browse/HBASE-4039 allows you to provide your own
TableInputFormat to TableMapReduceUtil.
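
As a rough illustration of what a customized TableInputFormat could do when it
overrides getSplits, the sketch below divides one region's row-key range into
several contiguous sub-ranges, each of which could back its own map task. It is
written in Python only to show the key arithmetic and is not HBase code; the
function name split_key_range and its parameters are hypothetical. It assumes
row keys are roughly uniformly distributed as byte strings and that the region
has an explicit, non-empty end key (the last region of a table, whose end key
is empty, would need special handling).

```python
from typing import List, Tuple


def split_key_range(start: bytes, end: bytes, pieces: int) -> List[Tuple[bytes, bytes]]:
    """Split the half-open key range [start, end) into contiguous sub-ranges.

    Hypothetical helper: keys are treated as big-endian integers after
    right-padding the shorter one with zero bytes, which preserves
    lexicographic order for the interior split points.
    """
    if pieces < 1:
        raise ValueError("pieces must be at least 1")

    width = max(len(start), len(end))
    lo = int.from_bytes(start.ljust(width, b"\x00"), "big")
    hi = int.from_bytes(end.ljust(width, b"\x00"), "big")
    if hi <= lo:
        raise ValueError("end key must sort after start key")

    # Never ask for more pieces than there are distinct keys of this width.
    pieces = min(pieces, hi - lo)

    # Evenly spaced boundaries, including both ends of the range.
    bounds = [lo + (hi - lo) * i // pieces for i in range(pieces + 1)]
    keys = [b.to_bytes(width, "big") for b in bounds]

    # Keep the region's original boundary keys exactly as given.
    keys[0], keys[-1] = start, end
    return list(zip(keys[:-1], keys[1:]))
```

For example, split_key_range(b"a", b"e", 4) yields the four ranges a-b, b-c,
c-d and d-e. Even splitting by key value only balances the work if the data is
evenly spread over the key space; skewed keys would need split points chosen
from sampled row keys instead.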


-----Original Message-----
From: Dhaval Makawana [mailto:dhaval.makawana@gmail.com] 
Sent: Sunday, August 28, 2011 2:06 AM
To: user@hbase.apache.org
Subject: Number of map jobs per region


We have 31 regions for a table in our HBase system, so scanning the table
via TableMapper creates 31 map tasks. The following line from the
documentation explains why:

"Reading from HBase, the TableInputFormat asks HBase for the list of regions
and makes a map-per-region or mapred.map.tasks maps, whichever is smaller"

Each region is almost 7 GB in size (LZO-compressed data), and the map tasks
are taking a very long time to process the data. Is there any way to increase
parallelism (allocate more maps per region)?

