hadoop-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Relationship between number of reducers and number of regions in the table
Date Thu, 14 Aug 2014 12:23:16 GMT
I couldn't decide whether this is an HBase question or a Hadoop/YARN one.

In the utility class for MR jobs integrated with HBase,
org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil,

in the method:

    public static void initTableReducerJob(String table,
        Class<? extends TableReducer> reducer, Job job,
        Class partitioner, String quorumAddress, String serverClass,
        String serverImpl, boolean addDependencyJars) throws IOException;

In the above method, the following check is performed while setting the number
of reducers:


    ...
    int regions = outputTable.getRegionsInfo().size();
    ...
    if (job.getNumReduceTasks() > regions) {
      job.setNumReduceTasks(outputTable.getRegionsInfo().size());
    }
    ...

What is the reason for doing this? And what are the negative effects if we
don't follow it? I can think of one: more than one reducer writing to or
reading from the same region could cause hot-spotting and performance
issues. Are there any other reasons for this check?
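One way to see the effect of capping reducers at the region count is with a small sketch. This is not HBase's actual HRegionPartitioner, just a simplified stand-in (the class name, `getPartition` helper, and the region/reducer counts below are hypothetical), assuming a partitioner that routes each key to the partition whose index matches the key's region:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class RegionPartitionSketch {
    // Simplified stand-in for a region-based partitioner: the region index
    // is used directly as the partition index. When there are fewer reducers
    // than regions, overflow regions are spread by a hash of the index.
    static int getPartition(int regionIndex, int numReduceTasks) {
        if (regionIndex < numReduceTasks) {
            return regionIndex;
        }
        // Fewer reducers than regions: fall back to a hash spread.
        return Integer.toString(regionIndex).hashCode() % numReduceTasks;
    }

    public static void main(String[] args) {
        int regions = 3;
        int reducers = 5; // more reducers than regions

        // Collect which partitions actually receive data.
        Set<Integer> used = new TreeSet<>();
        for (int r = 0; r < regions; r++) {
            used.add(getPartition(r, reducers));
        }

        // Only partitions 0..regions-1 ever appear; reducers 3 and 4
        // would start up, receive no keys, and exit having done nothing.
        System.out.println("partitions that receive data: " + used);
    }
}
```

Under this model, any reducer with an index at or above the region count never receives a key, so requesting more reducers than regions only adds idle task overhead, which would explain the cap (an assumption on my part, not a statement from the HBase source).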

Thanks a lot.

Regards,
Shahab
