Return-Path: X-Original-To: apmail-hadoop-user-archive@minotaur.apache.org Delivered-To: apmail-hadoop-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id EDCE611977 for ; Thu, 14 Aug 2014 12:23:55 +0000 (UTC) Received: (qmail 58623 invoked by uid 500); 14 Aug 2014 12:23:50 -0000 Delivered-To: apmail-hadoop-user-archive@hadoop.apache.org Received: (qmail 58500 invoked by uid 500); 14 Aug 2014 12:23:50 -0000 Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hadoop.apache.org Delivered-To: mailing list user@hadoop.apache.org Received: (qmail 58463 invoked by uid 99); 14 Aug 2014 12:23:49 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 14 Aug 2014 12:23:49 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of shahab.yunus@gmail.com designates 209.85.216.48 as permitted sender) Received: from [209.85.216.48] (HELO mail-qa0-f48.google.com) (209.85.216.48) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 14 Aug 2014 12:23:18 +0000 Received: by mail-qa0-f48.google.com with SMTP id m5so872086qaj.7 for ; Thu, 14 Aug 2014 05:23:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=V9TIYPgwzaSo3Ak404RWdLHdVCLB2eKb7juP6GRjzU8=; b=qZVUp8rk1LbsTXz5ncjdYid8Mh4wNvMsu1Q3BsHfm+ugP24SQRN9UEgLiyC0rF/DWi 4FOFvOkIInpVPXn7/ysXVCDh5UVL8zzump5++K3yEsMtslmdkYn0PoiZozdATGKEtgFs ji3iGD9ir0tF0Dg/egm5b6LP+3pO+HG4CreCVdVMMr3eDUw8iKyBup01Y3rEl49+ZsN1 qxGc5qJkKCRSHX3X44CWbxdSK92oFNVOInazlzuPKTZ59vXNjpV1KROlskb6Gx2MVmqR gl7NnAVUEku5J6MJi3R/EEyORkEZ2H2Gxraen6afX4RLxwkLvfl1Dj1mquCzjLmwKuWJ hCNA== MIME-Version: 1.0 X-Received: by 10.140.43.245 with SMTP id e108mr7068036qga.76.1408018996906; Thu, 14 Aug 2014 05:23:16 -0700 (PDT) Received: by 10.229.246.198 with HTTP; Thu, 14 Aug 2014 05:23:16 -0700 (PDT) Date: Thu, 14 Aug 2014 08:23:16 -0400 Message-ID: Subject: Relationship between number of reducers and number of regions in the table From: Shahab Yunus To: "user@hbase.apache.org" , "user@hadoop.apache.org" Content-Type: multipart/alternative; boundary=001a113a666425d307050095f8d7 X-Virus-Checked: Checked by ClamAV on apache.org --001a113a666425d307050095f8d7 Content-Type: text/plain; charset=UTF-8 I couldn't decide that whether it is an HBase question or Hadoop/Yarn. In the utility class for MR jobs integerated with HBase, *org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil, * in the method: *public static void initTableReducerJob(String table,* * Class reducer, Job job,* * Class partitioner, String quorumAddress, String serverClass,* * String serverImpl, boolean addDependencyJars) throws IOException;* Im the above method the following check is added, while setting the number of reducers that: *...* *int regions = outputTable.getRegionsInfo().size();* *...* *if (job.getNumReduceTasks() > regions) {* * job.setNumReduceTasks(outputTable.getRegionsInfo().size());* * }* *... * What is the reason for doing this? And what are the negative effects we don't follow this? I can think one that, in case of more than one reducer writing/reading a same region can cause hot-spotting and performance issues. Are there any other reasons to add this check as well? Thanks a lot. Regards, Shahab --001a113a666425d307050095f8d7 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
I couldn't decide that whether it is an HBase question= or Hadoop/Yarn.

In the utility class for MR jobs intege= rated with HBase, org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil,= =C2=A0

in the method:


=
public static void initTableReducerJob(String table,
= =C2=A0 =C2=A0 Class<? extends TableReducer> reducer, Job job,<= /div>
=C2=A0 =C2=A0 Class partitioner, String quorumAddress, String serverClas= s,
=C2=A0 =C2=A0 String serverImpl, boolean addDependencyJ= ars) throws IOException;

Im the above method t= he following check is added, while setting the number of reducers that:


...
int regions =3D = outputTable.getRegionsInfo().size();
...
if (job.getNumReduceTasks() > regions) {
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 job.setNumReduceTasks(outputTable.getRegionsInfo().size()= );
=C2=A0 =C2=A0 =C2=A0 }
... =C2=A0 =C2=A0 =C2=A0
=C2=A0 =C2=A0 =C2=A0=C2=A0
What is the reason for doi= ng this? And what are the negative effects we don't follow this? I can = think one that, in case of more than one reducer writing/reading a same reg= ion can cause hot-spotting and performance issues. Are there any other reas= ons to add this check as well?

Thanks a lot.

Regards,
Shahab



--001a113a666425d307050095f8d7--