From: John Lilley <john.lilley@redpoint.net>
To: user@hadoop.apache.org
Subject: RE: How to best decide mapper output/reducer input for a huge string?
Date: Sat, 21 Sep 2013 23:08:18 +0000

Pavan,

How large are the rows in HBase? 22 million rows is not very much, but you mentioned "huge strings". Can you tell which part of the processing is the limiting factor (read from HBase, mapper output, reducers)?

John

From: Pavan Sudheendra [mailto:pavan0591@gmail.com]
Sent: Saturday, September 21, 2013 2:17 AM
To: user@hadoop.apache.org
Subject: Re: How to best decide mapper output/reducer input for a huge string?

No, I don't have a combiner in place. Is it necessary? How do I make my map output compressed? Yes, the tables in HBase are compressed.

Although there's no single obvious bottleneck, the time it takes to process the entire table is huge, so I have to keep checking whether I can optimize it somehow.

Oh okay, I'll implement a custom Writable. Apart from that, do you see anything wrong with my design? Does it require any kind of rework? Thank you so much for helping.

On Sat, Sep 21, 2013 at 1:06 PM, Pradeep Gollakota <pradeepg26@gmail.com> wrote:

One thing that comes to mind is that your keys are Strings, which are highly inefficient. You might get much better performance if you write a custom Writable for your key object using the appropriate data types; for example, use a long (LongWritable) for timestamps. This should make (de)serialization a lot faster. If HouseHoldId is an integer, your comparisons during sorting will also be faster.

Ensure that your map outputs are being compressed. Are your tables in HBase compressed? Do you have a combiner?

Have you been able to profile your code to see where the bottlenecks are?
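To make the custom key concrete, here is an untested sketch. I'm assuming HouseHoldId fits in an int and the timestamp in a long, and the class and field names are made up; adjust them to your real schema:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class HouseHoldKey implements WritableComparable<HouseHoldKey> {
    private int houseHoldId;   // assuming it fits in an int
    private long timestamp;

    public HouseHoldKey() { }  // Hadoop requires the no-arg constructor

    public HouseHoldKey(int houseHoldId, long timestamp) {
        this.houseHoldId = houseHoldId;
        this.timestamp = timestamp;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(houseHoldId);   // fixed-width binary instead of a String
        out.writeLong(timestamp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        houseHoldId = in.readInt();
        timestamp = in.readLong();
    }

    @Override
    public int compareTo(HouseHoldKey o) {   // cheap numeric comparison for the sort
        int cmp = Integer.compare(houseHoldId, o.houseHoldId);
        return cmp != 0 ? cmp : Long.compare(timestamp, o.timestamp);
    }

    @Override
    public int hashCode() {    // the default HashPartitioner uses this, so hashing
        return houseHoldId;    // on HouseHoldId alone keeps a household in one partition
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof HouseHoldKey)) return false;
        HouseHoldKey k = (HouseHoldKey) obj;
        return houseHoldId == k.houseHoldId && timestamp == k.timestamp;
    }
}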
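To turn on map-output compression, the driver settings look roughly like this (untested; the property names shown are the Hadoop 2.x ones, the 1.x equivalents are in the comments, and SnappyCodec needs the native libraries installed, otherwise DefaultCodec works):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;

// In the driver, before submitting ("job" is your Job instance):
Configuration conf = job.getConfiguration();
conf.setBoolean("mapreduce.map.output.compress", true);   // "mapred.compress.map.output" on 1.x
conf.setClass("mapreduce.map.output.compress.codec",      // "mapred.map.output.compression.codec" on 1.x
              SnappyCodec.class, CompressionCodec.class);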
On Sat, Sep 21, 2013 at 12:04 AM, Pavan Sudheendra <pavan0591@gmail.com> wrote:

Hi Pradeep,

Yes, basically I'm only writing the key part as the map output; the V of the <K,V> pair is not of much use to me, but I'm hoping to change that if it leads to faster execution. I'm kind of a newbie, so I'm looking to make the map/reduce job run a lot faster.

Also, yes, it gets sorted by the HouseHoldId, which is what I needed. But it seems that when I write a map output for each and every row of a 19 million row HBase table, it takes nearly a day to complete (21 mappers and 21 reducers).

I have looked at both Pig and Hive to do the job, but I'm supposed to do this via an MR job, so I cannot use either of those. Do you recommend I try something else given that I have the data in that format?

On Sat, Sep 21, 2013 at 12:26 PM, Pradeep Gollakota <pradeepg26@gmail.com> wrote:

I'm sorry, but I don't understand your question. Is the output of the mapper you're describing the key portion? If it is the key, then your data should already be sorted by HouseHoldId, since it occurs first in your key.

The SortComparator tells Hadoop how to sort your data, so you use it when you need a non-lexical sort order. The GroupingComparator tells Hadoop how to group your data for the reducer; all KV-pairs from the same group are given to the same Reducer.

If your reduce computation needs all the KV-pairs for the same HouseHoldId, then you will need to write a GroupingComparator.
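With your current Text keys, a grouping comparator might look roughly like this (an untested sketch; the class name is made up, and you would also want a Partitioner that partitions on HouseHoldId alone so that all records for a household actually reach the same reducer):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class HouseHoldGroupingComparator extends WritableComparator {

    public HouseHoldGroupingComparator() {
        super(Text.class, true);   // true = instantiate keys so compare() gets real Text objects
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        // Compare only the first space-delimited token (HouseHoldId),
        // so all rows for a household land in one reduce() call.
        String idA = a.toString().split(" ", 2)[0];
        String idB = b.toString().split(" ", 2)[0];
        return idA.compareTo(idB);
    }
}

// In the driver:
// job.setGroupingComparatorClass(HouseHoldGroupingComparator.class);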
Also, have you considered using a higher-level abstraction on Hadoop such as Pig, Hive, Cascading, etc.? The sorting/grouping types of tasks are a LOT easier to write in those languages.

Hope this helps!

- Pradeep

On Fri, Sep 20, 2013 at 11:32 PM, Pavan Sudheendra <pavan0591@gmail.com> wrote:

I need to improve my MR jobs, which use HBase as both source and sink.

Basically, I'm reading data from 3 HBase tables in the mapper, writing it out as one huge string for the reducer to do some computation on and dump into an HBase table.

Table1 ~ 19 million rows.
Table2 ~ 2 million rows.
Table3 ~ 900,000 rows.

The output of the mapper is something like this:

HouseHoldId contentID name duration genre type channelId personId televisionID timestamp

I'm interested in sorting on the basis of the HouseHoldId value, so I'm using this technique. I'm not interested in the V part of the pair, so I'm kind of ignoring it. My mapper class is defined as follows:

public static class AnalyzeMapper extends TableMapper<Text, IntWritable> { }

My MR job takes 22 hours to complete, which is not desirable at all. I'm supposed to optimize it somehow to run a lot faster.

scan.setCaching(750);
scan.setCacheBlocks(false);
TableMapReduceUtil.initTableMapperJob(
    Table1,               // input HBase table name
    scan,
    AnalyzeMapper.class,  // mapper
    Text.class,           // mapper output key
    IntWritable.class,    // mapper output value
    job);
TableMapReduceUtil.initTableReducerJob(
    OutputTable,                // output table
    AnalyzeReducerTable.class,  // reducer class
    job);
job.setNumReduceTasks(RegionCount);

My HBase Table1 has 21 regions, so 21 mappers are spawned. We are running an 8-node Cloudera cluster.

Should I use a custom SortComparator or a GroupingComparator?

--
Regards-
Pavan