Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of samliuhadoop@gmail.com
 designates 209.85.128.46 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAHH8OOfd-Y5Rvjj9w5fRk6rEc110dF=fj43XeKfJAw6pLJRcKA@mail.gmail.com>
References: 
 <CAHH8OOfRHRhKw8p9Oamp9+xqEFadPEGemcHYfc2ubScsNAFfDg@mail.gmail.com>
	<CAHH8OOe_fN6d8dkOdYu4G44_godBpi58Ey+KPeQLs3zYX+gB_w@mail.gmail.com>
	<A99DC639-B806-4695-A676-117BF1072CEE@hortonworks.com>
	<CAHH8OOfd-Y5Rvjj9w5fRk6rEc110dF=fj43XeKfJAw6pLJRcKA@mail.gmail.com>
Date: Sun, 20 Oct 2013 21:26:47 +0800
Message-ID: 
 <CAHH8OOe8E93F8btoN8bDSRUaQFygc+WmN-KR0Pit7QzjzO+m3Q@mail.gmail.com>
Subject: Re: Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort
 job?
From: sam liu <samliuhadoop@gmail.com>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=001a1132ee5498c1bf04e92c1e28

--001a1132ee5498c1bf04e92c1e28
Content-Type: text/plain; charset=ISO-8859-1

Furthermore, I ran another test: I renamed TeraSort#TotalOrderPartitioner to
TeraSort#MyOwnTotalOrderPartitioner to avoid conflicts with other classes of
the same name on the Hadoop classpath. In TeraSort.java, I also changed
'job.setPartitionerClass(TotalOrderPartitioner.class);' to
'job.setPartitionerClass(MyOwnTotalOrderPartitioner.class);'. However, it
seems MyOwnTotalOrderPartitioner was still not invoked while the terasort
job was running.
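
To rule out the classpath question, a check along the following lines could
show which jar or directory a class is actually loaded from. This is only a
minimal sketch using plain JDK calls; WhichJar and locationOf are hypothetical
names and not part of Hadoop.

```java
import java.security.CodeSource;

/** Hypothetical helper (not part of Hadoop): reports where a class was loaded from. */
public class WhichJar {

    /** Returns the jar or directory a class was loaded from, or a placeholder if unknown. */
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names as arguments; defaults to this class itself.
        String[] names = args.length > 0 ? args : new String[] { WhichJar.class.getName() };
        for (String name : names) {
            Class<?> cls = Class.forName(name);
            System.err.println(name + " loaded from " + locationOf(cls));
        }
    }
}
```

The same locationOf(getClass()) call could also be placed inside the
partitioner's setConf(), assuming its output ends up in the task's stderr log.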

BTW, TeraSort#TotalOrderPartitioner#readPartitions() contains the
statement 'DataInputStream reader = fs.open(p);', and I know 'p' is the
path of '_partition.lst'. But two details are unclear to me:
- Where is 'p' located? Is it on HDFS or on the local Linux file system?
What is its absolute path?
- Which part or phase of Hadoop MapReduce copies the _partition.lst file to
the path 'p'? I am very confused about this part.

Thanks very much!


2013/10/20 sam liu <samliuhadoop@gmail.com>

> After I took the following actions, the job still passed, and it seems
> none of the TotalOrderPartitioner classes were invoked at all:
> - Modified libexec/hadoop-config.sh to put
> hadoop-mapreduce-examples-2.0.4-alpha.jar at the front of the Hadoop
> classpath, which should ensure that TeraSort#TotalOrderPartitioner is
> loaded first
> - Fiddled with org.apache.hadoop.mapreduce.TotalOrderPartitioner, and then
> replaced share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.0.4-alpha.jar
> with the newly generated jar
>
>
> 2013/10/19 Arun C Murthy <acm@hortonworks.com>
>
>> Apologies for the late response.
>>
>> In hadoop-2 TeraSort uses the new org.apache.hadoop.mapreduce apis (not
>> org.apache.hadoop.mapred).
>>
>> Did you fiddle with the right TotalOrderPartitioner
>> i.e. org.apache.hadoop.mapreduce.TotalOrderPartitioner?
>>
>> Arun
>>
>> On Oct 17, 2013, at 8:12 PM, sam liu <samliuhadoop@gmail.com> wrote:
>>
>> This is really weird and confusing to me. Can anyone help with this question?
>>
>> Thanks!
>>
>>
>> 2013/10/16 sam liu <samliuhadoop@gmail.com>
>>
>>> Hi Experts,
>>>
>>> In Hadoop-2.0.4, TeraSort uses TeraSort#TotalOrderPartitioner as
>>> its Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'.
>>> However, it seems Yarn did not execute the methods of
>>> TeraSort#TotalOrderPartitioner at all. I ran some tests to verify this,
>>> as described below:
>>>
>>> Test 1: Add some code to the methods readPartitions() and setConf() in
>>> TeraSort#TotalOrderPartitioner to print some words and write some words
>>> to a file.
>>> Expected Result: Some words are printed and written to a file
>>> Actual Result: Nothing was printed or written to a file at all
>>>
>>> Test 2: Remove all existing methods in TeraSort#TotalOrderPartitioner,
>>> leaving only the necessary methods with empty bodies
>>> Expected Result: The TeraSort job throws some exception, as the
>>> specified Partitioner is not implemented at all
>>> Actual Result: The TeraSort job completed successfully without any exception
>>>
>>> The above tests confused me a lot, because it seems Yarn never uses the
>>> specified partitioner TeraSort#TotalOrderPartitioner during job execution.
>>>
>>> Can anyone help explain the reasons?
>>>
>>> Thanks very much!
>>>
>>
>>
>>  --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>>
>
>
>

--001a1132ee5498c1bf04e92c1e28--