Date: Sun, 20 Oct 2013 21:26:47 +0800
Subject: Re: Yarn never use TeraSort#TotalOrderPartitioner when run TeraSort job?
From: sam liu
To: "user@hadoop.apache.org"

Furthermore, I ran another test: I renamed TeraSort#TotalOrderPartitioner to
TeraSort#MyOwnTotalOrderPartitioner to avoid conflicts with other classes of
the same name on the Hadoop classpath. In TeraSort.java, I also changed
'job.setPartitionerClass(TotalOrderPartitioner.class);' to
'job.setPartitionerClass(MyOwnTotalOrderPartitioner.class);'. However, it
seems MyOwnTotalOrderPartitioner was still not invoked while the TeraSort
job ran.

BTW, in TeraSort#TotalOrderPartitioner#readPartitions() there is the
statement 'DataInputStream reader = fs.open(p);', and I know 'p' is the path
of '_partition.lst'. But two details are unclear to me:
- Where is 'p' located? Is it on HDFS or on the Linux file system? What is
  its absolute path?
- Which part or phase of Hadoop MapReduce copies the _partition.lst file to
  the path 'p'? I am very confused about this part.

Thanks very much!

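A note on the two questions about 'p', offered as a sketch and not as an answer confirmed in this thread. As best recalled from the Hadoop 2 TeraSort example source (an assumption worth checking against your own copy of TeraSort.java), the driver writes the partition file into the job's output directory on HDFS, registers it with the distributed cache under a URI whose fragment is '_partition.lst', and each task then opens the file by that relative name from its local working directory. The snippet below uses only java.net.URI to show how such a URI splits into the remote path and the local link name. The namenode address and the output directory are made-up values.

```java
import java.net.URI;

// Sketch only: shows how a cache-file URI of the form <remote path>#<link name>
// splits into its two parts. The namenode address and output directory are
// hypothetical; only the file name _partition.lst comes from this thread.
class PartitionFileUri {

    // Path of the file on the remote file system (the part before '#').
    static String remotePath(URI cacheUri) {
        return cacheUri.getPath();
    }

    // Name the file would be linked under locally (the part after '#').
    static String localLinkName(URI cacheUri) {
        return cacheUri.getFragment();
    }

    public static void main(String[] args) {
        URI cacheUri = URI.create(
            "hdfs://namenode.example:8020/user/sam/tera-out/_partition.lst#_partition.lst");
        System.out.println("remote path: " + remotePath(cacheUri));
        System.out.println("local link:  " + localLinkName(cacheUri));
    }
}
```

If that recollection holds, 'p' would be a relative path resolved against the task's local working directory, and the copy would be done by the distributed cache localization step before the task starts.
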
2013/10/20 sam liu

> After I took the following actions, the job still passed, and it seems
> that none of the TotalOrderPartitioner classes were invoked at all:
> - Modified libexec/hadoop-config.sh to put
>   hadoop-mapreduce-examples-2.0.4-alpha.jar at the front of the Hadoop
>   classpath, which should ensure that TeraSort#TotalOrderPartitioner is
>   invoked first
> - Fiddled with org.apache.hadoop.mapreduce.TotalOrderPartitioner, and then
>   replaced share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.0.4-alpha.jar
>   with the newly generated jar
>
> 2013/10/19 Arun C Murthy
>
>> Apologies for the late response.
>>
>> In hadoop-2, TeraSort uses the new org.apache.hadoop.mapreduce APIs (not
>> org.apache.hadoop.mapred).
>>
>> Did you fiddle with the right TotalOrderPartitioner,
>> i.e. org.apache.hadoop.mapreduce.TotalOrderPartitioner?
>>
>> Arun
>>
>> On Oct 17, 2013, at 8:12 PM, sam liu wrote:
>>
>> This is really weird and confusing to me. Can anyone help with this
>> question?
>>
>> Thanks!
>>
>> 2013/10/16 sam liu
>>
>>> Hi Experts,
>>>
>>> In Hadoop-2.0.4, TeraSort uses TeraSort#TotalOrderPartitioner as its
>>> Partitioner: 'job.setPartitionerClass(TotalOrderPartitioner.class);'.
>>> However, it seems Yarn did not execute the methods of
>>> TeraSort#TotalOrderPartitioner at all. I ran the tests below to verify
>>> this:
>>>
>>> Test 1: Add code to the readPartitions() and setConf() methods of
>>> TeraSort#TotalOrderPartitioner to print some words and write some words
>>> to a file.
>>> Expected Result: Some words should be printed and written to a file
>>> Actual Result: Nothing was printed or written to a file at all
>>>
>>> Test 2: Remove all existing methods in TeraSort#TotalOrderPartitioner,
>>> leaving only the necessary methods with empty bodies
>>> Expected Result: The TeraSort job fails with an exception, because the
>>> specified Partitioner is not implemented at all
>>> Actual Result: The TeraSort job completed successfully without any
>>> exception
>>>
>>> The tests above confused me a lot, because it seems Yarn never uses the
>>> specified partitioner TeraSort#TotalOrderPartitioner during job
>>> execution.
>>>
>>> Can anyone help explain why?
>>>
>>> Thanks very much!
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
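
One possible explanation for both test results, not confirmed anywhere in this thread: in the Hadoop 2 map task, the configured partitioner class is only instantiated when the job has more than one reduce task. With a single reducer (the default value of mapreduce.job.reduces), a trivial built-in partitioner is used instead, so setConf() and readPartitions() would never run. The plain-Java model below mimics that selection logic without depending on Hadoop. The Partitioner interface, TracingPartitioner, and choose() are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.function.Supplier;

// Model of the suspected behaviour: the configured partitioner is only
// created when there is more than one reduce task. All names here are
// illustrative stand-ins, not the real Hadoop API.
class PartitionerSelection {

    interface Partitioner {
        int getPartition(String key, int numPartitions);
    }

    // Counts how often the "configured" partitioner gets constructed.
    static int instantiated = 0;

    static class TracingPartitioner implements Partitioner {
        TracingPartitioner() {
            instantiated++;
        }

        @Override
        public int getPartition(String key, int numPartitions) {
            // Non-negative hash spread over the available partitions.
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // With one reducer, skip the configured class and send everything to
    // the only partition; otherwise build the configured partitioner.
    static Partitioner choose(int numReduceTasks, Supplier<Partitioner> configured) {
        if (numReduceTasks > 1) {
            return configured.get();
        }
        return (key, numPartitions) -> numPartitions - 1;
    }

    public static void main(String[] args) {
        choose(1, TracingPartitioner::new);
        System.out.println("instantiated with 1 reducer:  " + instantiated);
        choose(4, TracingPartitioner::new);
        System.out.println("instantiated with 4 reducers: " + instantiated);
    }
}
```

If this is the cause, rerunning TeraSort with more than one reducer, for example by passing -Dmapreduce.job.reduces=4, should make the trace output from the modified partitioner appear.
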