Subject: Re: Map and Reduce process hang out at 0%
From: Vinod Kumar Vavilapalli
To: mapreduce-user@hadoop.apache.org
Date: Tue, 20 Dec 2011 11:38:47 -0800

Actually, after MAPREDUCE-2652, all tasks will fail if they get containers on nodes with no shuffle service, with an "invalid shuffle port error". The job should eventually fail.

Maybe the OP was running into something else; will wait for him to return with more information.

Thanks,
+Vinod

On Tue, Dec 20, 2011 at 11:29 AM, Robert Evans <evans@yahoo-inc.com> wrote:
Should we file a JIRA so that the MapReduce AM blows up in a more obvious fashion if Shuffle is not configured?

--Bobby Evans


On 12/20/11 12:30 PM, "Vinod Kumar Vavilapalli" <vinodkv@hortonworks.com> wrote:


I guess you don't have shuffle configured. Can you look at the application master (AM) logs and paste logs from there? There will be a link to the AM logs on the application page of the RM web UI.

You can also check and see if shuffle is configured. From the INSTALL file (http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/INSTALL),
Step 7) Setup config: for running mapreduce applications, which now are in user land, you need to setup nodemanager with the following configuration in your yarn-site.xml before you start the nodemanager.
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce.shuffle</value>
    </property>

    <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
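For anyone following along, those two properties go inside the usual <configuration> root of yarn-site.xml on each NodeManager host. A minimal sketch of the complete file, assuming nothing else needs to be set there:

    <?xml version="1.0"?>
    <!-- yarn-site.xml on every NodeManager host: registers the MapReduce
         shuffle as a NodeManager auxiliary service. -->
    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce.shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
    </configuration>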

Step 8) Modify mapred-site.xml to use yarn framework
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
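Likewise, a minimal sketch of the whole mapred-site.xml with just that property, again inside the standard <configuration> root; this file also needs to be visible to the client submitting the job, since it is what makes the client pick the YARN runner instead of the local one:

    <?xml version="1.0"?>
    <!-- mapred-site.xml: tells the MapReduce client to submit jobs to YARN
         rather than run them locally. -->
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>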

+Vinod


On Tue, Dec 20, 2011 at 8:12 AM, Arun C Murthy <acm@hortonworks.com> wrote:
Can you look at the /nodes web-page to see how many nodes you have?

Also, do you see any exceptions in the ResourceManager logs on dn5?

Arun

On Dec 20, 2011, at 5:14 AM, Jingui Lee wrote:

Hi, all

I am running Hadoop 0.23 on 5 nodes.

I could run any YARN application or MapReduce job on this cluster before.
But after I changed the ResourceManager node from node4 to node5, when I run applications (I have modified the properties referenced in the configuration files), the map and reduce processes hang at 0% until I kill the application.

I don't know why.

terminal output:

bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar wordcount /share/stdinput/1k /testread/hao
11/12/20 20:20:29 INFO mapreduce.Cluster: Cannot pick org.apache.hadoop.mapred.LocalClientProtocolProvider as the ClientProtocolProvider - returned null protocol
11/12/20 20:20:29 INFO ipc.YarnRPC: Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
11/12/20 20:20:29 INFO mapred.ResourceMgrDelegate: Connecting to ResourceManager at dn5/192.168.3.204:50010
11/12/20 20:20:29 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
11/12/20 20:20:29 INFO mapred.ResourceMgrDelegate: Connected to ResourceManager at dn5/192.168.3.204:50010
11/12/20 20:20:29 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
11/12/20 20:20:29 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
11/12/20 20:20:29 INFO input.FileInputFormat: Total input paths to process : 1
11/12/20 20:20:29 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/12/20 20:20:29 WARN snappy.LoadSnappy: Snappy native library not loaded
11/12/20 20:20:29 INFO mapreduce.JobSubmitter: number of splits:1
11/12/20 20:20:29 INFO mapred.YARNRunner: AppMaster capability = memory: 2048
11/12/20 20:20:29 INFO mapred.YARNRunner: Command to launch container for ApplicationMaster is : $JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=<LOG_DIR> -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1536m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
11/12/20 20:20:29 INFO mapred.ResourceMgrDelegate: Submitted application application_1324372145692_0004 to ResourceManager
11/12/20 20:20:29 INFO mapred.ClientCache: Connecting to HistoryServer at: dn5:10020
11/12/20 20:20:29 INFO ipc.YarnRPC: Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
11/12/20 20:20:29 INFO mapred.ClientCache: Connected to HistoryServer at: dn5:10020
11/12/20 20:20:29 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.mapreduce.v2.api.MRClientProtocol
11/12/20 20:20:30 INFO mapreduce.Job: Running job: job_1324372145692_0004
11/12/20 20:20:31 INFO mapreduce.Job:  map 0% reduce 0%
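As an aside on the "I have modified the properties" part above: when the ResourceManager moves from node4 to node5, the RM endpoints in yarn-site.xml have to be updated on every node and on the client submitting the job. A minimal sketch of the properties typically involved (the names are the standard Hadoop 0.23 YARN keys; the host:port values below are placeholders, not this cluster's actual addresses):

    <!-- yarn-site.xml (all nodes and the client): ResourceManager endpoints
         after the move to node5. Host:port values are placeholders. -->
    <configuration>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>dn5:8040</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>dn5:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>dn5:8025</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>dn5:8088</value>
      </property>
    </configuration>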





