From: Jagadesh_Doddi
To: core-user@hadoop.apache.org
Date: Tue, 24 Feb 2009 14:58:20 +0530
Subject: RE: Reducer hangs at 16%

I have opened the ports 50010, 50030, 50060, 50070, 50075 and 50090.
It works fine now. Thanks, Matei.

Thanks

Jagadesh Doddi
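On a stock Fedora firewall, opening these ports amounts to adding ACCEPT rules for them. A minimal sketch, assuming iptables with the default INPUT chain; in a real deployment you would also restrict the source addresses to the cluster's own nodes:

  # Allow the Hadoop web/data ports (DataNode, JobTracker, TaskTracker,
  # NameNode, SecondaryNameNode) through the firewall on every node.
  for port in 50010 50030 50060 50070 50075 50090; do
      iptables -I INPUT -p tcp --dport "$port" -j ACCEPT
  done
  service iptables save   # persist the rules across reboots on Fedora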
-----Original Message-----
From: Matei Zaharia [mailto:matei@cloudera.com]
Sent: Monday, February 23, 2009 10:06 PM
To: core-user@hadoop.apache.org
Subject: Re: Reducer hangs at 16%

The fact that it works with one slave node doesn't mean much, because when
the slave is alone, it is copying map outputs from itself and thus not going
through the firewall. It sounds like the slaves can't open a connection to
each other, which could well mean a firewall problem.

Can you look at the output of the reduce task (by clicking it in the
"running tasks" column in the web UI and going on to see the last 8k of
output)? I imagine it will have fetched data from one slave and will be
failing to connect to the other one.

On Mon, Feb 23, 2009 at 5:03 AM, Jagadesh_Doddi wrote:

> It works as long as I use any one of the slave nodes.
> The moment I add both slave nodes to conf/slaves, it fails.
> So there is no issue with the firewall or the /etc/hosts entries.
>
> Thanks and Regards
>
> Jagadesh Doddi
>
> -----Original Message-----
> From: Amar Kamat [mailto:amarrk@yahoo-inc.com]
> Sent: Monday, February 23, 2009 6:26 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Reducer hangs at 16%
>
> Looks like the reducer is able to fetch map output files from the local
> box but fails to fetch them from the remote box. Can you check that there
> is no firewall issue and that the /etc/hosts entries are correct?
> Amar
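Since "No route to host" can also appear when hostnames resolve to the wrong interface, it is worth spelling out what a correct /etc/hosts looks like: every node maps every cluster hostname to its real LAN address, never to 127.0.0.1. A sketch with made-up addresses:

  # /etc/hosts, identical on all four nodes (addresses are examples only)
  127.0.0.1      localhost
  192.168.1.101  Fedora1    # namenode
  192.168.1.102  Fedora2    # jobtracker
  192.168.1.103  Fedora3    # tasktracker + datanode
  192.168.1.104  Fedora4    # tasktracker + datanode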
> Jagadesh_Doddi wrote:
> > Hi
> >
> > I have changed the configuration to run the Name node and job tracker on
> > the same system.
> > The job is started with bin/start-all.sh on the NN.
> > With a single slave node, the job completes in 12 seconds, and the
> > console output is shown below:
> >
> > [root@Fedora1 hadoop-0.18.3]# bin/hadoop jar samples/wordcount.jar org.myorg.WordCount input output1
> > 09/02/23 17:19:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > 09/02/23 17:19:30 INFO mapred.FileInputFormat: Total input paths to process : 1
> > 09/02/23 17:19:30 INFO mapred.FileInputFormat: Total input paths to process : 1
> > 09/02/23 17:19:30 INFO mapred.JobClient: Running job: job_200902231717_0001
> > 09/02/23 17:19:31 INFO mapred.JobClient:  map 0% reduce 0%
> > 09/02/23 17:19:37 INFO mapred.JobClient:  map 100% reduce 0%
> > 09/02/23 17:19:42 INFO mapred.JobClient: Job complete: job_200902231717_0001
> > 09/02/23 17:19:42 INFO mapred.JobClient: Counters: 16
> > 09/02/23 17:19:42 INFO mapred.JobClient:   Job Counters
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Data-local map tasks=2
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Launched reduce tasks=1
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Launched map tasks=2
> > 09/02/23 17:19:42 INFO mapred.JobClient:   Map-Reduce Framework
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Map output records=25
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Reduce input records=23
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Map output bytes=238
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Map input records=5
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Combine output records=46
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Map input bytes=138
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Combine input records=48
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Reduce input groups=23
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Reduce output records=23
> > 09/02/23 17:19:42 INFO mapred.JobClient:   File Systems
> > 09/02/23 17:19:42 INFO mapred.JobClient:     HDFS bytes written=175
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Local bytes written=648
> > 09/02/23 17:19:42 INFO mapred.JobClient:     HDFS bytes read=208
> > 09/02/23 17:19:42 INFO mapred.JobClient:     Local bytes read=281
> >
> > With two slave nodes, the job completes in 13 minutes, and the console
> > output is shown below:
> >
> > [root@Fedora1 hadoop-0.18.3]# bin/hadoop jar samples/wordcount.jar org.myorg.WordCount input output2
> > 09/02/23 17:25:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > 09/02/23 17:25:38 INFO mapred.FileInputFormat: Total input paths to process : 1
> > 09/02/23 17:25:38 INFO mapred.FileInputFormat: Total input paths to process : 1
> > 09/02/23 17:25:39 INFO mapred.JobClient: Running job: job_200902231722_0001
> > 09/02/23 17:25:40 INFO mapred.JobClient:  map 0% reduce 0%
> > 09/02/23 17:25:42 INFO mapred.JobClient:  map 50% reduce 0%
> > 09/02/23 17:25:43 INFO mapred.JobClient:  map 100% reduce 0%
> > 09/02/23 17:25:58 INFO mapred.JobClient:  map 100% reduce 16%
> > 09/02/23 17:38:31 INFO mapred.JobClient: Task Id : attempt_200902231722_0001_m_000000_0, Status : FAILED
> > Too many fetch-failures
> > 09/02/23 17:38:31 WARN mapred.JobClient: Error reading task output: No route to host
> > 09/02/23 17:38:31 WARN mapred.JobClient: Error reading task output: No route to host
> > 09/02/23 17:38:43 INFO mapred.JobClient: Job complete: job_200902231722_0001
> > 09/02/23 17:38:43 INFO mapred.JobClient: Counters: 16
> > 09/02/23 17:38:43 INFO mapred.JobClient:   Job Counters
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Data-local map tasks=3
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Launched reduce tasks=1
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Launched map tasks=3
> > 09/02/23 17:38:43 INFO mapred.JobClient:   Map-Reduce Framework
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Map output records=25
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Reduce input records=23
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Map output bytes=238
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Map input records=5
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Combine output records=46
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Map input bytes=138
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Combine input records=48
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Reduce input groups=23
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Reduce output records=23
> > 09/02/23 17:38:43 INFO mapred.JobClient:   File Systems
> > 09/02/23 17:38:43 INFO mapred.JobClient:     HDFS bytes written=175
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Local bytes written=648
> > 09/02/23 17:38:43 INFO mapred.JobClient:     HDFS bytes read=208
> > 09/02/23 17:38:43 INFO mapred.JobClient:     Local bytes read=281
> >
> > Thanks
> >
> > Jagadesh
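The "Too many fetch-failures" plus "No route to host" pattern above is what the reduce prints when it cannot pull map output from a remote TaskTracker, which serves it over its HTTP port (50060 by default). A quick way to test this directly from each slave, using the hostnames in this thread:

  # Run on Fedora3; repeat from Fedora4 against Fedora3.
  # An immediate "No route to host" here points at the firewall.
  telnet Fedora4 50060
  # Alternative without telnet: print just the HTTP status code.
  curl -s -o /dev/null -w '%{http_code}\n' http://Fedora4:50060/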
> > -----Original Message-----
> > From: Jothi Padmanabhan [mailto:jothipn@yahoo-inc.com]
> > Sent: Monday, February 23, 2009 4:57 PM
> > To: core-user@hadoop.apache.org
> > Subject: Re: Reducer hangs at 16%
> >
> > OK. I am guessing that your problem arises from having two entries for
> > master. The master should be the node where the JT is run (for
> > start-mapred.sh) and the NN is run (for start-dfs.sh). This might need a
> > bit more effort to set up. To start with, you might want to try having
> > both the JT and the NN on the same machine (the node designated as
> > master) and then try start-all.sh. You need to configure your
> > hadoop-site.xml correctly as well.
> >
> > Jothi
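For the single-master layout Jothi describes, a minimal conf/hadoop-site.xml would point both the filesystem and the job tracker at Fedora1. This is a sketch only; the ports below are conventional choices, not requirements, and the same file goes on every node:

  <!-- conf/hadoop-site.xml: Fedora1 runs both the NameNode and the
       JobTracker. Ports 9000/9001 are conventional, not required. -->
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://Fedora1:9000</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>Fedora1:9001</value>
    </property>
  </configuration>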
> > On 2/23/09 4:36 PM, "Jagadesh_Doddi" wrote:
> >
> >> Hi
> >>
> >> I have set up as per the documentation on the Hadoop site.
> >> On the namenode, I am running bin/start-dfs.sh, and on the job tracker,
> >> I am running bin/start-mapred.sh.
> >>
> >> Thanks and Regards
> >>
> >> Jagadesh Doddi
> >>
> >> -----Original Message-----
> >> From: Jothi Padmanabhan [mailto:jothipn@yahoo-inc.com]
> >> Sent: Monday, February 23, 2009 4:00 PM
> >> To: core-user@hadoop.apache.org
> >> Subject: Re: Reducer hangs at 16%
> >>
> >> Hi,
> >>
> >> This looks like a setup issue. See
> >> http://hadoop.apache.org/core/docs/current/cluster_setup.html#Configuration+Files
> >> on how to set this up correctly.
> >>
> >> As an aside, how are you bringing up the Hadoop daemons (JobTracker,
> >> Namenode, TT and Datanodes)? Are you bringing them up manually or are
> >> you using bin/start-all.sh?
> >>
> >> Jothi
> >>
> >> On 2/23/09 3:14 PM, "Jagadesh_Doddi" wrote:
> >>
> >>> I have set up a distributed environment on Fedora OS to run Hadoop.
> >>> System Fedora1 is the name node, Fedora2 is the job tracker, and
> >>> Fedora3 and Fedora4 are task trackers.
> >>> conf/masters contains the entries Fedora1 and Fedora2, and conf/slaves
> >>> contains the entries Fedora3 and Fedora4 (both files are sketched at
> >>> the end of this message).
> >>> When I run the sample wordcount example with a single task tracker
> >>> (either Fedora3 or Fedora4), it works fine and the job completes in a
> >>> few seconds.
> >>> However, when I add the other task tracker to conf/slaves, the reducer
> >>> stops at 16% and the job completes only after 13 minutes.
> >>> The same problem exists in versions 0.16.4, 0.17.2.1 and 0.18.3. The
> >>> output on the namenode console is shown below:
> >>>
> >>> [root@Fedora1 hadoop-0.17.2.1Cluster]# bin/hadoop jar samples/wordcount.jar org.myorg.WordCount input output
> >>> 09/02/19 17:43:18 INFO mapred.FileInputFormat: Total input paths to process : 1
> >>> 09/02/19 17:43:19 INFO mapred.JobClient: Running job: job_200902191741_0001
> >>> 09/02/19 17:43:20 INFO mapred.JobClient:  map 0% reduce 0%
> >>> 09/02/19 17:43:26 INFO mapred.JobClient:  map 50% reduce 0%
> >>> 09/02/19 17:43:27 INFO mapred.JobClient:  map 100% reduce 0%
> >>> 09/02/19 17:43:35 INFO mapred.JobClient:  map 100% reduce 16%
> >>> 09/02/19 17:56:15 INFO mapred.JobClient: Task Id : task_200902191741_0001_m_000001_0, Status : FAILED
> >>> Too many fetch-failures
> >>> 09/02/19 17:56:15 WARN mapred.JobClient: Error reading task output: No route to host
> >>> 09/02/19 17:56:18 WARN mapred.JobClient: Error reading task output: No route to host
> >>> 09/02/19 17:56:25 INFO mapred.JobClient:  map 100% reduce 81%
> >>> 09/02/19 17:56:26 INFO mapred.JobClient:  map 100% reduce 100%
> >>> 09/02/19 17:56:27 INFO mapred.JobClient: Job complete: job_200902191741_0001
> >>> 09/02/19 17:56:27 INFO mapred.JobClient: Counters: 16
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:   Job Counters
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Launched map tasks=3
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Launched reduce tasks=1
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Data-local map tasks=3
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:   Map-Reduce Framework
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Map input records=5
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Map output records=25
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Map input bytes=138
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Map output bytes=238
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Combine input records=25
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Combine output records=23
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Reduce input groups=23
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Reduce input records=23
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Reduce output records=23
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:   File Systems
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Local bytes read=522
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     Local bytes written=1177
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     HDFS bytes read=208
> >>> 09/02/19 17:56:27 INFO mapred.JobClient:     HDFS bytes written=175
> >>>
> >>> Appreciate any help on this.
> >>>
> >>> Thanks
> >>>
> >>> Jagadesh
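For completeness, the two files named in the report are plain hostname lists, one per line. Note that conf/masters does not name the JobTracker: the start scripts of this era read it only to decide where start-dfs.sh launches the secondary namenode, which is likely why the two entries there caused confusion:

  # conf/masters (read by start-dfs.sh to place the secondary namenode)
  Fedora1
  Fedora2

  # conf/slaves (datanodes and tasktrackers)
  Fedora3
  Fedora4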