From: Harsh J
Date: Thu, 20 Mar 2014 14:16:48 +0530
Subject: Re: The reduce copier failed
To: Mahmood Naderan
Reply-To: user@hadoop.apache.org

At the end it says clearly that the job has failed.

On Thu, Mar 20, 2014 at 12:49 PM, Mahmood Naderan wrote:
> After multiple messages, it says that the job has been completed. I really
> wonder if the job has been truly completed or failed.
>
> 14/03/20 03:49:04 INFO mapred.JobClient: map 50% reduce 0%
> 14/03/20 03:49:20 INFO mapred.JobClient: Job complete: job_201403191916_0001
> 14/03/20 03:49:20 INFO mapred.JobClient: Counters: 20
> 14/03/20 03:49:20 INFO mapred.JobClient: Job Counters
> 14/03/20 03:49:20 INFO mapred.JobClient: Launched reduce tasks=4
> 14/03/20 03:49:20 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=121826447
> 14/03/20 03:49:20 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
> 14/03/20 03:49:20 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
> 14/03/20 03:49:20 INFO mapred.JobClient: Launched map tasks=357
> 14/03/20 03:49:20 INFO mapred.JobClient: Data-local map tasks=357
> 14/03/20 03:49:20 INFO mapred.JobClient: Failed reduce tasks=1
> 14/03/20 03:49:20 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=27097157
> 14/03/20 03:49:20 INFO mapred.JobClient: FileSystemCounters
> 14/03/20 03:49:20 INFO mapred.JobClient: HDFS_BYTES_READ=23648804348
> 14/03/20 03:49:20 INFO mapred.JobClient: FILE_BYTES_WRITTEN=4320784806
> 14/03/20 03:49:20 INFO mapred.JobClient: File Input Format Counters
> 14/03/20 03:49:20 INFO mapred.JobClient: Bytes Read=23648753804
> 14/03/20 03:49:20 INFO mapred.JobClient: Map-Reduce Framework
> 14/03/20 03:49:20 INFO mapred.JobClient: Map output materialized bytes=4300573634
> 14/03/20 03:49:20 INFO mapred.JobClient: Combine output records=0
> 14/03/20 03:49:20 INFO mapred.JobClient: Map input records=7131117
> 14/03/20 03:49:20 INFO mapred.JobClient: Spilled Records=903190
> 14/03/20 03:49:20 INFO mapred.JobClient: Map output bytes=4296978520
> 14/03/20 03:49:20 INFO mapred.JobClient: Total committed heap usage (bytes)=62965284864
> 14/03/20 03:49:20 INFO mapred.JobClient: Combine input records=0
> 14/03/20 03:49:20 INFO mapred.JobClient: Map output records=903190
> 14/03/20 03:49:20 INFO mapred.JobClient: SPLIT_RAW_BYTES=45981
> Exception in thread "main" java.lang.IllegalStateException: Job failed!
>     at org.apache.mahout.text.wikipedia.WikipediaDatasetCreatorDriver.runJob(WikipediaDatasetCreatorDriver.java:187)
>     at org.apache.mahout.text.wikipedia.WikipediaDatasetCreatorDriver.main(WikipediaDatasetCreatorDriver.java:115)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>
> Regards,
> Mahmood
>
> On Thursday, March 20, 2014 3:41 AM, Harsh J wrote:
> While it does mean a retry, if the job eventually fails (after finite
> retries all fail as well), then you have a problem to investigate. If
> the job eventually succeeded, then this may have been a transient
> issue. Worth investigating either way.
>
> On Thu, Mar 20, 2014 at 12:57 AM, Mahmood Naderan wrote:
>> Hi
>> In the middle of a map-reduce job I get
>>
>> map 20% reduce 6%
>> ...
>> The reduce copier failed
>> ....
>> map 20% reduce 0%
>> map 20% reduce 1%
>> map 20% reduce 2%
>> map 20% reduce 3%
>>
>> Does that imply a *retry* process? Or I have to be worried about that
>> message?
>>
>> Regards,
>> Mahmood
>
> --
> Harsh J
>

--
Harsh J
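
As the output quoted in this thread shows, a "Job complete" line can still be followed by a "Job failed!" exception, so the final lines and the "Failed reduce tasks" counter are what indicate the outcome. A minimal sketch of checking captured client output for this, assuming the output has been saved as text; the function name `summarize_job_output` is hypothetical and not part of Hadoop or Mahout, and the sample lines are copied from the log above:

```python
import re

# Matches counter lines such as
# "... INFO mapred.JobClient: Failed reduce tasks=1".
COUNTER_RE = re.compile(
    r"mapred\.JobClient:\s+(?P<name>[^=]+?)=(?P<value>\d+)\s*$"
)


def summarize_job_output(text):
    """Return (failed, counters) for captured JobClient console output.

    failed is True when a "Job failed!" line is present;
    counters maps counter names to integer values.
    """
    failed = False
    counters = {}
    for raw in text.splitlines():
        # Drop mail quoting ("> ") so quoted logs can be pasted in directly.
        line = raw.lstrip("> ").rstrip()
        if "Job failed!" in line:
            failed = True
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return failed, counters


sample = """\
14/03/20 03:49:20 INFO mapred.JobClient: Job complete: job_201403191916_0001
14/03/20 03:49:20 INFO mapred.JobClient: Launched reduce tasks=4
14/03/20 03:49:20 INFO mapred.JobClient: Failed reduce tasks=1
Exception in thread "main" java.lang.IllegalStateException: Job failed!
"""

failed, counters = summarize_job_output(sample)
print("failed:", failed)
print("failed reduce tasks:", counters.get("Failed reduce tasks", 0))
```

This only inspects text that the client already printed; it does not query the JobTracker, and the task logs of the failed reduce attempt are still the place to look for the actual cause.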