hadoop-common-user mailing list archives

From "Devaraj Das" <d...@yahoo-inc.com>
Subject RE: Reduce hangs
Date Sat, 19 Jan 2008 06:02:15 GMT
Hi Yunhong,
Judging by the output, the job did run to successful completion (albeit with
some failures)...
Devaraj 

> -----Original Message-----
> From: Yunhong Gu1 [mailto:ygu1@cs.uic.edu] 
> Sent: Saturday, January 19, 2008 8:56 AM
> To: hadoop-user@lucene.apache.org
> Subject: Re: Reduce hangs
> 
> 
> 
> Yes, it looks like HADOOP-1374
> 
> The program actually failed after a while:
> 
> 
> gu@ncdm-8:~/hadoop-0.15.2$ ./bin/hadoop jar hadoop-0.15.2-test.jar mrbench
> MRBenchmark.0.0.2
> 08/01/18 18:53:08 INFO mapred.MRBench: creating control file: 1 numLines, ASCENDING sortOrder
> 08/01/18 18:53:08 INFO mapred.MRBench: created control file: /benchmarks/MRBench/mr_input/input_-450753747.txt
> 08/01/18 18:53:09 INFO mapred.MRBench: Running job 0: input=/benchmarks/MRBench/mr_input output=/benchmarks/MRBench/mr_output/output_1843693325
> 08/01/18 18:53:09 INFO mapred.FileInputFormat: Total input paths to process : 1
> 08/01/18 18:53:09 INFO mapred.JobClient: Running job: job_200801181852_0001
> 08/01/18 18:53:10 INFO mapred.JobClient:  map 0% reduce 0%
> 08/01/18 18:53:17 INFO mapred.JobClient:  map 100% reduce 0%
> 08/01/18 18:53:25 INFO mapred.JobClient:  map 100% reduce 16%
> 08/01/18 19:08:27 INFO mapred.JobClient: Task Id : task_200801181852_0001_m_000001_0, Status : FAILED Too many fetch-failures
> 08/01/18 19:08:27 WARN mapred.JobClient: Error reading task outputncdm15
> 08/01/18 19:08:27 WARN mapred.JobClient: Error reading task outputncdm15
> 08/01/18 19:08:34 INFO mapred.JobClient:  map 100% reduce 100%
> 08/01/18 19:08:35 INFO mapred.JobClient: Job complete: job_200801181852_0001
> 08/01/18 19:08:35 INFO mapred.JobClient: Counters: 10
> 08/01/18 19:08:35 INFO mapred.JobClient:   Job Counters
> 08/01/18 19:08:35 INFO mapred.JobClient:     Launched map tasks=3
> 08/01/18 19:08:35 INFO mapred.JobClient:     Launched reduce tasks=1
> 08/01/18 19:08:35 INFO mapred.JobClient:     Data-local map tasks=2
> 08/01/18 19:08:35 INFO mapred.JobClient:   Map-Reduce Framework
> 08/01/18 19:08:35 INFO mapred.JobClient:     Map input records=1
> 08/01/18 19:08:35 INFO mapred.JobClient:     Map output records=1
> 08/01/18 19:08:35 INFO mapred.JobClient:     Map input bytes=2
> 08/01/18 19:08:35 INFO mapred.JobClient:     Map output bytes=5
> 08/01/18 19:08:35 INFO mapred.JobClient:     Reduce input groups=1
> 08/01/18 19:08:35 INFO mapred.JobClient:     Reduce input records=1
> 08/01/18 19:08:35 INFO mapred.JobClient:     Reduce output records=1
> DataLines       Maps    Reduces AvgTime (milliseconds)
> 1               2       1       926333
> 
> 
> 
> On Fri, 18 Jan 2008, Konstantin Shvachko wrote:
> 
> > Looks like we still have this unsolved mysterious problem:
> >
> > http://issues.apache.org/jira/browse/HADOOP-1374
> >
> > Could it be related to HADOOP-1246? Arun?
> >
> > Thanks,
> > --Konstantin
> >
> > Yunhong Gu1 wrote:
> >> 
> >> Hi,
> >> 
> >> If someone knows how to fix the problem described below, please help
> >> me out. Thanks!
> >> 
> >> I am testing Hadoop on a 2-node cluster and the "reduce" always hangs
> >> at some stage, even if I use different clusters. My OS is Debian
> >> Linux kernel 2.6 (AMD Opteron w/ 4GB Mem). Hadoop version is 0.15.2.
> >> Java version is 1.5.0_01-b08.
> >> 
> >> I simply tried "./bin/hadoop jar hadoop-0.15.2-test.jar mrbench" and
> >> when the map stage finishes, the reduce stage hangs somewhere in
> >> the middle, sometimes at 0%. I also tried every other MapReduce program
> >> I could find in the example jar package, but they all hang.
> >> 
> >> The log file simply prints
> >> 2008-01-18 15:15:50,831 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >
> >> 2008-01-18 15:15:56,841 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >
> >> 2008-01-18 15:16:02,850 INFO org.apache.hadoop.mapred.TaskTracker: task_200801181424_0004_r_000000_0 0.0% reduce > copy >
> >> 
> >> forever.
> >> 
> >> The program does work if I start Hadoop on only a single node.
> >> 
> >> Below is my hadoop-site.xml configuration:
> >> 
> >> <configuration>
> >> 
> >> <property>
> >>    <name>fs.default.name</name>
> >>    <value>10.0.0.1:60000</value>
> >> </property>
> >> 
> >> <property>
> >>    <name>mapred.job.tracker</name>
> >>    <value>10.0.0.1:60001</value>
> >> </property>
> >> 
> >> <property>
> >>    <name>dfs.data.dir</name>
> >>    <value>/raid/hadoop/data</value>
> >> </property>
> >> 
> >> <property>
> >>    <name>mapred.local.dir</name>
> >>    <value>/raid/hadoop/mapred</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>hadoop.tmp.dir</name>
> >>   <value>/raid/hadoop/tmp</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>mapred.child.java.opts</name>
> >>   <value>-Xmx1024m</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>mapred.tasktracker.tasks.maximum</name>
> >>   <value>4</value>
> >> </property>
> >> 
> >> <!--
> >> <property>
> >>   <name>mapred.map.tasks</name>
> >>   <value>7</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>mapred.reduce.tasks</name>
> >>   <value>3</value>
> >> </property>
> >> -->
> >> 
> >> <property>
> >>   <name>fs.inmemory.size.mb</name>
> >>   <value>200</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>dfs.block.size</name>
> >>   <value>134217728</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>io.sort.factor</name>
> >>   <value>100</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>io.sort.mb</name>
> >>   <value>200</value>
> >> </property>
> >> 
> >> <property>
> >>   <name>io.file.buffer.size</name>
> >>   <value>131072</value>
> >> </property>
> >> 
> >> </configuration>
> >> 
> >> 
> >
> 
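
The timestamps in the quoted JobClient output can be checked against the AvgTime figure that MRBench reported. What follows is a minimal Python sketch, not part of the original thread: the helper names (`parse_timestamp`, `gaps`) are made up for illustration, and the four log lines are copied from the output above with the mail-client line wraps joined. Under those assumptions it suggests that nearly all of the run was spent waiting between "reduce 16%" and the fetch-failure report, and that the total agrees with the reported 926333 ms to within a second.

```python
from datetime import datetime

# Four lines copied from the quoted JobClient output (line wraps joined).
LOG_LINES = [
    "08/01/18 18:53:09 INFO mapred.JobClient: Running job: job_200801181852_0001",
    "08/01/18 18:53:25 INFO mapred.JobClient:  map 100% reduce 16%",
    "08/01/18 19:08:27 INFO mapred.JobClient: Task Id : "
    "task_200801181852_0001_m_000001_0, Status : FAILED Too many fetch-failures",
    "08/01/18 19:08:35 INFO mapred.JobClient: Job complete: job_200801181852_0001",
]


def parse_timestamp(line):
    """Parse the leading 'yy/mm/dd HH:MM:SS' stamp (first 17 characters)."""
    return datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")


def gaps(lines):
    """Return the number of seconds between each pair of consecutive lines."""
    stamps = [parse_timestamp(line) for line in lines]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    per_step = gaps(LOG_LINES)
    for seconds, line in zip(per_step, LOG_LINES[1:]):
        # Print the wait before each line, followed by the message itself.
        print(f"{seconds:7.0f} s before: {line[18:]}")
    print(f"total: {sum(per_step):.0f} s")
```

The 902-second gap before the "Too many fetch-failures" line is the stall the original poster saw as a hang. The reduce only finished after the map task was reported as failed.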

