hadoop-common-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Failed reduce job in some node
Date Sun, 12 Aug 2012 13:27:15 GMT
Hi Owen,

That your reducer seems to make some maps re-run during its shuffle
(copy) phase is suggestive of DNS issues. Can you ensure that every
node in your cluster can fully resolve the others' hostnames to the
right IPs, and that all nodes have identical /etc/hosts files (if
you're using file-based lookups)?
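
For a quick sanity check, each node could run something along these
lines and the results compared; a minimal sketch, using the host/IP
pairs from the /etc/hosts quoted below:

#!/usr/bin/env python
# Resolve each cluster hostname and compare it to the expected address.
import socket

expected = {
    "master":      "172.29.142.240",
    "slaveorange": "172.29.142.213",
    "slavecc":     "172.29.142.222",
}

for host, ip in sorted(expected.items()):
    try:
        resolved = socket.gethostbyname(host)
    except socket.error, e:
        print "%s: lookup failed (%s)" % (host, e)
        continue
    if resolved == ip:
        print "%s -> %s: OK" % (host, resolved)
    else:
        print "%s -> %s: MISMATCH (expected %s)" % (host, resolved, ip)

A failed lookup or mismatch on any node would explain reducers being
unable to fetch map output from it.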

On Thu, Aug 9, 2012 at 6:22 PM, Owen Duan <sudyduan@gmail.com> wrote:
> Summary: Failed reduce job
> Hadoop Version: 1.0.3
> Environment: hadoop-1.0.3, JDK 1.6.0_24, Ubuntu 11.04
> Description: When I try to run a PageRank algorithm in Python on a three-node
> cluster using Hadoop Streaming, the reduce job hangs at 16% and only finishes
> after a long time. When I check the JobTracker, I find that all the reduce
> tasks ran on a single node. Here are the map and reduce files.
>
> hduser@ubuntu:hadoop jar hadoop-streaming-1.0.3.jar -mapper
> ~/mapreduce/PageMap.py -reducer ~/mapreduce/PageReduce.py -input
> /pageValue10 -output /pageOut
> Warning: $HADOOP_HOME is deprecated.
>
> packageJobJar: [/app/hadoop/tmp/hadoop-unjar595645452276364982/] []
> /tmp/streamjob2716673000685326551.jar tmpDir=null
> 12/08/09 20:25:07 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 12/08/09 20:25:07 WARN snappy.LoadSnappy: Snappy native library not loaded
> 12/08/09 20:25:07 INFO mapred.FileInputFormat: Total input paths to process
> : 1
> 12/08/09 20:25:07 INFO streaming.StreamJob: getLocalDirs():
> [/app/hadoop/tmp/mapred/local]
> 12/08/09 20:25:07 INFO streaming.StreamJob: Running job:
> job_201208092013_0002
> 12/08/09 20:25:07 INFO streaming.StreamJob: To kill this job, run:
> 12/08/09 20:25:07 INFO streaming.StreamJob:
> /home/hduser/hadoop/libexec/../bin/hadoop job
> -Dmapred.job.tracker=master:54311 -kill job_201208092013_0002
> 12/08/09 20:25:07 INFO streaming.StreamJob: Tracking URL:
> http://master:50030/jobdetails.jsp?jobid=job_201208092013_0002
> 12/08/09 20:25:08 INFO streaming.StreamJob:  map 0%  reduce 0%
> 12/08/09 20:25:24 INFO streaming.StreamJob:  map 18%  reduce 0%
> 12/08/09 20:25:27 INFO streaming.StreamJob:  map 47%  reduce 0%
> 12/08/09 20:25:30 INFO streaming.StreamJob:  map 62%  reduce 0%
> 12/08/09 20:25:33 INFO streaming.StreamJob:  map 67%  reduce 0%
> 12/08/09 20:25:42 INFO streaming.StreamJob:  map 67%  reduce 2%
> 12/08/09 20:25:45 INFO streaming.StreamJob:  map 78%  reduce 6%
> 12/08/09 20:25:48 INFO streaming.StreamJob:  map 85%  reduce 8%
> 12/08/09 20:25:51 INFO streaming.StreamJob:  map 90%  reduce 9%
> 12/08/09 20:25:54 INFO streaming.StreamJob:  map 95%  reduce 9%
> 12/08/09 20:25:57 INFO streaming.StreamJob:  map 99%  reduce 9%
> 12/08/09 20:26:00 INFO streaming.StreamJob:  map 100%  reduce 9%
> 12/08/09 20:26:12 INFO streaming.StreamJob:  map 100%  reduce 13%
> 12/08/09 20:32:06 INFO streaming.StreamJob:  map 83%  reduce 13%
> 12/08/09 20:32:12 INFO streaming.StreamJob:  map 98%  reduce 13%
> 12/08/09 20:32:15 INFO streaming.StreamJob:  map 100%  reduce 13%
> 12/08/09 20:32:27 INFO streaming.StreamJob:  map 100%  reduce 16%
> 12/08/09 20:38:42 INFO streaming.StreamJob:  map 83%  reduce 16%
> 12/08/09 20:38:48 INFO streaming.StreamJob:  map 98%  reduce 16%
> 12/08/09 20:38:51 INFO streaming.StreamJob:  map 100%  reduce 16%
> 12/08/09 20:39:00 INFO streaming.StreamJob:  map 100%  reduce 17%
> 12/08/09 20:39:03 INFO streaming.StreamJob:  map 100%  reduce 25%
> 12/08/09 20:39:06 INFO streaming.StreamJob:  map 100%  reduce 36%
> 12/08/09 20:39:09 INFO streaming.StreamJob:  map 100%  reduce 40%
> 12/08/09 20:39:13 INFO streaming.StreamJob:  map 100%  reduce 42%
> 12/08/09 20:39:16 INFO streaming.StreamJob:  map 100%  reduce 44%
> 12/08/09 20:39:19 INFO streaming.StreamJob:  map 100%  reduce 51%
> 12/08/09 20:39:22 INFO streaming.StreamJob:  map 100%  reduce 61%
> 12/08/09 20:39:25 INFO streaming.StreamJob:  map 100%  reduce 66%
> 12/08/09 20:39:28 INFO streaming.StreamJob:  map 100%  reduce 77%
> 12/08/09 20:39:31 INFO streaming.StreamJob:  map 100%  reduce 82%
> 12/08/09 20:39:43 INFO streaming.StreamJob:  map 100%  reduce 86%
> 12/08/09 20:39:46 INFO streaming.StreamJob:  map 100%  reduce 93%
> 12/08/09 20:39:49 INFO streaming.StreamJob:  map 100%  reduce 98%
> 12/08/09 20:39:55 INFO streaming.StreamJob:  map 100%  reduce 100%
> 12/08/09 20:40:01 INFO streaming.StreamJob: Job complete:
> job_201208092013_0002
> 12/08/09 20:40:01 INFO streaming.StreamJob: Output: /pageOut
>
>
> The map file:
>
> #!/usr/bin/env python
> #encoding=utf-8
>
> import sys
>
>
> if __name__ == "__main__":
>
>     for line in sys.stdin:
>         line = line.rstrip()
>         # input format: <site> <pagerank> <outlink1> <outlink2> ...
>         data = line.split()
>
>         # current pagerank value of this site
>         pr = float(data[1])
>
>         # number of sites this page links to
>         count = len(data) - 2
>
>         # rank contribution passed to each linked site
>         avgpr = pr / count
>
>         for term in data[2:]:
>             # "@" records carry rank contributions, "&" records carry links
>             print term + "  @" + str(avgpr)
>             print data[0] + "  &" + term
>
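> To illustrate the record formats: for a single input line such as
> "A 1.0 B C" (a made-up example: site A has rank 1.0 and links to B
> and C), the mapper emits one rank-contribution record and one link
> record per outlink:
>
>     B  @0.5
>     A  &B
>     C  @0.5
>     A  &C
>
> The "@" records are summed per site by the reducer; the "&" records
> let it rebuild each site's link list for the next iteration.
>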
>
> The reduce file:
>
> #!/usr/bin/env python
> #encoding=utf-8
>
> import sys
>
>
> if __name__ == "__main__":
>
>
>     #store the links
>     linkDict = {}
>
>     #value from other website
>     valueDict = {}
>
>     for line in sys.stdin:
>         line = line.rstrip()
>         term = line.split()
>
>
>         if term[1][0] == '@':
>             # "@" record: a rank contribution sent to this site
>             if term[0] not in valueDict:
>                 valueDict[term[0]] = list()
>             valueDict[term[0]].append(float(term[1][1:]))
>
>         elif term[1][0] == '&':
>             # "&" record: an outgoing link belonging to this site
>             if term[0] not in linkDict:
>                 linkDict[term[0]] = list()
>             linkDict[term[0]].append(term[1][1:])
>
>     for key in valueDict.keys():
>
>         # damped PageRank update: 0.15 + 0.85 * (sum of contributions)
>         totalpr = 0.0
>         for item in valueDict[key]:
>             totalpr += item
>         totalpr = 0.85 * totalpr + 0.15
>
>         strTmp = key + " " + str(totalpr)
>
>         # re-attach the outgoing links; sites with no link records are skipped
>         try:
>             for k in linkDict[key]:
>                 strTmp += " " + str(k)
>             print strTmp
>         except KeyError:
>             pass
>
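> Since streaming just pipes lines through these scripts, they can be
> debugged locally without Hadoop. A minimal sketch (the sample input
> is made up, and it assumes both scripts are executable in the current
> directory):
>
> #!/usr/bin/env python
> # Simulate "cat input | mapper | sort | reducer" for a streaming job.
> import subprocess
>
> sample = "A 1.0 B C\nB 1.0 A\nC 1.0 A\n"
>
> mapper = subprocess.Popen(["./PageMap.py"], stdin=subprocess.PIPE,
>                           stdout=subprocess.PIPE)
> mapped, _ = mapper.communicate(sample)
>
> # Hadoop sorts map output by key before handing it to the reducer.
> shuffled = "\n".join(sorted(mapped.splitlines())) + "\n"
>
> reducer = subprocess.Popen(["./PageReduce.py"], stdin=subprocess.PIPE,
>                            stdout=subprocess.PIPE)
> reduced, _ = reducer.communicate(shuffled)
> print reduced
>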
> I get this whenever I submit a simple MapReduce job using streaming; even the
> word-count example fails to reduce on some nodes. The /etc/hosts file and the
> configuration from conf/core-site.xml, mapred-site.xml, and hdfs-site.xml on
> the master and slave nodes are attached to this e-mail.
>
> /etc/hosts
>
> 172.29.142.240  master
> 172.29.142.213  slaveorange
> 172.29.142.222  slavecc
>
> Any help would be appreciated!



-- 
Harsh J
