hadoop-common-user mailing list archives

From Pedro Guedes <ps-gue...@criticalsoftware.com>
Subject Re: Hadoop WordCount hanging on reduce stage
Date Tue, 03 Apr 2007 12:31:29 GMT
Seems I was looking in the wrong log file :) ... I was looking at the
tasktracker log when I should have been looking underneath it!

It was a problem with HDFS breaking because the machines couldn't
find each other... they are configured with IPs in hadoop-site.xml, but
when the cluster is running they (somehow) try to resolve each other's
hostnames... Does anyone know why?

Fixed it by adding the nodes' hostnames to each other's /etc/hosts...
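
In case it helps anyone hitting the same thing, here is a rough sketch of a check that can be run on each node to see whether the other nodes' hostnames resolve. It is not part of Hadoop; the function name `unresolved_hosts` and the node names "master" and "slave1" are made up for illustration, so substitute the real hostnames of your cluster.

```python
import socket


def unresolved_hosts(hostnames, resolve=socket.gethostbyname):
    """Return a dict of {hostname: error message} for names that fail to resolve.

    `resolve` defaults to the system resolver (which consults /etc/hosts
    and DNS); it is a parameter only so the check can be tried offline.
    """
    failures = {}
    for name in hostnames:
        try:
            resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            failures[name] = str(exc)
    return failures


if __name__ == "__main__":
    # Hypothetical node names: replace with the hostnames of your own nodes.
    nodes = ["master", "slave1"]
    for name, err in unresolved_hosts(nodes).items():
        print(f"{name}: cannot resolve ({err})")
```

If a name is reported here, adding it to /etc/hosts on that machine (as I did) should make it disappear from the output.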

Pedro

Pedro Guedes wrote:
> Well, moving to 0.11.2 won't fix it... tried that!
>
> The first interesting thing in the log is:
> 2007-04-02 15:45:41,960 WARN org.apache.hadoop.mapred.TaskRunner:
> java.io.IOException: File
> /home/ciclope/hadoop-install/hadoop-data/mapred/local/task_0001_r_000001_0/map_10.out-0
> not created
>     at
> org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:282)
>     at
> org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:243)
>
> And the tasktracker of the slave node keeps repeating itself with:
>
> 2007-04-02 15:47:03,089 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000003_0 Need 12 map output(s)
> 2007-04-02 15:47:03,089 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000003_0 Got 12 known map output location(s); scheduling...
> 2007-04-02 15:47:03,089 INFO org.apache.hadoop.mapred.TaskRunner:
> task_0001_r_000003_0 Scheduled 0 of 12 known outputs (12 slow hosts and
> 0 dup hosts)
> 2007-04-02 15:47:03,273 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000001_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
> 2007-04-02 15:47:03,969 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000003_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
> 2007-04-02 15:47:04,277 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000001_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
> 2007-04-02 15:47:04,973 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000003_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
> 2007-04-02 15:47:05,281 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000001_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
> 2007-04-02 15:47:05,977 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000003_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
> 2007-04-02 15:47:06,285 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000001_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
> 2007-04-02 15:47:06,981 INFO org.apache.hadoop.mapred.TaskTracker:
> task_0001_r_000003_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
>
>
> Pedro Guedes wrote:
>   
>> Hi hadooping people...
>>
>> I'm having trouble running the wordcount example with hadoop... I ran it
>> OK with only one host, but when I add another machine to the cluster...
>> it falls apart! :(
>>
>> I read in the mailing-list archive about someone having a similar
>> problem, but the proposed solution was to downgrade to 0.11.2 (from
>> 0.12.0, I'm using 0.12.2)... is that right? A reference here:
>> http://www.mail-archive.com/hadoop-user@lucene.apache.org/msg00863.html
>>
>> The only difference in my case is that mine hangs around 60% of the
>> reduce phase... but the tasktracker for the slave node shows the same
>> 'IOException: file .....mapx_out not created' and that's the only error
>> I see...
>>
>> Any suggestions?
>>
>> Thanks in advance...
>>
>> Pedro
>>
>>   
>>     
>
>
>   

