hadoop-hdfs-user mailing list archives

From Paul Rogers <paul.roge...@gmail.com>
Subject Re: Issue with map reduce on examples
Date Wed, 20 Nov 2013 15:10:12 GMT
UPDATE

I think I have some more info.  If I look at the running reduce task

    http://localhost:50030/taskdetails.jsp?tipid=task_201311201256_0001_r_000000

I see it is assigned to machine /default-rack/hit-nxdomain.opendns.com

If I then try to click on the "Last 4KB Task Logs" link it sends me to

    http://hit-nxdomain.opendns.com:50060/tasklog?attemptid=attempt_201311201256_0001_r_000000_0&start=-4097

Amending this URL to

    http://localhost:50060/tasklog?attemptid=attempt_201311201256_0001_r_000000_0&start=-4097

then shows the log, which contains many instances of the following:

    2013-11-20 14:59:54,726 INFO org.apache.hadoop.mapred.ReduceTask:
Penalized(slow) Hosts:
    2013-11-20 14:59:54,726 INFO org.apache.hadoop.mapred.ReduceTask:
hit-nxdomain.opendns.com Will be considered after: 814 seconds.
    2013-11-20 15:00:54,729 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201311201256_0001_r_000000_0 Need another 4 map output(s) where 0
is already in progress
    2013-11-20 15:00:54,729 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201311201256_0001_r_000000_0 Scheduled 0 outputs (1 slow hosts and0
dup hosts)
    2013-11-20 15:00:54,730 INFO org.apache.hadoop.mapred.ReduceTask:
Penalized(slow) Hosts:
    2013-11-20 15:00:54,730 INFO org.apache.hadoop.mapred.ReduceTask:
hit-nxdomain.opendns.com Will be considered after: 754 seconds.
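The amendment above is nothing more than a host swap on the URL; for anyone else hitting this, a small sketch of the rewrite (the helper name to_localhost is mine, not anything from Hadoop):

```python
from urllib.parse import urlsplit, urlunsplit

def to_localhost(url):
    """Rewrite the host part of a task-log URL so it can be opened
    locally even when the advertised hostname does not resolve."""
    parts = urlsplit(url)
    netloc = "localhost"
    if parts.port:
        netloc += f":{parts.port}"  # keep the TaskTracker port (50060)
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))

bad = ("http://hit-nxdomain.opendns.com:50060/tasklog"
       "?attemptid=attempt_201311201256_0001_r_000000_0&start=-4097")
print(to_localhost(bad))
```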

So it seems that Hadoop thinks the task is running on the
hit-nxdomain.opendns.com host.

The host (localhost) picks its DNS settings up via DHCP, with the router
set as the DNS server.  The router in turn uses opendns.com to resolve
external addresses.
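As I understand it, Hadoop derives this name at startup by asking the system resolver for the canonical, fully-qualified form of the local hostname (I believe via InetAddress/org.apache.hadoop.net.DNS, though I haven't checked the source).  A rough Python equivalent of that lookup (the function name is mine, and the result depends entirely on your resolver):

```python
import socket

def hadoop_style_hostname():
    # Roughly what Hadoop does at startup: take the short local
    # hostname and ask the resolver for its canonical FQDN.  A
    # resolver that redirects NXDOMAIN answers (as OpenDNS can)
    # may hand back a bogus name like hit-nxdomain.opendns.com
    # here instead of failing the lookup.
    short = socket.gethostname()
    return socket.getfqdn(short)

print(hadoop_style_hostname())
```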

Am I right in thinking this is therefore a DNS issue?

Any idea how Hadoop has ended up with this host name?

Any idea how to fix it?
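If it is a DNS issue, my guess is that pinning the machine's own name in /etc/hosts, so the lookup is answered locally and never reaches the router/OpenDNS, would be one fix.  Something like the following (untested; lt001 is my hostname, and the 127.0.1.1 line follows the Debian convention):

```
# /etc/hosts -- pin the local hostname so Hadoop's lookup stays local
127.0.0.1   localhost
127.0.1.1   lt001.localdomain lt001
```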

Many thanks


Paul


On 18 November 2013 12:53, Paul Rogers <paul.rogers6@gmail.com> wrote:

> Hi All
>
> I am having some problems with MapReduce running in pseudo-distributed
> mode.  I am running version 1.2.1 on Linux.  I have:
>
> 1. created $JAVA_HOME and $HADOOP_HOME and added the respective bin
>    directories to the path;
> 2. formatted the dfs;
> 3. executed start-dfs.sh and start-mapred.sh.
>
> Executing jps seems to show everything running that should be running (I
> think).
>
> [paul@lt001 bin]$ jps
> 8724 TaskTracker
> 8487 SecondaryNameNode
> 8841 Jps
> 8353 DataNode
> 7239 NameNode
> 8597 JobTracker
>
> I have then tried to run the wordcount and pi examples with similar
> results, eg:
>
> [paul@lt001 bin]$ hadoop jar hadoop/hadoop-examples-1.2.1.jar pi 4 1000
> Warning: $HADOOP_HOME is deprecated.
>
> Number of Maps  = 4
> Samples per Map = 1000
> Wrote input for Map #0
> Wrote input for Map #1
> Wrote input for Map #2
> Wrote input for Map #3
> Starting Job
> 13/11/18 10:31:38 INFO mapred.FileInputFormat: Total input paths to
> process : 4
> 13/11/18 10:31:39 INFO mapred.JobClient: Running job: job_201311181028_0001
> 13/11/18 10:31:40 INFO mapred.JobClient:  map 0% reduce 0%
> 13/11/18 10:31:47 INFO mapred.JobClient:  map 50% reduce 0%
> 13/11/18 10:31:52 INFO mapred.JobClient:  map 100% reduce 0%
>
> In each instance the output reaches the map 100% reduce 0% stage and
> then stalls.  No matter how long I wait, the job does not advance any
> further.  I have checked the logs, and the one I suspect indicates the
> problem is hadoop-paul-tasktracker-lt001.log, which has the following
> output:
>
> 2013-11-18 10:31:55,969 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201311181028_0001_r_000000_0 0.0% reduce > copy >
> 2013-11-18 10:34:59,148 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201311181028_0001_r_000000_0 0.0% reduce > copy >
> 2013-11-18 10:35:05,196 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201311181028_0001_r_000000_0 0.0% reduce > copy >
> 2013-11-18 10:35:11,253 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201311181028_0001_r_000000_0 0.0% reduce > copy >
>
> ..........
>
> 2013-11-18 11:10:03,259 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201311181028_0001_r_000000_0 0.0% reduce > copy >
> 2013-11-18 11:10:06,290 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201311181028_0001_r_000000_0 0.0% reduce > copy >
> 2013-11-18 11:10:12,320 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201311181028_0001_r_000000_0 0.0% reduce > copy >
> 2013-11-18 11:10:18,343 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201311181028_0001_r_000000_0 0.0% reduce > copy >
> 2013-11-18 11:10:21,369 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201311181028_0001_r_000000_0 0.0% reduce > copy >
> 2013-11-18 11:10:27,395 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201311181028_0001_r_000000_0 0.0% reduce > copy >
> 2013-11-18 11:10:33,426 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201311181028_0001_r_000000_0 0.0% reduce > copy >
> 2013-11-18 11:10:36,463 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201311181028_0001_r_000000_0 0.0% reduce > copy >
>
> It seems it is stuck on reduce > copy > but why?  Can anyone help with
> where to look next?
>
> Many thanks
>
>
> Paul
>
