hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: Error reading task output
Date Sat, 11 Jul 2009 00:42:55 GMT
Huh. If you look at the JobTracker or TaskTracker log files, do they start
getting any WARN or ERROR lines around the time jobs start to fail?
- Aaron

On Fri, Jul 10, 2009 at 5:19 AM, Ian jonhson <jonhson.ian@gmail.com> wrote:

> On Thu, Jul 9, 2009 at 5:41 AM, Aaron Kimball<aaron@cloudera.com> wrote:
> > Hmmm... By default, Linux resolves a hostname by first reading through
> > /etc/hosts and trying to match a name there. Only if that fails does it
> > contact an external DNS server for the lookup.
> >
> > The /etc/hosts file contains lines of the form:
> > IP_ADDR   NAME [NAME NAME NAME....]
> >
> > so you might see:
> > 127.0.0.1    wombat localhost localhost.localdomain
> >
> > ... assuming your computer's name is "wombat."
> >
> > When a server registers with the NameNode/JobTracker as ready for work,
> > it provides its DNS name to the NN/JT. It determines that name with a
> > "reverse lookup": it figures out its own IP address (which will likely
> > report 127.0.0.1) and then picks the first name on the matching
> > /etc/hosts line.
> >
> > So if you've got the line:
> >
> > 127.0.0.1 localhost wombat
> >
> > in your /etc/hosts file, change that around to:
> >
> > 127.0.0.1 wombat localhost
> >
> >
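The first-name rule Aaron describes can be sketched in a few lines of Java. This is only an illustration of how the first name on a hosts line wins, not Hadoop's actual lookup code, and the `primaryName` helper is hypothetical:

```java
// Illustrative sketch only (not Hadoop's actual code): a reverse lookup of
// an IP effectively picks the first name listed after the address on the
// matching /etc/hosts line.
public class HostsLine {
    // Hypothetical helper: return the name that "wins" for a hosts line.
    static String primaryName(String hostsLine) {
        String[] fields = hostsLine.trim().split("\\s+");
        return fields.length > 1 ? fields[1] : null;
    }

    public static void main(String[] args) {
        // With "localhost" first, the daemon registers as "localhost":
        System.out.println(primaryName("127.0.0.1 localhost wombat"));
        // With the names swapped, it registers under its real name:
        System.out.println(primaryName("127.0.0.1 wombat localhost"));
    }
}
```

Swapping the order of the names is exactly the fix suggested above: it changes which name the daemon reports to the NN/JT without touching the IP mapping.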
> > That having been said, if this problem only happens for you after jobs
> have
> > been running a while, it's likely that DNS isn't your issue. What exact
> > error messages are showing up in your log? What hadoop version are you
> > running?
> >
>
> I am not sure what the reason is, and I don't know how to solve the
> trouble. When I restart Hadoop, everything is OK and I can run a simple
> job such as wordcount. However, when I run multiple jobs, the
> later-submitted jobs throw the error, and from that point on every newly
> submitted job meets the same error.
>
> My /etc/hosts is as follows:
>
> ------------------------------  cat /etc/hosts
> -----------------------------
> $ cat /etc/hosts
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
> 127.0.0.1       localhost.localdomain   localhost
> ::1     localhost6.localdomain6 localhost6
>
>
> 10.61.0.143  hdt1.hyperdomain  hdt1.hypercloud.ict
> 10.61.0.7  hdt2.hyperdomain  hdt2.hypercloud.ict
> 10.61.0.5  hdt0.hyperdomain  hdt0.hypercloud.ict
>
> ----------------------------------------------------------------------------------
>
>
> And when the error occurs, the message printed by wordcount is:
>
> -------------------------  dump of screen -------------------------------
> $ ./bin/hadoop jar hadoop-0.19.3-64bit-examples.jar wordcount testfile
> output
> 09/07/10 20:03:33 INFO util.Shell: Shell [anqin2] commands :
> [Ljava.lang.String;@5492bbba
> 09/07/10 20:03:33 INFO util.Shell: Shell[anqin2] execString : [whoami]
> 09/07/10 20:03:33 INFO util.Shell: Shell [anqin2] commands :
> [Ljava.lang.String;@2b5ac3c9
> 09/07/10 20:03:33 INFO util.Shell: Shell[anqin2] execString : [bash, -c,
> groups]
> 09/07/10 20:03:33 INFO util.Shell: Shell [anqin2] commands :
> [Ljava.lang.String;@851052d
> 09/07/10 20:03:33 INFO util.Shell: Shell[anqin2] execString : [whoami]
> 09/07/10 20:03:33 INFO mapred.FileInputFormat: Total input paths to process
> : 1
> 09/07/10 20:03:33 INFO mapred.JobClient: Running job: job_200907051329_0016
> 09/07/10 20:03:34 INFO mapred.JobClient:  map 0% reduce 0%
> 09/07/10 20:03:40 INFO mapred.JobClient: Task Id :
> attempt_200907051329_0016_m_000004_0, Status : FAILED
> 09/07/10 20:03:40 WARN mapred.JobClient: Error reading task output
> http://hdt1.hyperdomain:50060/tasklog?plaintext=true&taskid=attempt_200907051329_0016_m_000004_0&filter=stdout
> 09/07/10 20:03:40 WARN mapred.JobClient: Error reading task output
> http://hdt1.hyperdomain:50060/tasklog?plaintext=true&taskid=attempt_200907051329_0016_m_000004_0&filter=stderr
> 09/07/10 20:03:43 INFO mapred.JobClient: Task Id :
> attempt_200907051329_0016_m_000004_1, Status : FAILED
> 09/07/10 20:03:43 WARN mapred.JobClient: Error reading task output
> http://hdt1.hyperdomain:50060/tasklog?plaintext=true&taskid=attempt_200907051329_0016_m_000004_1&filter=stdout
> 09/07/10 20:03:43 WARN mapred.JobClient: Error reading task output
> http://hdt1.hyperdomain:50060/tasklog?plaintext=true&taskid=attempt_200907051329_0016_m_000004_1&filter=stderr
> 09/07/10 20:03:46 INFO mapred.JobClient: Task Id :
> attempt_200907051329_0016_m_000004_2, Status : FAILED
> 09/07/10 20:03:46 WARN mapred.JobClient: Error reading task output
> http://hdt1.hyperdomain:50060/tasklog?plaintext=true&taskid=attempt_200907051329_0016_m_000004_2&filter=stdout
> 09/07/10 20:03:46 WARN mapred.JobClient: Error reading task output
> http://hdt1.hyperdomain:50060/tasklog?plaintext=true&taskid=attempt_200907051329_0016_m_000004_2&filter=stderr
> 09/07/10 20:03:52 INFO mapred.JobClient: Task Id :
> attempt_200907051329_0016_m_000003_0, Status : FAILED
> 09/07/10 20:03:53 WARN mapred.JobClient: Error reading task output
> http://hdt1.hyperdomain:50060/tasklog?plaintext=true&taskid=attempt_200907051329_0016_m_000003_0&filter=stdout
> 09/07/10 20:03:53 WARN mapred.JobClient: Error reading task output
> http://hdt1.hyperdomain:50060/tasklog?plaintext=true&taskid=attempt_200907051329_0016_m_000003_0&filter=stderr
> 09/07/10 20:03:56 INFO mapred.JobClient: Task Id :
> attempt_200907051329_0016_m_000003_1, Status : FAILED
> 09/07/10 20:03:56 WARN mapred.JobClient: Error reading task output
> http://hdt1.hyperdomain:50060/tasklog?plaintext=true&taskid=attempt_200907051329_0016_m_000003_1&filter=stdout
> 09/07/10 20:03:56 WARN mapred.JobClient: Error reading task output
> http://hdt1.hyperdomain:50060/tasklog?plaintext=true&taskid=attempt_200907051329_0016_m_000003_1&filter=stderr
> 09/07/10 20:03:59 INFO mapred.JobClient: Task Id :
> attempt_200907051329_0016_m_000003_2, Status : FAILED
> 09/07/10 20:03:59 WARN mapred.JobClient: Error reading task output
> http://hdt1.hyperdomain:50060/tasklog?plaintext=true&taskid=attempt_200907051329_0016_m_000003_2&filter=stdout
> 09/07/10 20:03:59 WARN mapred.JobClient: Error reading task output
> http://hdt1.hyperdomain:50060/tasklog?plaintext=true&taskid=attempt_200907051329_0016_m_000003_2&filter=stderr
> java.io.IOException: Job failed!
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>        at org.apache.hadoop.examples.WordCount.run(WordCount.java:163)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.hadoop.examples.WordCount.main(WordCount.java:169)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:141)
>        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:61)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
>        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
> -----------------------------------------------------------------------------------
>
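Incidentally, the tasklog URLs in that dump follow a fixed pattern (TaskTracker host, its HTTP port, the attempt id, and a filter of stdout or stderr), so a failed attempt's log can also be fetched directly with a browser or wget. A small sketch of how such a URL is assembled; the `tasklogUrl` helper is hypothetical, not a Hadoop API:

```java
// Hypothetical helper (not a Hadoop API): assemble the TaskTracker tasklog
// servlet URL of the kind seen in the JobClient warnings above.
public class TaskLogUrl {
    static String tasklogUrl(String host, int port, String attemptId,
                             String filter) {
        return "http://" + host + ":" + port + "/tasklog?plaintext=true"
                + "&taskid=" + attemptId + "&filter=" + filter;
    }

    public static void main(String[] args) {
        // Reconstructs one of the stderr URLs from the dump above:
        System.out.println(tasklogUrl("hdt1.hyperdomain", 50060,
                "attempt_200907051329_0016_m_000004_0", "stderr"));
    }
}
```

If fetching that URL from the client machine fails but works from the TaskTracker itself, that points back at name resolution or firewall problems rather than the task's own code.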
