hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: Error reading task output
Date Wed, 08 Jul 2009 21:41:50 GMT
Hmmm... Linux resolves a hostname by first reading through /etc/hosts and
trying to match a name in there. If that fails, it contacts an external DNS
server for the lookup.

The /etc/hosts file contains lines of the form:
IP_ADDR   NAME [NAME NAME NAME....]

so you might see:
127.0.0.1    wombat localhost localhost.localdomain

... assuming your computer's name is "wombat."

When a server registers with the NameNode/JobTracker as ready for work, it
provides its DNS name to the NN/JT. It finds that name with a "reverse
lookup": it figures out its own IP address (which will likely be reported as
127.0.0.1) and then picks the first name on the matching /etc/hosts line.
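
To make the "first name on the line" behavior concrete, here is a minimal
Python sketch. It only imitates the lookup against /etc/hosts-style text; it
is not what Hadoop or the system resolver actually runs, and the function
name first_name_for is made up for this illustration.

```python
def first_name_for(hosts_text, ip_addr):
    """Return the first name listed for ip_addr in /etc/hosts-style text.

    Hypothetical helper for illustration only: the real lookup goes
    through the system resolver, not through this function.
    """
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        # A usable entry has an address followed by at least one name.
        if len(fields) >= 2 and fields[0] == ip_addr:
            return fields[1]
    return None


if __name__ == "__main__":
    # With "localhost" listed first, that is the name that gets picked.
    print(first_name_for("127.0.0.1 localhost wombat\n", "127.0.0.1"))
```

Under this simplification, whichever name comes first after the address is
the one a node would end up reporting.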

So if you've got the line:

127.0.0.1 localhost wombat

in your /etc/hosts file, change that around to:

127.0.0.1 wombat localhost
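
If you have many nodes to fix, the reordering above could be scripted. The
following is a rough Python sketch under a few assumptions: the helper name
promote_hostname is hypothetical, it works on one line at a time, and it
normalizes the whitespace between fields to single spaces. Try it on a copy
of the file before touching the real /etc/hosts.

```python
def promote_hostname(line, hostname):
    """Move hostname to the front of the name list on one /etc/hosts line.

    Hypothetical helper for illustration. Lines that are comments, blank,
    or do not mention hostname are returned unchanged.
    """
    body, sep, comment = line.partition("#")
    fields = body.split()
    if len(fields) < 2 or hostname not in fields[1:]:
        return line
    # Put the machine's own name first and keep the other names in order.
    names = [hostname] + [n for n in fields[1:] if n != hostname]
    result = " ".join([fields[0]] + names)
    # Re-attach a trailing comment if the line had one.
    if sep:
        result += " #" + comment
    return result


if __name__ == "__main__":
    print(promote_hostname("127.0.0.1 localhost wombat", "wombat"))
```

Here "wombat" stands in for whatever name the node should register under.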


That having been said, if this problem only happens for you after jobs have
been running a while, it's likely that DNS isn't your issue. What exact
error messages are showing up in your log? What hadoop version are you
running?

- Aaron



On Mon, Jul 6, 2009 at 2:06 AM, Ian jonhson <jonhson.ian@gmail.com> wrote:

> Hi Aaron,
>
> > This isn't Hadoop-specific, it's how Linux treats its network
> > configuration. If you look at /etc/host.conf, you'll probably see a line
> > that says "order hosts, bind" -- this is telling Linux's DNS resolution
> > library to first read your /etc/hosts file, then check an external DNS
> > server.
>
> >You could probably disable local hostfile checking, but that means that
> >every time a program on your system queries the authoritative hostname for
> >"localhost", it'll go out to the network. You'll probably see a big
> >performance hit. The better solution, I think, is to get your nodes'
> >/etc/hosts files squared away. You only need to do so once :)
>
>
> I don't understand what you mean by the better solution. Could you tell me
> how to get the /etc/hosts files "squared away"?
>
> I have the same problem with my Hadoop setup. Strangely, the problem does
> not occur right after restarting Hadoop, but it comes back after several
> long-running jobs.
>
> Thanks.
>
> Ian
>
>
> On Thu, Apr 16, 2009 at 11:31 AM, Cam Macdonell <cam@cs.ualberta.ca> wrote:
>
> > Cam Macdonell wrote:
> >
> >>
> >> Hi,
> >>
> >> I'm getting the following warning when running the simple wordcount and
> >> grep examples.
> >>
> >> 09/04/15 16:54:16 INFO mapred.JobClient: Task Id :
> >> attempt_200904151649_0001_m_000019_0, Status : FAILED
> >> Too many fetch-failures
> >> 09/04/15 16:54:16 WARN mapred.JobClient: Error reading task outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_000019_0&filter=stdout
> >>
> >> 09/04/15 16:54:16 WARN mapred.JobClient: Error reading task outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_000019_0&filter=stderr
> >>
> >>
> >> The only advice I could find from other posts with similar errors is to
> >> setup /etc/hosts with all slaves and the host IPs.  I did this, but I
> >> still get the warning above.  The output seems to come out alright
> >> however (I guess that's why it is a warning).
> >>
> >> I tried running a wget on the http:// address in the warning message
> >> and I get the following back
> >>
> >> 2009-04-15 16:53:46 ERROR 400: Argument taskid is required.
> >>
> >> So perhaps the wrong task ID is being passed to the http request.  Any
> >> ideas on what can get rid of these warnings?
> >>
> >> Thanks,
> >> Cam
> >>
> >
> > Well, for future googlers, I'll answer my own post.  Watch out for the
> > hostname at the end of "localhost" lines on slaves.  One of my slaves was
> > registering itself as "localhost.localdomain" with the jobtracker.
> >
> > Is there a way that Hadoop could be made to not be so dependent on
> > /etc/hosts, but on more dynamic hostname resolution?
> >
> > Cam
> >
>
