hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: losing network interfaces during long running map-reduce jobs
Date Sat, 03 Apr 2010 20:22:27 GMT
Hi David,

On Fri, Apr 2, 2010 at 6:16 PM, David Howell <dehowell@gmail.com> wrote:

> I'm encountering a completely bizarre failure mode in my Hadoop
> cluster. A week ago, I switched from vanilla apache Hadoop 0.20.1 to
> CDH 2.
> Ever since then, my tasktracker/datanode machines have been regularly
> losing their networking during long (> 1 hour) jobs. Restarting the
> network interface brings them back online immediately.

Could you clarify what you mean by "losing their networking"? Can you ping
the node externally? If you access the node via the console (via ILOM, etc)
and run tcpdump or tshark, can you see ethernet broadcast traffic at all? Do
you see anything in dmesg on the machine in question?
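
For reference, the checks above could be collected into a small script so the
same output is gathered each time a node drops off. This is only a rough
sketch, not something from the original thread: the host name "node07" and
the interface name "eth0" are placeholders, the ping options assume a Linux
ping, and tcpdump normally needs root. The ping is meant to be run from
another machine; the tcpdump and dmesg checks are meant to be run on the
affected node from its console.

```python
import shutil
import subprocess


def external_checks(host):
    """Checks to run from another machine. `host` is a placeholder name."""
    return [("ping from outside", ["ping", "-c", "3", host])]


def console_checks(iface="eth0"):
    """Checks to run on the node's console. `iface` is an assumed name."""
    return [
        # Capture up to 20 ethernet broadcast frames on the interface.
        ("ethernet broadcast traffic",
         ["tcpdump", "-n", "-c", "20", "-i", iface, "ether", "broadcast"]),
        # Kernel ring buffer, where driver or link messages would show up.
        ("kernel log", ["dmesg"]),
    ]


def run_checks(commands, timeout=30):
    """Run each (label, argv) pair and return {label: captured output}."""
    results = {}
    for label, argv in commands:
        if shutil.which(argv[0]) is None:
            results[label] = "skipped: %s not installed" % argv[0]
            continue
        try:
            proc = subprocess.run(argv, capture_output=True, text=True,
                                  timeout=timeout)
            results[label] = proc.stdout + proc.stderr
        except subprocess.TimeoutExpired:
            # tcpdump waits indefinitely if no matching frames arrive.
            results[label] = "timed out after %ss" % timeout
        except OSError as exc:
            results[label] = "failed to start: %s" % exc
    return results


if __name__ == "__main__":
    for label, output in run_checks(console_checks("eth0")).items():
        print("== %s ==" % label)
        print(output)
```

A tcpdump check that times out with nothing captured would itself be useful
information, since it suggests no broadcast frames are reaching the interface.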


Todd Lipcon
Software Engineer, Cloudera
