hadoop-common-user mailing list archives

From Patrick Angeles <patr...@cloudera.com>
Subject Re: losing network interfaces during long running map-reduce jobs
Date Sat, 03 Apr 2010 03:56:27 GMT
Hi David,

Strange indeed. I assume nothing in your configs changed. Anything funny in
the logs? You should also rule out the switch itself as being faulty.

It's possible that CDH2 has a patch that's not in 0.20.1 that's causing this
problem, but we haven't heard of this exact problem from any of our other
customers / users.

- P

On Fri, Apr 2, 2010 at 6:16 PM, David Howell <dehowell@gmail.com> wrote:

> I'm encountering a completely bizarre failure mode in my Hadoop
> cluster. A week ago, I switched from vanilla apache Hadoop 0.20.1 to
> CDH 2.
>
> Ever since then, my tasktracker/datanode machines have been regularly
> losing their networking during long (> 1 hour) jobs. Restarting the
> network interface brings them back online immediately.
>
> I'm mystified as to how this can be happening: anyone care to venture
> a hypothesis? I'm running on CentOS 5.2.
>
> Cheers,
> David
>
