hadoop-common-user mailing list archives

From David Howell <dehow...@gmail.com>
Subject Re: losing network interfaces during long running map-reduce jobs
Date Sat, 03 Apr 2010 23:03:00 GMT
> Could you clarify what you mean by "losing their networking"? Can you ping
> the node externally? If you access the node via the console (via ILOM, etc)
> and run tcpdump or tshark, can you see ethernet broadcast traffic at all? Do
> you see anything in dmesg on the machine in question?
> Thanks
> -Todd

My cluster is small, and the physical servers are managed by my company's
IT department... I just admin the Hadoop install and I don't have
access except through ssh. When one of my nodes goes unresponsive, it
doesn't respond to ping, ssh, or any traffic on any port. I've been
limited so far to trying to investigate logs after my sysadmin
restarts the networking interface.

But I haven't seen anything in the dmesg log. I'll have to try looking
at the tcpdump output on Monday, once I can get console access again.
My apologies that I'm so sketchy on details right now... so far, I
haven't been able to find any evidence of something going wrong
except for the hadoop log entries when the IOExceptions start.

