hadoop-common-user mailing list archives

From stephen mulcahy <stephen.mulc...@deri.org>
Subject Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel
Date Fri, 09 Apr 2010 15:18:02 GMT
Allen Wittenauer wrote:
> On Apr 8, 2010, at 9:37 AM, stephen mulcahy wrote:
>> When I run this on the Debian 2.6.32 kernel - over the course of the run, 1 or 2
>> datanodes of the cluster enter a state whereby they are no longer responsive to network traffic.
> How much free memory do you have?

Lots, a few GB

> How many tasks per node do you have?

I left this at the default.

> What are the service times, etc, on your IO system?  

Can you clarify this query?

>> Has anyone run into similar problems with their environments? I noticed that
>> when the nodes become unresponsive, it often happens when the TeraSort is at
> I've always seen Linux nodes go unresponsive when they get memory starved to the point
> that the OOM can't function because it can't allocate enough mem.

Sure, but I can log in to the unresponsive nodes via the console - it's 
just the network that has become unresponsive. To be clear here, I don't 
suspect Hadoop is the root cause of the problem - I suspect either a 
kernel bug or some other operating system level bug. I was wondering if 
others had run into similar problems.

I was also wondering in general what kernel versions and distros people 
are using, especially for larger production clusters.



Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com
