hadoop-common-user mailing list archives

From stephen mulcahy <stephen.mulc...@deri.org>
Subject Re: Network problems Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel
Date Fri, 09 Apr 2010 15:18:02 GMT
Allen Wittenauer wrote:
> On Apr 8, 2010, at 9:37 AM, stephen mulcahy wrote:
>> When I run this on the Debian 2.6.32 kernel - over the course of the
>> run, 1 or 2 datanodes of the cluster enter a state whereby they are no
>> longer responsive to network traffic.
> 
> How much free memory do you have?

Lots, a few GB

> 
> How many tasks per node do you have?

I left this at the default.

> 
> What are the service times, etc, on your IO system?  

Can you clarify this query?

> 
>> Has anyone run into similar problems with their environments? I noticed
>> that when the nodes become unresponsive, it often happens when the
>> TeraSort is at
> 
> I've always seen Linux nodes go unresponsive when they get memory starved
> to the point that the OOM can't function because it can't allocate enough
> mem.

Sure, but I can log in to the unresponsive nodes via the console - it's 
just the network that has become unresponsive. To be clear here, I don't 
suspect Hadoop is the root cause of the problem - I suspect either a 
kernel bug or some other operating system level bug. I was wondering if 
others had run into similar problems.

I was also wondering in general what kernel versions and distros people 
are using, especially for larger production clusters.

Thanks,

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com
