hadoop-common-user mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: time outs when accessing port 50010
Date Mon, 14 Dec 2009 15:29:08 GMT

On Dec 14, 2009, at 5:21 AM, javateck javateck wrote:

> I have exactly the same issue here.
> 
> Sometimes I really feel helpless; maybe very few people use Hadoop as a FS at
> all. I think this is also why people stop using it: there are so many issues,
> and so few people who can help or have the experience.

<soapbox>
These are the joys of working on a young software project, right?  I would point out that
many folks answer many questions every day on the mailing lists.  If you want every question
solved every time, you have the option of buying (excellent) support.

As far as distributed file systems go, I've got a lot of experience running ones that have
more issues and are used by even fewer folks.  It's not pleasant.  If you just need a 30-40TB
filesystem (i.e., not a data processing system), I'd agree that you can probably find more
mature systems.  If you use HDFS as a file system only and don't see clear benefits over Lustre,
then perhaps you should be using Lustre in the first place.
</soapbox>

With regard to the error below, I'd guess that it is caused by a network partition - i.e.,
it appears that the client couldn't open a socket connection to 10.1.75.125 from 10.1.75.11.
I'd check for routing issues on both nodes.  Does the error happen intermittently between any
two nodes, or, if you look through past incidents, does it always involve the same node?
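
For a quick check, something like the little probe below is what I'd run by hand from
each node - just a sketch, not Hadoop code; the host, port, and 120-second connect
timeout only mirror the datanode log further down, so adjust them as needed.  A hung
route shows up as the same SocketTimeoutException:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    public static void main(String[] args) {
        // Defaults mirror the failing hop in the datanode log; override on the command line.
        String host = args.length > 0 ? args[0] : "10.1.75.104";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50010;
        Socket s = new Socket();
        try {
            // Same 120000 ms connect timeout the DataXceiver reports below.
            s.connect(new InetSocketAddress(host, port), 120000);
            System.out.println("connected to " + host + ":" + port);
        } catch (IOException e) {
            System.out.println("failed to reach " + host + ":" + port + ": " + e);
        } finally {
            try { s.close(); } catch (IOException ignored) { }
        }
    }
}

If that hangs or times out from one particular box, it's almost certainly routing or a
firewall rather than anything in HDFS itself.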

Brian

> 
> 
> On Wed, Nov 25, 2009 at 11:27 AM, David J. O'Dell <dodell@videoegg.com> wrote:
> 
>> I have 2 clusters:
>> 30 nodes running 0.18.3
>> and
>> 36 nodes running 0.20.1
>> 
>> I've intermittently seen the following errors on both of my clusters; it
>> happens when writing files.
>> I was hoping this would go away with the new version, but I see the same
>> behavior on both versions.
>> The namenode logs don't show any problems; it's always on the client and
>> datanodes.
>> 
>> Below is an example from this morning; unfortunately, I haven't found a bug
>> or config that specifically addresses this issue.
>> 
>> Any insight would be greatly appreciated.
>> 
>> Client log:
>> 09/11/25 10:54:15 INFO hdfs.DFSClient: Exception in createBlockOutputStream
>> java.net.SocketTimeoutException: 69000 millis timeout while waiting for
>> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
>> local=/10.1.75.11:37852 remote=/10.1.75.125:50010]
>> 09/11/25 10:54:15 INFO hdfs.DFSClient: Abandoning block
>> blk_-105422935413230449_22608
>> 09/11/25 10:54:15 INFO hdfs.DFSClient: Waiting to find target node:
>> 10.1.75.125:50010
>> 
>> Datanode log:
>> 2009-11-25 10:54:51,170 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>> 10.1.75.125:50010,
>> storageID=DS-1401408597-10.1.75.125-50010-1258737830230, infoPort=50075,
>> ipcPort=50020):DataXceiver
>> java.net.SocketTimeoutException: 120000 millis timeout while waiting for
>> channel to be ready for connect. ch :
>> java.nio.channels.SocketChannel[connection-pending remote=/
>> 10.1.75.104:50010]
>>      at
>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>>      at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>>      at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:282)
>>      at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>>      at java.lang.Thread.run(Thread.java:619)
>> 
>> 

