hadoop-hdfs-user mailing list archives

From Ferdy Galema <ferdy.gal...@kalooga.com>
Subject Re: socket timeouts, dropped packages
Date Wed, 13 Apr 2011 20:06:51 GMT

Thanks for replying. By now we are also pretty sure it's an issue in the
hardware layer. We have updated the system (kernel and NIC drivers) to
rule out any bugs there, but we are still seeing timeouts and dropped
packets.
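One way to check whether the drop counters are still climbing after the driver update is to snapshot them before and after a heavy HDFS write and compare. This is a minimal sketch, assuming a Linux host where /proc/net/dev lists the receive drop count in the 4th numeric column and the transmit drop count in the 12th. The helpers parse_net_dev and drop_delta are hypothetical names, not part of Hadoop or any kernel tool.

```python
import os


def parse_net_dev(text):
    """Return {interface: (rx_drop, tx_drop)} from /proc/net/dev-formatted text."""
    drops = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Receive: bytes packets errs drop fifo frame compressed multicast (0-7)
        # Transmit: bytes packets errs drop fifo colls carrier compressed (8-15)
        drops[name.strip()] = (int(fields[3]), int(fields[11]))
    return drops


def drop_delta(before, after):
    """Per-interface increase in (rx_drop, tx_drop) between two snapshots."""
    return {
        iface: (after[iface][0] - before[iface][0], after[iface][1] - before[iface][1])
        for iface in after
        if iface in before
    }


if __name__ == "__main__":
    # Print the current counters once; only meaningful on Linux.
    if os.path.exists("/proc/net/dev"):
        with open("/proc/net/dev") as f:
            for iface, (rx, tx) in sorted(parse_net_dev(f.read()).items()):
                print(f"{iface}: rx_drop={rx} tx_drop={tx}")
```

Taking one snapshot before and one after a large write, then passing both to drop_delta, shows how many packets each interface dropped during that window. That makes it easier to tie the drops to the HDFS traffic rather than to background load.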

My bad, I was not aware that Cloudera releases could not be discussed
here at all. I thought that even though Cloudera releases are somewhat
different, issues that are probably generic could still be discussed
here. (Of course I would use the Cloudera lists if I were fairly sure
the problem was specific to Cloudera.)

Anyway, I will post an update when we have figured the problem out, on
the right list this time: cdh-user ;)


On 04/13/2011 09:22 PM, Eli Collins wrote:
> Hey Ferdy,
> If you're seeing this after bumping dfs.datanode.max.xcievers and the
> nofile ulimit, and you're also seeing dropped packets, it sounds like
> you're having networking issues.
> See the following as well:
> https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/1d3a377bd605e1bd/d3d8ec0d14c065bb?#d3d8ec0d14c065bb
> Thanks,
> Eli
> On Tue, Apr 12, 2011 at 10:37 AM, Ferdy Galema<ferdy.galema@kalooga.com>  wrote:
>> Hi,
>> We're running into issues where we are seeing timeouts when writing/reading a
>> lot of HDFS data. (Hadoop version is CDH4B3 and HDFS append is enabled.)
>> The types of exceptions vary a lot, but most of the time they occur when a
>> DFSClient writes data into the datanode pipeline.
>> For example, one datanode logs "Exception in receiveBlock for block
>> blk_5476601577216704980_62953994 java.io.EOFException: while trying to read
>> 65557 bytes" and the other side logs "writeBlock
>> blk_5476601577216704980_62953994 received exception
>> java.net.SocketTimeoutException: Read timed out". That's it.
>> We cannot seem to determine the exact problem. The read timeout is the
>> default (60 sec). The open files limit and the number of xceivers have both
>> been raised a lot. A full GC never takes longer than a second.
>> However, we are seeing a lot of dropped packets on the network
>> interface. Could these problems be related?
>> Any advice will be helpful.
>> Ferdy.
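Each pipeline failure above shows up on two datanodes with different exceptions: an EOFException on the receiving side and a SocketTimeoutException on the writing side. Lining the logs up by block can therefore help when matching failures across hosts. This is a rough sketch, assuming the datanode log lines have already been collected per host. The host names dn1/dn2 and the helpers group_by_block and cross_host_blocks are hypothetical. The sketch keys on the block ID alone and ignores the trailing generation stamp.

```python
import re
from collections import defaultdict

# Captures the numeric ID in a block name such as blk_5476601577216704980_62953994.
# IDs may be negative (blk_-123...); the _<generation stamp> suffix is not captured.
BLOCK_RE = re.compile(r"blk_(-?\d+)")


def group_by_block(lines_by_host):
    """Map 'blk_<id>' to the (host, line) pairs that mention that block."""
    grouped = defaultdict(list)
    for host, lines in lines_by_host.items():
        for line in lines:
            for block_id in set(BLOCK_RE.findall(line)):
                grouped["blk_" + block_id].append((host, line))
    return dict(grouped)


def cross_host_blocks(grouped):
    """Blocks mentioned by at least two hosts, e.g. both ends of a failed pipeline."""
    return sorted(
        block
        for block, entries in grouped.items()
        if len({host for host, _ in entries}) >= 2
    )


if __name__ == "__main__":
    logs = {
        "dn1": ["Exception in receiveBlock for block blk_5476601577216704980_62953994 "
                "java.io.EOFException: while trying to read 65557 bytes"],
        "dn2": ["writeBlock blk_5476601577216704980_62953994 received exception "
                "java.net.SocketTimeoutException: Read timed out"],
    }
    for block in cross_host_blocks(group_by_block(logs)):
        print(block)
```

Keying on the ID alone means lines that refer to the same block under different generation stamps, as can happen with append enabled, land in the same group.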
