hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Recurring Hadoop DataNode ERROR in logs
Date Wed, 23 Dec 2009 19:49:24 GMT
Yeah, see HADOOP-3831.  It looks like the datanode is timing out unused
connections.  As I understand it, when the dfsclient later wants to use this
block, it just sets up the socket again -- silently and transparently, below
the level at which the application can see.  Do I have it right?  Is hbase
itself complaining?
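
Here is a toy sketch of what I mean, in Python rather than Java just to keep
it short.  The class names are made up for illustration and are not Hadoop's
DFSClient or DataNode code; the 480000 ms figure is the one from the log
message quoted below.  It only models the idea as I understand it: the server
side gives up on a socket left idle too long, and the client re-opens it on
the next read so the caller sees nothing.

```python
# Toy model only: these classes are hypothetical stand-ins, not Hadoop code.


class IdleServer:
    """Pretends to be a datanode that drops connections idle too long."""

    def __init__(self, idle_timeout_ms):
        self.idle_timeout_ms = idle_timeout_ms
        self.connections_opened = 0

    def connect(self, now_ms):
        self.connections_opened += 1
        return Connection(self, now_ms)


class Connection:
    """One socket; remembers when it was last used."""

    def __init__(self, server, now_ms):
        self.server = server
        self.last_used_ms = now_ms

    def read(self, now_ms):
        # The server has given up if the socket sat unused past the timeout.
        if now_ms - self.last_used_ms > self.server.idle_timeout_ms:
            raise ConnectionError("server timed out idle connection")
        self.last_used_ms = now_ms
        return "block-data"


class ReconnectingClient:
    """Re-opens the connection on failure so the caller never sees the error."""

    def __init__(self, server):
        self.server = server
        self.conn = None

    def read(self, now_ms):
        if self.conn is None:
            self.conn = self.server.connect(now_ms)
        try:
            return self.conn.read(now_ms)
        except ConnectionError:
            # Stale socket: set it up again and retry once.
            self.conn = self.server.connect(now_ms)
            return self.conn.read(now_ms)


server = IdleServer(idle_timeout_ms=480000)  # same figure as the log message
client = ReconnectingClient(server)
client.read(now_ms=0)
client.read(now_ms=600000)  # idle for 10 minutes; the reconnect is silent
print(server.connections_opened)  # prints 2
```

The second read succeeds from the caller's point of view even though the
first connection was dropped; only the server side would have logged anything.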

St.Ack

On Wed, Dec 23, 2009 at 11:10 AM, Ken Weiner <ken@gumgum.com> wrote:

> We have seen the following HADOOP error occur about 100 times a day spread
> out throughout the day on each RegionServer/DataNode in our always-on
> HBase/Hadoop cluster.
>
> From *hadoop-gumgum-datanode-xxxxxxxxxxxx.log*
>
> 2009-12-23 09:58:29,717 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(10.255.9.187:50010,
> storageID=DS-1057956046-10.255.9.187-50010-1248395287725,
> infoPort=50075, ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 480000 millis timeout while
> waiting for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected
> local=/10.255.9.187:50010 remote=/10.255.9.187:46154]
>        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
>
> Are other people seeing this error too?  How serious is it?  Can it be
> prevented?
>
> I found a few things that seem related, but I'm not sure how they apply to
> the HBase environment:
> http://issues.apache.org/jira/browse/HDFS-693
> https://issues.apache.org/jira/browse/HADOOP-3831
>
> Info on our environment:
> 1 Node: Master/NameNode/JobTracker (EC2 m1.large)
> 3 Nodes: RegionServer/DataNode/TaskTracker (EC2 m1.large)
>
> Thanks!
>
> -Ken Weiner
>  GumGum & BEDROCK
>
