hadoop-hdfs-user mailing list archives

From Jeff Whiting <je...@qualtrics.com>
Subject Re: Lots of Different Kind of Datanode Errors
Date Fri, 04 Jun 2010 16:37:22 GMT
I've tried to follow it as best I can.  I already increased the ulimit
to 32768.  This is what I now have in my hdfs-site.xml.  Am I missing
anything?
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/media/sdb,/media/sdc,/media/sdd</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>
</configuration>
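In case it helps anyone else following along, here is a rough way to
sanity-check both limits from the shell.  This is only a sketch: the
conf path is an assumption (adjust to your install), and the pid in the
last step is a placeholder.

```shell
# Hedged sketch -- /etc/hadoop/conf is an assumed location for hdfs-site.xml.
CONF=/etc/hadoop/conf/hdfs-site.xml

# 1) File-descriptor limit in effect for the shell that launches the
#    DataNode (should print the raised value, e.g. 32768):
ulimit -n

# 2) Effective xceiver cap.  The property name really is spelled
#    "xcievers" in this Hadoop version; a correctly spelled
#    "xceivers" key would be silently ignored.
if [ -f "$CONF" ]; then
  grep -A1 'dfs.datanode.max.xcievers' "$CONF"
fi

# 3) Rough count of live DataXceiver threads on a running DataNode
#    (replace <datanode_pid> with the real pid):
# jstack <datanode_pid> | grep -c DataXceiver
```

Step 1 has to be run as the user that actually starts the DataNode,
since ulimits are per-user; a raised limit in your own shell proves
nothing about the daemon's.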


Todd Lipcon wrote:
> Hi Jeff,
>
> Have you followed the HDFS configuration guide from the HBase wiki? 
> You need to bump up the transceiver count and probably ulimit as well. 
> Looks like you already tuned it to 2048, but that isn't high enough if 
> you're still getting the "exceeds the limit" message.
>
> The EOF and "Connection reset" messages occur when DFS clients 
> disconnect prematurely from a stream (probably due to xceiver errors 
> on other streams).
>
> -Todd
>
> On Fri, Jun 4, 2010 at 8:56 AM, jeff whiting <jeffw@qualtrics.com> wrote:
>
>     I had my HRegionServers go down due to hdfs exception.  In the
>     datanode logs I'm seeing a lot of different and varied exceptions.
>      I've increased the data xceiver count now but these other ones
>     don't make a lot of sense.
>
>     Among them are:
>
>     :2010-06-04 07:41:56,917 ERROR datanode.DataNode
>     (DataXceiver.java:run(131)) -
>     DatanodeRegistration(192.168.1.184:50010,
>     storageID=DS-1601700079-192.168.1.184-50010-1274208308658,
>     infoPort=50075, ipcPort=50020):DataXceiver
>     -java.io.EOFException
>     -       at java.io.DataInputStream.readByte(DataInputStream.java:250)
>     -       at
>     org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>     -       at
>     org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>     -       at org.apache.hadoop.io.Text.readString(Text.java:400)
>     -       at
>     org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
>     -       at
>     org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>     -       at java.lang.Thread.run(Thread.java:619)
>
>
>     :2010-06-04 08:49:56,389 ERROR datanode.DataNode
>     (DataXceiver.java:run(131)) -
>     DatanodeRegistration(192.168.1.184:50010,
>     storageID=DS-1601700079-192.168.1.184-50010-1274208308658,
>     infoPort=50075, ipcPort=50020):DataXceiver
>     -java.io.IOException: Connection reset by peer
>     -       at sun.nio.ch.FileDispatcher.read0(Native Method)
>     -       at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>     -       at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>     -       at sun.nio.ch.IOUtil.read(IOUtil.java:206)
>     -       at
>     sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>     -       at
>     org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>     -       at
>     org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>     -       at
>     org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>
>
>     :2010-06-04 05:36:54,840 ERROR datanode.DataNode
>     (DataXceiver.java:run(131)) -
>     DatanodeRegistration(192.168.1.184:50010,
>     storageID=DS-1601700079-192.168.1.184-50010-1274208308658,
>     infoPort=50075, ipcPort=50020):DataXceiver
>     -java.io.IOException: xceiverCount 2049 exceeds the limit of
>     concurrent xcievers 2047
>     -       at
>     org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>     -       at java.lang.Thread.run(Thread.java:619)
>
>     :2010-06-04 05:36:48,848 ERROR datanode.DataNode
>     (DataXceiver.java:run(131)) -
>     DatanodeRegistration(192.168.1.184:50010,
>     storageID=DS-1601700079-192.168.1.184-50010-1274208308658,
>     infoPort=50075, ipcPort=50020):DataXceiver
>     -java.net.SocketTimeoutException: 480000 millis timeout while
>     waiting for channel to be ready for write. ch :
>     java.nio.channels.SocketChannel[connected
>     local=/192.168.1.184:50010 remote=/192.168.1.184:55349]
>     -       at
>     org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>     -       at
>     org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>     -       at
>     org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>     -       at
>     org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>     -       at
>     org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>     -       at
>     org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>     -       at
>     org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>     -       at java.lang.Thread.run(Thread.java:619)
>     --
>
>     The EOFException is the most common one I get.  I'm also unsure
>     how I would get a connection reset by peer when I'm connecting
>     locally.  Why is the file ending prematurely?  Any idea what is
>     going on?
>
>     Thanks,
>     ~Jeff
>
>     --
>     Jeff Whiting
>     Qualtrics Senior Software Engineer
>     jeffw@qualtrics.com
>
>
>
>
>
>
>
>
> -- 
> Todd Lipcon
> Software Engineer, Cloudera

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com

