I've tried to follow it as best I can. I've already increased the ulimit
to 32768. This is what I now have in my hdfs-site.xml. Am I missing
anything?
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/media/sdb,/media/sdc,/media/sdd</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>
</configuration>
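
In case it's useful, here's a minimal sketch I can run to double-check
the config (my own check, not from the wiki; it assumes hdfs-site.xml is
on the classpath and uses the standard org.apache.hadoop.conf.Configuration
API). It prints the values Hadoop actually resolves for these properties,
so a typo'd property name or an override in another file would show up
right away:

import org.apache.hadoop.conf.Configuration;

// Hypothetical verification class (not part of Hadoop): load hdfs-site.xml
// and print the resolved values so a mistyped property name or an override
// in another config file is visible immediately.
public class CheckHdfsConf {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource("hdfs-site.xml"); // loaded from the classpath

        // The fallback defaults below are only used if the property is
        // missing entirely (256 xceivers / 3 handlers were the stock
        // 0.20-era defaults).
        System.out.println("dfs.datanode.max.xcievers = "
                + conf.getInt("dfs.datanode.max.xcievers", 256));
        System.out.println("dfs.datanode.handler.count = "
                + conf.getInt("dfs.datanode.handler.count", 3));
        System.out.println("dfs.data.dir = " + conf.get("dfs.data.dir"));
    }
}

Two other things I'm double-checking on my end: that the datanodes have
been restarted since the change (the xceiver limit is only read at
startup), and that the higher ulimit applies to the user that actually
launches the datanode process, not just my login shell.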
Todd Lipcon wrote:
> Hi Jeff,
>
> Have you followed the HDFS configuration guide from the HBase wiki?
> You need to bump up the transceiver count and probably the ulimit as
> well. It looks like you already tuned it to 2048, but that isn't high
> enough if you're still getting the "exceeds the limit" message.
>
> The EOF and Connection Reset messages appear when DFS clients
> disconnect prematurely from a client stream (probably due to
> xceiver errors on other streams).
>
> -Todd
>
> On Fri, Jun 4, 2010 at 8:56 AM, jeff whiting <jeffw@qualtrics.com> wrote:
>
> I had my HRegionServers go down due to HDFS exceptions. In the
> datanode logs I'm seeing a lot of different and varied exceptions.
> I've increased the data xceiver count now, but the other errors
> don't make a lot of sense.
>
> Among them are:
>
> 2010-06-04 07:41:56,917 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.EOFException
>     at java.io.DataInputStream.readByte(DataInputStream.java:250)
>     at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>     at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>     at org.apache.hadoop.io.Text.readString(Text.java:400)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>     at java.lang.Thread.run(Thread.java:619)
>
>
> 2010-06-04 08:49:56,389 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:206)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>
>
> 2010-06-04 05:36:54,840 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.IOException: xceiverCount 2049 exceeds the limit of concurrent xcievers 2047
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
>     at java.lang.Thread.run(Thread.java:619)
>
> 2010-06-04 05:36:48,848 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.184:50010 remote=/192.168.1.184:55349]
>     at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>     at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>     at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>     at java.lang.Thread.run(Thread.java:619)
> --
>
> The EOFException is the most common one I get. I'm also unsure how
> I would get a "connection reset by peer" when I'm connecting
> locally. Why is the file ending prematurely? Any idea what is
> going on?
>
> Thanks,
> ~Jeff
>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> jeffw@qualtrics.com
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
--
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com