From: jeff whiting
Subject: Lots of Different Kind of Datanode Errors
Date: Fri, 4 Jun 2010 09:56:26 -0600
Message-Id: <847C78FC-E08E-44B9-AA45-2EB1B94E430E@qualtrics.com>
To: hdfs-user@hadoop.apache.org
Mime-Version: 1.0

I had my HRegionServers go down due to an HDFS exception. In the datanode logs I'm seeing a lot of different and varied exceptions. I've increased the data xceiver count now, but these other ones don't make a lot of sense.

Among them are:

:2010-06-04 07:41:56,917 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
-java.io.EOFException
-    at java.io.DataInputStream.readByte(DataInputStream.java:250)
-    at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
-    at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
-    at org.apache.hadoop.io.Text.readString(Text.java:400)
-    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
-    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
-    at java.lang.Thread.run(Thread.java:619)

:2010-06-04 08:49:56,389 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
-java.io.IOException: Connection reset by peer
-    at sun.nio.ch.FileDispatcher.read0(Native Method)
-    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
-    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
-    at sun.nio.ch.IOUtil.read(IOUtil.java:206)
-    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
-    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
-    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
-    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)

:2010-06-04 05:36:54,840 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
-java.io.IOException: xceiverCount 2049 exceeds the limit of concurrent xcievers 2047
-    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
-    at java.lang.Thread.run(Thread.java:619)

:2010-06-04 05:36:48,848 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
-java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.184:50010 remote=/192.168.1.184:55349]
-    at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
-    at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
-    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
-    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
-    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
-    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
-    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
-    at java.lang.Thread.run(Thread.java:619)
--

The EOFException is the most common one I get. I'm also unsure how I would get a connection reset by peer when I'm connecting locally. Why is the file prematurely ending? Any idea of what is going on?

Thanks,
~Jeff

--
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com
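
To put numbers behind "the most common one", the exception classes in a datanode log can be tallied. The following is only a minimal sketch, assuming log lines shaped like the excerpts in this message (an exception class name on its own line, followed by "at ..." stack frames). The function name `count_exceptions` and the pattern are illustrative choices, not anything provided by Hadoop.

```python
import re
from collections import Counter

# Fully qualified Java class name ending in "Exception" or "Error",
# e.g. java.io.EOFException or java.net.SocketTimeoutException.
EXCEPTION_RE = re.compile(r"((?:\w+\.)+\w*(?:Exception|Error))\b")

# Stack frames ("at pkg.Class.method(File.java:NN)"), optionally preceded
# by the "-" or ":" prefixes seen in the pasted excerpts.
FRAME_RE = re.compile(r"^[-:\s]*at\s")


def count_exceptions(lines):
    """Count exception class names, ignoring stack-frame lines."""
    counts = Counter()
    for line in lines:
        if FRAME_RE.match(line):
            continue  # a frame, not the exception header line
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Usage would look something like `count_exceptions(open("datanode.log")).most_common()`, where `datanode.log` is a placeholder for the actual log path.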