Return-Path: X-Original-To: apmail-hbase-user-archive@www.apache.org Delivered-To: apmail-hbase-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 920DA958B for ; Thu, 23 Feb 2012 22:07:44 +0000 (UTC) Received: (qmail 77818 invoked by uid 500); 23 Feb 2012 22:07:42 -0000 Delivered-To: apmail-hbase-user-archive@hbase.apache.org Received: (qmail 77768 invoked by uid 500); 23 Feb 2012 22:07:42 -0000 Mailing-List: contact user-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hbase.apache.org Delivered-To: mailing list user@hbase.apache.org Received: (qmail 77759 invoked by uid 99); 23 Feb 2012 22:07:42 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 23 Feb 2012 22:07:42 +0000 X-ASF-Spam-Status: No, hits=0.7 required=5.0 tests=RCVD_IN_DNSWL_NONE,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [98.139.91.85] (HELO nm15.bullet.mail.sp2.yahoo.com) (98.139.91.85) by apache.org (qpsmtpd/0.29) with SMTP; Thu, 23 Feb 2012 22:07:33 +0000 Received: from [98.139.91.70] by nm15.bullet.mail.sp2.yahoo.com with NNFMP; 23 Feb 2012 22:07:12 -0000 Received: from [72.30.22.33] by tm10.bullet.mail.sp2.yahoo.com with NNFMP; 23 Feb 2012 22:06:12 -0000 Received: from [127.0.0.1] by omp1061.mail.sp2.yahoo.com with NNFMP; 23 Feb 2012 22:06:12 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 806326.6263.bm@omp1061.mail.sp2.yahoo.com Received: (qmail 9494 invoked by uid 60001); 23 Feb 2012 22:06:12 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1330034772; bh=xcn2gqSW421xw67HUkFDbHqmn2kR3xfwQTvdOD8uego=; h=X-YMail-OSG:Received:X-RocketYMMF:X-Mailer:References:Message-ID:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=3pgSMu5s0aSZgB9KR/R81sRfZVoAadAlyV1x609mBXfnlGKMLvF4SoIsJnbRIfzqWKnUCSiwdUAxs8U9OOGwYOwoyiNncO4zEVUms4R3rY0o7p8hh64Bjkqd3/W2jEHCpsP4ZIhLFM4AgczwXdRdglvsO/cNqCLJrQ0TeqY0/vw= X-YMail-OSG: lboRE.YVM1mvXLIDcUZhXOtIGcam2lasPq8goThUZe1GvQN HubhJjOwxbLdgf.SSddi.wh1vqzORc0FMPZXMLYi5gIm2wumUD9u4YnULLT0 Wzk9RpelsmVlncZYu1dAMaeUAvaTsBs8aUzfV8cFcSHVwvz0We6RvUSUKkxw YoOgH5YcGvObuwhemJiHoTX28SwKDitp0BuSzTr6QC9xzz5QcjTRsnM9VAEB Hh8..dcrBCAz5n_nRCmwgifiGDkZmkoa8w3VN8qM3eoRRudmBWac7.f6QjUm 28oClgG0aqwXdOdyvLyfOBFON0w8Z3Ug0KE1YEbKnLsuuC.I9b5w86GgdEVg IY2TKx8MRnWqXvl_khZ0YXAWGBnpv_sfDyU_A0d4Hx33kiSvnRTNqv7gf0dN q.owck3WahBsjGX4EQPn5FjKLg6VDHF86qu6XNAi2nlG52uuiqeAe8T_qokL 1ZGG_g1Dzn6JyV6yZL61rd1sS4EUk.RlbCyiHLFH51tfg3U5j0mmneBWbV5j 7_xRN0ixnn.8wVRgLM2PB1JaZvrFW7fasijqflQpVGVRqWNJu8.d8Jq8TktN 9U.CRo1OYXj2nrPl8eLp69q52dG1o6zx0INKbqy4ZlNRjJrpnzv5qK7BGi2c CgkbtwnKgG7xs.wHjzaGIGl_2QIfNLisEhvnWyTUimZD9w8DH Received: from [69.231.24.241] by web164501.mail.gq1.yahoo.com via HTTP; Thu, 23 Feb 2012 14:06:11 PST X-RocketYMMF: apurtell X-Mailer: YahooMailWebService/0.8.116.338427 References: Message-ID: <1330034771.99794.YahooMailNeo@web164501.mail.gq1.yahoo.com> Date: Thu, 23 Feb 2012 14:06:11 -0800 (PST) From: Andrew Purtell Reply-To: Andrew Purtell Subject: Re: Exception in hbase 0.92. with DFS, - Bad connect ack To: "user@hbase.apache.org" In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Check your switch/link/uplink utilization.=0A=A0=0AHDFS-941 might help. Tha= t is not in Hadoop 1.0 according to a cursory search over branch history in= the Git mirror.=0A=0A=0AAs another datapoint, we see this in our productio= n with a Hadoop that is much closer to CDH3; but, we have some known issues= with the network design in our legacy datacenters and plan to resolve it w= ith an eventual relocation. I'm also integrating HDFS-941.=0A=0A=0ABest reg= ards,=0A=0A=0A=A0 =A0 - Andy=0A=0AProblems worthy of attack prove their wor= th by hitting back. - Piet Hein (via Tom White)=0A=0A=0A=0A----- Original M= essage -----=0A> From: Mikael Sitruk =0A> To: user= @hbase.apache.org=0A> Cc: =0A> Sent: Thursday, February 23, 2012 1:25 PM=0A= > Subject: Exception in hbase 0.92. with DFS, - Bad connect ack=0A> =0A> Hi= =0A> =0A> I see that i have in my hbase logs a lot of the following (target= IP is=0A> changing)=0A> 2012-02-23 23:04:02,699 INFO org.apache.hadoop.hdf= s.DFSClient: Exception in=0A> createBlockOutputStream 10.232.83.87:50010 ja= va.io.IOException: Bad connect=0A> ack with firstBadLink as 10.232.83.118:5= 0010=0A> 2012-02-23 23:04:02,699 INFO org.apache.hadoop.hdfs.DFSClient: Aba= ndoning=0A> block blk_4678388308309640326_170570=0A> 2012-02-23 23:04:02,70= 1 INFO org.apache.hadoop.hdfs.DFSClient: Excluding=0A> datanode 10.232.83.1= 18:50010=0A> =0A> Then checking the hdfs log of the same server (87)=0A> 20= 12-02-23 23:04:02,698 INFO=0A> org.apache.hadoop.hdfs.server.datanode.DataN= ode: writeBlock=0A> blk_4678388308309640326_170570 received exception=0A> j= ava.net.SocketTimeoutException: 66000 millis timeout while waiting for=0A> = channel to be ready for connect. ch :=0A> java.nio.channels.SocketChannel[c= onnection-pending remote=3D/=0A> 10.232.83.118:50010]=0A> 2012-02-23 23:04:= 02,699 ERROR=0A> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeR= egistration(=0A> 10.232.83.87:50010,=0A> storageID=3DDS-1257662823-10.232.8= 3.87-50010-1329398253085, infoPort=3D50075,=0A> ipcPort=3D50020):DataXceive= r=0A> java.net.SocketTimeoutException: 66000 millis timeout while waiting f= or=0A> channel to be ready for connect. ch :=0A> java.nio.channels.SocketCh= annel[connection-pending remote=3D/=0A> 10.232.83.118:50010]=0A> =A0 =A0 = =A0 =A0 at=0A> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWi= thTimeout.java:213)=0A> =A0 =A0 =A0 =A0 at org.apache.hadoop.net.NetUtils.c= onnect(NetUtils.java:656)=0A> =A0 =A0 =A0 =A0 at=0A> org.apache.hadoop.hdfs= .server.datanode.DataXceiver.writeBlock(DataXceiver.java:319)=0A> =A0 =A0 = =A0 =A0 at=0A> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataX= ceiver.java:107)=0A> =A0 =A0 =A0 =A0 at java.lang.Thread.run(Thread.java:66= 2)=0A> =0A> =0A> Looking at the target (118) server hdfs log does not seems= to show any=0A> problem around the same time.=0A> 2012-02-23 23:04:01,648 = INFO=0A> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: = /=0A> 10.232.83.118:45623, dest: /10.232.83.118:50010, bytes: 67108864, op:= =0A> HDFS_WRITE, cliID: DFSClient_hb_rs_shaked118,60020,1329985953141, offs= et:=0A> 0, srvID: DS-1348867834-10.232.83.118-50010-1329398246569, blockid:= =0A> blk_-1747243057136009792_170577, duration: 6932047000=0A> 2012-02-23 2= 3:04:01,649 INFO=0A> org.apache.hadoop.hdfs.server.datanode.DataNode: Packe= tResponder 2 for=0A> block blk_-1747243057136009792_170577 terminating=0A> = 2012-02-23 23:04:01,656 INFO=0A> org.apache.hadoop.hdfs.server.datanode.Dat= aNode: Receiving block=0A> blk_-4467275870825484381_170577 src: /10.232.83.= 118:45626 dest: /=0A> 10.232.83.118:50010=0A> 2012-02-23 23:04:03,467 INFO= =0A> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block=0A> b= lk_6330134749736235430_170577 src: /10.232.83.114:49175 dest: /=0A> 10.232.= 83.118:50010=0A> 2012-02-23 23:04:05,153 INFO=0A> org.apache.hadoop.hdfs.se= rver.datanode.DataNode.clienttrace: src: /=0A> 10.232.83.118:50010, dest: /= 10.232.83.118:45615, bytes: 67633152, op:=0A> HDFS_READ, cliID: DFSClient_h= b_rs_shaked118,60020,1329985953141, offset: 0,=0A> srvID: DS-1348867834-10.= 232.83.118-50010-1329398246569, blockid:=0A> blk_-7285361301892533992_16555= 5, duration: 27134342000=0A> 2012-02-23 23:04:08,569 INFO=0A> org.apache.ha= doop.hdfs.server.datanode.DataNode.clienttrace: src: /=0A> 10.232.83.118:45= 626, dest: /10.232.83.118:50010, bytes: 67108864, op:=0A> HDFS_WRITE, cliID= : DFSClient_hb_rs_shaked118,60020,1329985953141, offset:=0A> 0, srvID: DS-1= 348867834-10.232.83.118-50010-1329398246569, blockid:=0A> blk_-446727587082= 5484381_170577, duration: 6906584000=0A> 2012-02-23 23:04:08,570 INFO=0A> o= rg.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for=0A> b= lock blk_-4467275870825484381_170577 terminating=0A> 2012-02-23 23:04:08,57= 2 INFO=0A> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block= =0A> blk_6927577191995683160_170577 src: /10.232.83.118:45629 dest: /=0A> 1= 0.232.83.118:50010=0A> 2012-02-23 23:04:09,283 INFO=0A> org.apache.hadoop.h= dfs.server.datanode.DataNode: Receiving block=0A> blk_7440488846881064366_1= 70577 src: /10.232.83.86:60436 dest: /=0A> 10.232.83.118:50010=0A> =0A> I h= ave checked gc logs, but no pauses where noted (all full gc pauses=0A> <10m= s).=0A> =0A> Any idea of what the problem can be?=0A> =0A> I use HB: 0.92.0= and HDFS 1.0.0=0A> Thanks=0A> Mikael.S=0A>