From: John George
To: "common-user@hadoop.apache.org"
Date: Fri, 27 Apr 2012 17:12:48 -0500
Subject: Re: DFSClient error

Can you run a regular 'hadoop fs' command (put, ls, or get)? If yes, how
about a wordcount example?

'hadoop jar hadoop-*examples*.jar wordcount input output'
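For instance, something like this (the paths are placeholders; use any
small local file and an HDFS directory you can write to):

    hadoop fs -put /etc/hosts /tmp/dfs-smoketest    # write a block
    hadoop fs -ls /tmp/dfs-smoketest                # check the metadata
    hadoop fs -get /tmp/dfs-smoketest /tmp/smoketest.local   # read it back

If that put/ls/get round trip works but wordcount fails, the problem is in
the MapReduce path rather than in plain HDFS access.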
-----Original Message-----
From: Mohit Anchlia
Reply-To: "common-user@hadoop.apache.org"
Date: Fri, 27 Apr 2012 14:36:49 -0700
To: "common-user@hadoop.apache.org"
Subject: Re: DFSClient error

>I even tried to reduce the number of jobs, but it didn't help.
>This is what I see:
>
>datanode logs:
>
>Initializing secure datanode resources
>Successfully obtained privileged resources (streaming port =
>ServerSocket[addr=/0.0.0.0,localport=50010] ) (http listener port =
>sun.nio.ch.ServerSocketChannelImpl[/0.0.0.0:50075])
>Starting regular datanode initialization
>26/04/2012 17:06:51 9858 jsvc.exec error: Service exit with a return value
>of 143
>
>userlogs:
>
>2012-04-26 19:35:22,801 WARN
>org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is
>available
>2012-04-26 19:35:22,801 INFO
>org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library
>loaded
>2012-04-26 19:35:22,808 INFO
>org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
>initialized native-zlib library
>2012-04-26 19:35:22,903 INFO org.apache.hadoop.hdfs.DFSClient: Failed to
>connect to /125.18.62.197:50010, add to deadNodes and continue
>java.io.EOFException
>        at java.io.DataInputStream.readShort(DataInputStream.java:298)
>        at org.apache.hadoop.hdfs.DFSClient$RemoteBlockReader.newBlockReader(DFSClient.java:1664)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.getBlockReader(DFSClient.java:2383)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2056)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2170)
>        at java.io.DataInputStream.read(DataInputStream.java:132)
>        at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:97)
>        at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:87)
>        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
>        at java.io.InputStream.read(InputStream.java:85)
>        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
>        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
>        at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:114)
>        at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:109)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
>        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>        at org.apache.hadoop.mapred.Child.main(Child.java:264)
>2012-04-26 19:35:22,906 INFO org.apache.hadoop.hdfs.DFSClient: Failed to
>connect to /125.18.62.204:50010, add to deadNodes and continue
>java.io.EOFException
>
>namenode logs:
>
>2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker: Job
>job_201204261140_0244 added successfully for user 'hadoop' to queue
>'default'
>2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker:
>Initializing job_201204261140_0244
>2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.AuditLogger:
>USER=hadoop IP=125.18.62.196 OPERATION=SUBMIT_JOB
>TARGET=job_201204261140_0244 RESULT=SUCCESS
>2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobInProgress:
>Initializing job_201204261140_0244
>2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Exception
>in createBlockOutputStream 125.18.62.198:50010 java.io.IOException: Bad
>connect ack with firstBadLink as 125.18.62.197:50010
>2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
>block blk_2499580289951080275_22499
>2012-04-26 16:12:53,582 INFO org.apache.hadoop.hdfs.DFSClient: Excluding
>datanode 125.18.62.197:50010
>2012-04-26 16:12:53,594 INFO org.apache.hadoop.mapred.JobInProgress:
>jobToken generated and stored with users keys in
>/data/hadoop/mapreduce/job_201204261140_0244/jobToken
>2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: Input
>size for job job_201204261140_0244 = 73808305. Number of splits = 1
>2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress:
>tip:task_201204261140_0244_m_000000 has split on
>node:/default-rack/dsdb4.corp.intuit.net
>2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress:
>tip:task_201204261140_0244_m_000000 has split on
>node:/default-rack/dsdb5.corp.intuit.net
>2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress:
>job_201204261140_0244 LOCALITY_WAIT_FACTOR=0.4
>2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: Job
>job_201204261140_0244 initialized successfully with 1 map tasks and 0
>reduce tasks.
>
>On Fri, Apr 27, 2012 at 7:50 AM, Mohit Anchlia wrote:
>
>> On Thu, Apr 26, 2012 at 10:24 PM, Harsh J wrote:
>>
>>> Is only the same IP printed in all such messages? Can you check the DN
>>> log on that machine to see if it reports any issues?
>>>
>> All IPs were logged with this message.
>>
>>> Also, did your jobs fail or keep going despite these hiccups? I notice
>>> you're threading your clients, though (?), but I can't tell if that may
>>> cause this without further information.
>>>
>> It started with this error message, and slowly all the jobs died with
>> "shortRead" errors.
>> I am not sure about the threading; I am using a Pig script to read .gz
>> files.
>>
>>> On Fri, Apr 27, 2012 at 5:19 AM, Mohit Anchlia wrote:
>>> > I had 20 mappers in parallel reading 20 gz files, each around
>>> > 30-40MB, over 5 hadoop nodes, and then writing to the analytics
>>> > database. Almost midway it started to get this error:
>>> >
>>> > 2012-04-26 16:13:53,723 [Thread-8] INFO
>>> > org.apache.hadoop.hdfs.DFSClient - Exception in
>>> > createBlockOutputStream 17.18.62.192:50010 java.io.IOException: Bad
>>> > connect ack with firstBadLink as 17.18.62.191:50010
>>> >
>>> > I am trying to look at the logs, but they don't say much. What could
>>> > be the reason? We are on a pretty closed, reliable network and all
>>> > machines are up.
>>>
>>> --
>>> Harsh J
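One more thing worth trying: the "Bad connect ack with firstBadLink" lines
point at the datanode the write pipeline could not reach, which is
125.18.62.197:50010 in your log. Some quick checks against that node (the
log path below is only a guess; adjust it for your install):

    # is the DataNode JVM running on the flagged machine?
    jps | grep -i datanode

    # is the data transfer port reachable from the other nodes?
    nc -z -w 5 125.18.62.197 50010 && echo reachable || echo unreachable

    # does the namenode still list it as live?
    hadoop dfsadmin -report | grep -A 2 125.18.62.197

    # recent datanode log output (path is an assumption)
    tail -n 100 /var/log/hadoop/*datanode*.log

Also, jsvc's "return value of 143" is an exit on SIGTERM (128 + 15), so
something killed or restarted that datanode. Correlating its timestamp with
the failed connections in the client logs should tell you whether the two
are related.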