Subject: Re: Too many open files Error
From: Michel Segel <michael_segel@hotmail.com>
Date: Thu, 26 Jan 2012 05:58:42 -0600
To: common-user@hadoop.apache.org

Sorry, going from memory...
As the hadoop, mapred, or hdfs user, what do you see when you do a ulimit -a?
That should give you the number of open files allowed for a single user...
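Something along these lines should show it (just a rough sketch; I'm assuming the daemons run as users named hdfs or mapred on a stock Linux box, so adjust the user names to your setup):

    # run as root; check the limits the daemon user actually gets
    su - hdfs -c 'ulimit -n'    # per-process open-file limit (often 1024 by default)
    su - hdfs -c 'ulimit -a'    # the full set of limits for that user

    # if it is too low, raise it with lines like these in /etc/security/limits.conf
    # and then restart the daemons:
    #   hdfs    soft    nofile    32768
    #   hdfs    hard    nofile    32768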
Sent from a remote device. Please excuse any typos...

Mike Segel

On Jan 26, 2012, at 5:13 AM, Mark question wrote:

> Hi guys,
>
> I get this error from a job trying to process 3 million records.
>
> java.io.IOException: Bad connect ack with firstBadLink 192.168.1.20:50010
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2903)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>
> When I checked the logfile of datanode-20, I see:
>
> 2012-01-26 03:00:11,827 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.20:50010, storageID=DS-97608578-192.168.1.20-50010-1327575205369, infoPort=50075, ipcPort=50020):DataXceiver
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:175)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>         at java.io.DataInputStream.read(DataInputStream.java:132)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
>         at java.lang.Thread.run(Thread.java:662)
>
> This is because I'm running 10 maps per TaskTracker on a 20-node cluster, and each map opens about 300 files, so that should give about 6000 open files at the same time... why is this a problem? The maximum number of files per process on one machine is:
>
> cat /proc/sys/fs/file-max ---> 2403545
>
> Any suggestions?
>
> Thanks,
> Mark
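One note on that file-max number: /proc/sys/fs/file-max is the kernel-wide cap on open file handles, but each process is still bound by its own nofile limit (what ulimit -n reports), which is usually much smaller unless it has been raised. A quick way to see what a running DataNode actually inherited (again only a sketch; the pgrep pattern and paths assume a Linux box and may need adjusting):

    # find the DataNode's pid and check the limits it is actually running with
    DN_PID=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode | head -n 1)
    grep 'Max open files' /proc/"$DN_PID"/limits

    # the system-wide handle table is a separate, much larger limit
    cat /proc/sys/fs/file-max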