From: Raghu Angadi
Date: Fri, 26 Sep 2008 08:40:55 -0700
To: core-dev@hadoop.apache.org
CC: core-user@hadoop.apache.org
Subject: Re: IPC Client error | Too many files open

What does jstack show for this? Probably better suited for a JIRA discussion.

Raghu.
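P.S. To go with the jstack dump, it is worth watching the process's
open-descriptor count over time; a steadily climbing count confirms a
leak rather than a one-time spike. A rough sketch of logging it from
inside the writer itself (Linux-only, since it reads /proc/self/fd;
the helper name is made up):

    // Hypothetical helper: count this JVM's open file descriptors by
    // listing /proc/self/fd (Linux-specific). Call it periodically and
    // log the result next to the write-path activity.
    static int openFdCount() {
        String[] fds = new java.io.File("/proc/self/fd").list();
        return (fds == null) ? -1 : fds.length;
    }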
Goel, Ankur wrote:
> Hi Folks,
>
> We have developed a simple log writer in Java that is plugged into
> Apache's custom log and writes log entries directly to our Hadoop
> cluster (50 machines, quad-core, each with 16 GB RAM and an 800 GB
> hard disk; one machine as a dedicated NameNode, another machine as
> JobTracker & TaskTracker + DataNode).
>
> There are around 8 Apache servers dumping logs into HDFS via our
> writer. Everything was working fine, and we were getting around
> 15 - 20 MB of log data per hour from each server.
>
> Recently we have been experiencing problems with 2-3 of our Apache
> servers, where a file is opened by the log writer in HDFS for writing
> but never receives any data.
>
> Looking at the Apache error logs shows the following errors:
>
> 08/09/22 05:02:13 INFO ipc.Client: java.io.IOException: Too many open files
>         at sun.nio.ch.IOUtil.initPipe(Native Method)
>         at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
>         at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:312)
>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:227)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:149)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:122)
>         at java.io.FilterInputStream.read(FilterInputStream.java:116)
>         at org.apache.hadoop.ipc.Client$Connection$1.read(Client.java:203)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:289)
> ...
>
> followed by connection errors saying
>
> "Retrying to connect to server: hadoop-server.com:9000. Already tried
> 'n' times."
>
> (same as above) ...
>
> The writer keeps retrying constantly (it is set up to wait and retry).
>
> Doing an lsof on the log-writer Java process shows that it got stuck
> on a lot of pipe/eventpoll descriptors and eventually ran out of file
> handles. Below is part of the lsof output:
>
> lsof -p 2171
> COMMAND  PID   USER  FD    TYPE   DEVICE  SIZE  NODE      NAME
> ...
> java     2171  root  20r   FIFO   0,7           24090207  pipe
> java     2171  root  21w   FIFO   0,7           24090207  pipe
> java     2171  root  22r   0000   0,8     0     24090208  eventpoll
> java     2171  root  23r   FIFO   0,7           23323281  pipe
> java     2171  root  24r   FIFO   0,7           23331536  pipe
> java     2171  root  25w   FIFO   0,7           23306764  pipe
> java     2171  root  26r   0000   0,8     0     23306765  eventpoll
> java     2171  root  27r   FIFO   0,7           23262160  pipe
> java     2171  root  28w   FIFO   0,7           23262160  pipe
> java     2171  root  29r   0000   0,8     0     23262161  eventpoll
> java     2171  root  30w   FIFO   0,7           23299329  pipe
> java     2171  root  31r   0000   0,8     0     23299330  eventpoll
> java     2171  root  32w   FIFO   0,7           23331536  pipe
> java     2171  root  33r   FIFO   0,7           23268961  pipe
> java     2171  root  34w   FIFO   0,7           23268961  pipe
> java     2171  root  35r   0000   0,8     0     23268962  eventpoll
> java     2171  root  36w   FIFO   0,7           23314889  pipe
> ...
>
> What in the DFS client (if anything) could have caused this? Could it
> be something else?
>
> Is it a bad idea to use an HDFS writer to write logs directly from
> Apache into HDFS?
>
> Is 'Chukwa' (the Hadoop log collection and analysis framework
> contributed by Yahoo) a better fit for our case?
>
> I would highly appreciate help on any or all of the above questions.
>
> Thanks and Regards
> -Ankur
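One pattern worth ruling out on the writer side: creating a new
Configuration/FileSystem or a new output stream per log entry instead
of reusing one long-lived handle. Each DFS stream and IPC connection
holds sockets and selector pipes, and under retry storms they pile up
fast. A minimal sketch of the reuse pattern (the class and method
names are made up for illustration; FileSystem, Path, and create() are
the standard org.apache.hadoop.fs API):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hypothetical long-lived HDFS log writer: one FileSystem handle
    // and one open stream, reused for the life of the log file.
    public class HdfsLogWriter {
        private final FileSystem fs;
        private final FSDataOutputStream out;

        public HdfsLogWriter(String file) throws IOException {
            Configuration conf = new Configuration();
            // FileSystem.get() normally returns a cached, shared
            // instance, so this does not open a connection per call.
            fs = FileSystem.get(conf);
            out = fs.create(new Path(file));
        }

        public synchronized void write(byte[] record) throws IOException {
            out.write(record);
        }

        // Close the stream when rolling the file; without close() the
        // data is not visible to readers, and a leaked stream keeps
        // its sockets and selector pipes open.
        public synchronized void close() throws IOException {
            out.close();
        }
    }

If the writer already does this, the leak is more likely in the IPC
client's selector handling itself, which is worth filing as a JIRA
with the jstack output attached.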