Return-Path:
Delivered-To: apmail-lucene-hadoop-dev-archive@locus.apache.org
Received: (qmail 54235 invoked from network); 14 Jun 2006 22:42:47 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199)
  by minotaur.apache.org with SMTP; 14 Jun 2006 22:42:47 -0000
Received: (qmail 68184 invoked by uid 500); 14 Jun 2006 22:42:45 -0000
Delivered-To: apmail-lucene-hadoop-dev-archive@lucene.apache.org
Received: (qmail 68133 invoked by uid 500); 14 Jun 2006 22:42:45 -0000
Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: hadoop-dev@lucene.apache.org
Delivered-To: mailing list hadoop-dev@lucene.apache.org
Received: (qmail 68113 invoked by uid 99); 14 Jun 2006 22:42:44 -0000
Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49)
  by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 14 Jun 2006 15:42:44 -0700
X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests=
X-Spam-Check-By: apache.org
Received: from [209.237.227.198] (HELO brutus.apache.org) (209.237.227.198)
  by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 14 Jun 2006 15:42:44 -0700
Received: from brutus (localhost [127.0.0.1])
  by brutus.apache.org (Postfix) with ESMTP id 902A8714204
  for ; Wed, 14 Jun 2006 22:41:30 +0000 (GMT)
Message-ID: <10825379.1150324890587.JavaMail.jira@brutus>
Date: Wed, 14 Jun 2006 22:41:30 +0000 (GMT+00:00)
From: "Doug Cutting (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Reopened: (HADOOP-210) Namenode not able to accept connections
In-Reply-To: <22832248.1147306205708.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Virus-Checked: Checked by ClamAV on apache.org
X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N

     [ http://issues.apache.org/jira/browse/HADOOP-210?page=all ]

Doug Cutting reopened HADOOP-210:
---------------------------------

    Assign To: Sameer Paranjpye  (was: Mahadev konar)

I reverted
this for now, since it (for unknown reasons) seemed to break distributed operation.

> Namenode not able to accept connections
> ---------------------------------------
>
>          Key: HADOOP-210
>          URL: http://issues.apache.org/jira/browse/HADOOP-210
>      Project: Hadoop
>         Type: Bug
>   Components: dfs
>  Environment: linux
>     Reporter: Mahadev konar
>     Assignee: Sameer Paranjpye
>      Fix For: 0.4.0
>  Attachments: nio.patch, nio.patch
>
> I am running Owen's random writer on a 627 node cluster (writing 10GB/node). After running for a while (map 12% reduce 1%) I get the following error on the Namenode:
> Exception in thread "Server listener on port 60000" java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:574)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:105)
> After this, the namenode does not seem to accept connections from any of the clients. All the DFSClient calls time out. Here is a trace for one of them:
> java.net.SocketTimeoutException: timed out waiting for rpc response
>         at org.apache.hadoop.ipc.Client.call(Client.java:305)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:149)
>         at org.apache.hadoop.dfs.$Proxy1.open(Unknown Source)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:419)
>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:406)
>         at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:171)
>         at org.apache.hadoop.dfs.DistributedFileSystem.openRaw(DistributedFileSystem.java:78)
>         at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
>         at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:228)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:157)
>         at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:43)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:105)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:785)
> The namenode has around 1% CPU utilization at this point (after the OutOfMemoryError has been thrown). I have profiled the NameNode and it seems to be using a maximum heap size of around 57MB (which is not much). So heap size does not seem to be the problem. Could it be due to a lack of stack space? Any pointers?

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
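
Note on the stack-space question in the report: "unable to create new native thread" is generally raised when the OS refuses to start another thread (native memory, address space, or a per-user thread limit), not when the Java heap is full, which would be consistent with the small 57MB heap observed. The trace shows the failure inside Thread.start called from Server$Listener.run, so the listener appears to be starting threads as connections arrive. The following is only a back-of-envelope sketch of how per-thread stack reservations add up. The 512 KiB stack size, the 1 GiB native budget, and the thread count of 2000 are assumptions chosen for illustration, not values from this issue, and the class and method names are hypothetical.

```java
public class ThreadStackEstimate {

    /**
     * Address space reserved for thread stacks, assuming every thread
     * is given the same stack size (for example via -Xss).
     */
    public static long reservedStackBytes(int threads, long stackBytesPerThread) {
        return (long) threads * stackBytesPerThread;
    }

    /**
     * Rough upper bound on how many thread stacks fit in a native
     * memory budget. Ignores heap, code, and other native allocations.
     */
    public static long maxThreads(long nativeBudgetBytes, long stackBytesPerThread) {
        return nativeBudgetBytes / stackBytesPerThread;
    }

    public static void main(String[] args) {
        long stackBytes = 512L * 1024;   // assumed stack size; real defaults vary by JVM and platform
        long budgetBytes = 1L << 30;     // assumed 1 GiB available for stacks
        int threads = 2000;              // hypothetical number of connection threads

        System.out.println("reserved bytes for " + threads + " threads: "
                + reservedStackBytes(threads, stackBytes));
        System.out.println("threads that fit in budget: "
                + maxThreads(budgetBytes, stackBytes));
    }
}
```

Under these assumed numbers the stacks alone approach the budget while the heap stays small, which is why a design that multiplexes many connections over a few threads (for instance with java.nio selectors) avoids this limit. Whether the attached nio.patch takes that approach is not stated in this message.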