Message-ID: <27099264.1188338371237.JavaMail.jira@brutus>
Date: Tue, 28 Aug 2007 14:59:31 -0700 (PDT)
From: "Doug Cutting (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-1739) ConnectException in TaskTracker Child
In-Reply-To: <18467997.1187648970596.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1739:
---------------------------------

    Attachment: HADOOP-1739_3.patch

Here's a new version
that:

1. removes user-specification of the umbilical port, always letting the OS choose
2. changes the default umbilical address to 127.0.0.1
3. removes the @port@ option from mapred.child.jvm.opts
4. puts both the parent address and port on the child command line, so that the child no longer relies on the config file to get the parent's address.

Question: does removing @port@ break anyone?

> ConnectException in TaskTracker Child
> -------------------------------------
>
>                 Key: HADOOP-1739
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1739
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.14.0
>         Environment: Version: 0.15.0-dev, r565628
>                      Compiled: Tue Aug 14 20:55:37 UTC 2007 by hadoopqa
>                      1400 nodes
>            Reporter: Srikanth Kakani
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-1739_1_20070823.patch, HADOOP-1739_2_20070825.patch, HADOOP-1739_3.patch
>
>
> Steps to Reproduce:
> I had 11000 mappers and 2700 reducers in a job and most failures correspond to the following logs:
> Stderr:
> Exception in thread "main" java.net.ConnectException: Connection refused
> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
> 	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
> 	at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
> 	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> 	at java.net.Socket.connect(Socket.java:519)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:150)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:530)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:459)
> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
> 	at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
> 	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:248)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1781)
> Syslog:
> 2007-08-19 18:45:07,490 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:50051. Already tried 1 time(s).
> 2007-08-19 18:45:08,494 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:50051. Already tried 2 time(s).
> 2007-08-19 18:45:09,497 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:50051. Already tried 3 time(s).
> 2007-08-19 18:45:10,500 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:50051. Already tried 4 time(s).
> 2007-08-19 18:45:11,503 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:50051. Already tried 5 time(s).
> 2007-08-19 18:45:12,506 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:50051. Already tried 6 time(s).
> 2007-08-19 18:45:13,508 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:50051. Already tried 7 time(s).
> 2007-08-19 18:45:14,511 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:50051. Already tried 8 time(s).
> 2007-08-19 18:45:15,512 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:50051. Already tried 9 time(s).
> 2007-08-19 18:45:16,515 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:50051. Already tried 10 time(s)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
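
For readers following the four changes listed in the update, here is a rough, self-contained Java sketch of the general technique: bind the parent's listener to 127.0.0.1 on port 0 so the OS picks a free port, then hand the actual address and port to the child on its command line. This is not the code in HADOOP-1739_3.patch. The class name UmbilicalSketch, its methods, and the "ChildMain" argument are hypothetical names made up for illustration, and plain sockets stand in for Hadoop's RPC layer.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names are hypothetical, not Hadoop's. */
class UmbilicalSketch {

    /** Parent side: bind to loopback and let the OS pick a free port (port 0). */
    static ServerSocket bindLoopback() throws IOException {
        ServerSocket server = new ServerSocket();
        server.bind(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0));
        return server;
    }

    /** Parent side: append the actual bound address and port to the child's command line. */
    static List<String> childCommand(String mainClass, ServerSocket server) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add(mainClass);
        cmd.add(server.getInetAddress().getHostAddress());
        cmd.add(Integer.toString(server.getLocalPort()));
        return cmd;
    }

    /** Child side: take the parent's address from the last two arguments, not from a config file. */
    static InetSocketAddress parentAddress(List<String> cmd) {
        String host = cmd.get(cmd.size() - 2);
        int port = Integer.parseInt(cmd.get(cmd.size() - 1));
        return new InetSocketAddress(host, port);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = bindLoopback()) {
            List<String> cmd = childCommand("ChildMain", server);
            System.out.println("child command: " + cmd);
            // Stand-in for the child process: connect back using only the command-line values.
            try (Socket child = new Socket()) {
                child.connect(parentAddress(cmd), 1000);
                System.out.println("connected: " + child.isConnected());
            }
        }
    }
}
```

Because the child is given a concrete loopback address and the port the OS actually assigned, it never tries to connect to a wildcard address such as the 0.0.0.0:50051 seen in the syslog above.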