Return-Path: Delivered-To: apmail-lucene-hadoop-dev-archive@locus.apache.org Received: (qmail 90176 invoked from network); 7 Jun 2007 19:51:48 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 7 Jun 2007 19:51:48 -0000 Received: (qmail 50536 invoked by uid 500); 7 Jun 2007 19:51:51 -0000 Delivered-To: apmail-lucene-hadoop-dev-archive@lucene.apache.org Received: (qmail 50500 invoked by uid 500); 7 Jun 2007 19:51:51 -0000 Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hadoop-dev@lucene.apache.org Delivered-To: mailing list hadoop-dev@lucene.apache.org Received: (qmail 50491 invoked by uid 99); 7 Jun 2007 19:51:51 -0000 Received: from herse.apache.org (HELO herse.apache.org) (140.211.11.133) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 07 Jun 2007 12:51:51 -0700 X-ASF-Spam-Status: No, hits=-100.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO brutus.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 07 Jun 2007 12:51:46 -0700 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 5A35D7141E0 for ; Thu, 7 Jun 2007 12:51:26 -0700 (PDT) Message-ID: <25549531.1181245886367.JavaMail.jira@brutus> Date: Thu, 7 Jun 2007 12:51:26 -0700 (PDT) From: "Hadoop QA (JIRA)" To: hadoop-dev@lucene.apache.org Subject: [jira] Commented: (HADOOP-1400) JobClient rpc times out getting job status In-Reply-To: <25042480.1179772756578.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502496 ] Hadoop QA commented on HADOOP-1400: ----------------------------------- +1 http://issues.apache.org/jira/secure/attachment/12359198/job-client-retry-4.patch applied and successfully tested against trunk revision r544740. Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/256/testReport/ Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/256/console > JobClient rpc times out getting job status > ------------------------------------------ > > Key: HADOOP-1400 > URL: https://issues.apache.org/jira/browse/HADOOP-1400 > Project: Hadoop > Issue Type: Bug > Components: mapred > Affects Versions: 0.12.3 > Reporter: Nigel Daley > Assignee: Owen O'Malley > Fix For: 0.14.0 > > Attachments: job-client-retry-4.patch, job-client-retry.patch > > > JobClient rpc times out getting job status and logs an INFO exception message. JobClient seems to recover, however, so I think the exception should be logged at DEBUG or the message indicate that JobClient will retry. Here's an example log: > 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029 > 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0% > 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0% > 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0% > 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0% > 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0% > 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0% > 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0% > 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0% > 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0% > 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0% > 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0% > 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0% > 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0% > 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0% > 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1% > 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2% > 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3% > 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4% > 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5% > 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6% > 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7% > 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8% > 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9% > 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10% > 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11% > 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12% > 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13% > 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14% > 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15% > 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16% > 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17% > 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18% > 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19% > 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20% > 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21% > 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22% > 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23% > 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24% > 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25% > 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26% > 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27% > 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28% > 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29% > 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30% > 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31% > 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32% > 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35% > 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response > at org.apache.hadoop.ipc.Client.call(Client.java:473) > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165) > at $Proxy1.getJobStatus(Unknown Source) > at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491) > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556) > at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188) > at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69) > at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140) > at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:155) > 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97% > 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98% > 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99% > 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100% > 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.