Date: Thu, 27 Jan 2011 12:46:45 -0500 (EST)
From: "Ramkumar Vadali (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] Commented: (MAPREDUCE-2283) TestBlockFixer hangs initializing MiniMRCluster

[ https://issues.apache.org/jira/browse/MAPREDUCE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987669#action_12987669 ]
Ramkumar Vadali commented on MAPREDUCE-2283:
--------------------------------------------

Update: if I run `ant clean` from the top level and then run `ant test -Dtestcase=TestBlockFixer`, it runs fine. But if I run `ant test-patch` from the top level and then run the test again, it gets stuck. I reran with `-Dtest.output=yes` to see what was going on, and found this:

{code}
    [junit] 11/01/27 09:21:24 INFO mapred.TaskTracker: TaskTracker up at: localhost.localdomain/127.0.0.1:50197
    [junit] 11/01/27 09:21:24 INFO mapred.TaskTracker: Starting tracker tracker_host0.foo.com:localhost.localdomain/127.0.0.1:50197
    [junit] 11/01/27 09:21:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 0 time(s).
    [junit] 11/01/27 09:21:26 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 1 time(s).
    [junit] 11/01/27 09:21:27 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 2 time(s).
    [junit] 11/01/27 09:21:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 3 time(s).
    [junit] 11/01/27 09:21:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 4 time(s).
    [junit] 11/01/27 09:21:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 5 time(s).
    [junit] 11/01/27 09:21:31 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 6 time(s).
    [junit] 11/01/27 09:21:32 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 7 time(s).
    [junit] 11/01/27 09:21:33 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 8 time(s).
    [junit] 11/01/27 09:21:34 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 9 time(s).
    [junit] 11/01/27 09:21:34 INFO ipc.RPC: Server at localhost/127.0.0.1:0 not available yet, Zzzzz...
{code}

I think Hudson does something like this, and `ant test-patch` is somehow pulling in a jar that prevents MiniMRCluster from starting. To check, I wrote a simple test that only tries to start a MiniMRCluster:

{code}
import junit.framework.TestCase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.MiniMRCluster;

/** Minimal reproduction: setUp() only starts a MiniDFSCluster and a MiniMRCluster. */
public class TestStuckMiniMR extends TestCase {
  public static final int NUM_DATANODES = 3;

  Configuration conf;
  String namenode = null;
  MiniDFSCluster dfs = null;
  MiniMRCluster mr = null;
  String jobTrackerName = null;
  FileSystem fileSys = null;

  protected void setUp() throws Exception {
    conf = new Configuration();
    // Start (and format) an HDFS cluster with NUM_DATANODES datanodes.
    dfs = new MiniDFSCluster(conf, NUM_DATANODES, true, null);
    dfs.waitActive();
    fileSys = dfs.getFileSystem();
    namenode = fileSys.getUri().toString();
    FileSystem.setDefaultUri(conf, namenode);
    // Start 4 TaskTrackers against that namenode; the hang shows up here.
    mr = new MiniMRCluster(4, namenode, 3);
    jobTrackerName = "localhost:" + mr.getJobTrackerPort();
  }

  protected void tearDown() {
    // Stop MapReduce before HDFS, and tolerate a setUp() that failed midway.
    if (mr != null) {
      mr.shutdown();
    }
    if (dfs != null) {
      dfs.shutdown();
    }
  }

  public void testStuck() throws Exception {
    System.out.println("Done");
  }
}
{code}

This also gets stuck in setUp(), so I think the problem is outside RAID. In fact, right after trying this, I ran a test under contrib/streaming, and it gets stuck the same way:

{code}
ant test -Dtestcase=TestFileArgs -Dtest.output=yes
{code}

The output:

{code}
    [junit] 11/01/27 09:42:10 INFO mapred.TaskTracker: TaskTracker up at: localhost.localdomain/127.0.0.1:59339
    [junit] 11/01/27 09:42:10 INFO mapred.TaskTracker: Starting tracker tracker_host0.foo.com:localhost.localdomain/127.0.0.1:59339
    [junit] 11/01/27 09:42:11 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 0 time(s).
    [junit] 11/01/27 09:42:12 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 1 time(s).
    [junit] 11/01/27 09:42:13 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 2 time(s).
{code}

Can someone try killing TestBlockFixer and running TestFileArgs on the machine that's running Hudson?
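Both logs above show the TaskTracker retrying against localhost/127.0.0.1:0. Port 0 is the "any free port" placeholder used when binding a server socket; a bound server never listens on port 0 itself, so a client given that address has nothing to connect to. One possible reading (a guess, not verified here) is that the TaskTrackers are handed the JobTracker address before its real port is known. The sketch below uses plain java.net rather than Hadoop's IPC classes, and the class and method names are made up for illustration; it only shows the intended pattern of binding to port 0 and giving clients the port read back after the bind.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

/**
 * Hypothetical illustration using plain java.net (not Hadoop IPC):
 * a server bound to port 0 gets an OS-assigned ephemeral port, and
 * clients must be given that read-back port, not the ":0" placeholder.
 */
public class EphemeralPortDemo {

  /** Binds to loopback port 0, connects a client to the assigned port, returns that port. */
  static int bindThenConnect() throws IOException {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    // Port 0 means "let the OS pick any free port" at bind time.
    try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
      int actualPort = server.getLocalPort(); // the real port, never 0 once bound
      try (Socket client = new Socket()) {
        // Connect using the read-back port; the TCP handshake completes via the
        // listen backlog even though accept() is never called. 2 s timeout.
        client.connect(new InetSocketAddress(loopback, actualPort), 2000);
      }
      return actualPort;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("server listened on ephemeral port " + bindThenConnect());
  }
}
```

Running main prints whichever port the OS picked, which normally differs from run to run.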
> TestBlockFixer hangs initializing MiniMRCluster
> -----------------------------------------------
>
>                 Key: MAPREDUCE-2283
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2283
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/raid
>    Affects Versions: 0.23.0
>            Reporter: Nigel Daley
>            Priority: Blocker
>             Fix For: 0.22.0
>
>
> TestBlockFixer (a raid contrib test) is hanging the precommit testing on Hudson