hadoop-mapreduce-issues mailing list archives

From "Ramkumar Vadali (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2283) TestBlockFixer hangs initializing MiniMRCluster
Date Thu, 27 Jan 2011 17:46:45 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987669#action_12987669 ]

Ramkumar Vadali commented on MAPREDUCE-2283:
--------------------------------------------

Update:

If I run `ant clean` from the top level and then `ant test -Dtestcase=TestBlockFixer`, the test
runs fine.
But if I run `ant test-patch` from the top level and then run the test again, it gets stuck.
I ran with `-Dtest.output=yes` to see what was going on, and found this:

{code}
    [junit] 11/01/27 09:21:24 INFO mapred.TaskTracker: TaskTracker up at: localhost.localdomain/127.0.0.1:50197
    [junit] 11/01/27 09:21:24 INFO mapred.TaskTracker: Starting tracker tracker_host0.foo.com:localhost.localdomain/127.0.0.1:50197
    [junit] 11/01/27 09:21:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 0 time(s).
    [junit] 11/01/27 09:21:26 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 1 time(s).
    [junit] 11/01/27 09:21:27 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 2 time(s).
    [junit] 11/01/27 09:21:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 3 time(s).
    [junit] 11/01/27 09:21:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 4 time(s).
    [junit] 11/01/27 09:21:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 5 time(s).
    [junit] 11/01/27 09:21:31 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 6 time(s).
    [junit] 11/01/27 09:21:32 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 7 time(s).
    [junit] 11/01/27 09:21:33 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 8 time(s).
    [junit] 11/01/27 09:21:34 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 9 time(s).
    [junit] 11/01/27 09:21:34 INFO ipc.RPC: Server at localhost/127.0.0.1:0 not available yet, Zzzzz...
{code}

I think Hudson does something like this, and `ant test-patch` is somehow pulling in a jar that
prevents MiniMRCluster from starting. To check, I wrote a simple test that only tries to start
a MiniMRCluster:

{code}
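import junit.framework.TestCase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.MiniMRCluster;

public class TestStuckMiniMR extends TestCase {
  public static final int NUM_DATANODES = 3;
  Configuration conf;
  String namenode = null;
  MiniDFSCluster dfs = null;
  MiniMRCluster mr = null;
  String jobTrackerName = null;
  FileSystem fileSys = null;

  protected void setUp() throws Exception {
    conf = new Configuration();

    // Start a mini DFS cluster and wait until it is active.
    dfs = new MiniDFSCluster(conf, NUM_DATANODES, true, null);
    dfs.waitActive();
    fileSys = dfs.getFileSystem();
    namenode = fileSys.getUri().toString();

    // Start the mini MR cluster on top of the DFS; this is where the hang occurs.
    FileSystem.setDefaultUri(conf, namenode);
    mr = new MiniMRCluster(4, namenode, 3);
    jobTrackerName = "localhost:" + mr.getJobTrackerPort();
  }

  protected void tearDown() {
    // Shut down MR before the DFS it runs on; either may be null if setUp did not finish.
    if (mr != null) {
      mr.shutdown();
    }
    if (dfs != null) {
      dfs.shutdown();
    }
  }

  // Intentionally empty: the test only checks that setUp completes.
  public void testStuck() throws Exception {
    System.out.println("Done");
  }
}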
{code}
This also gets stuck in setUp, so I think the problem is outside RAID. In fact, right after
trying this, I ran a test under contrib/streaming. That also gets stuck the same
way.

{code}
ant test -Dtestcase=TestFileArgs -Dtest.output=yes
{code}

The output:

{code}
    [junit] 11/01/27 09:42:10 INFO mapred.TaskTracker: TaskTracker up at: localhost.localdomain/127.0.0.1:59339
    [junit] 11/01/27 09:42:10 INFO mapred.TaskTracker: Starting tracker tracker_host0.foo.com:localhost.localdomain/127.0.0.1:59339
    [junit] 11/01/27 09:42:11 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 0 time(s).
    [junit] 11/01/27 09:42:12 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 1 time(s).
    [junit] 11/01/27 09:42:13 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:0. Already tried 2 time(s).
{code}
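
Since both tests retry forever instead of failing, a guard around the startup call might at least turn the hang into a visible failure. The following is only a rough sketch: `StartupGuard` and `finishesWithin` are hypothetical names, not part of Hadoop or JUnit, and it assumes a timed-out startup can simply be abandoned on a daemon thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of Hadoop): runs a startup task with a time limit.
public class StartupGuard {

  /** Returns true if the task completes within timeoutMs, false if it times out. */
  public static boolean finishesWithin(Callable<?> task, long timeoutMs) {
    // Daemon thread, so a task that ignores interruption cannot keep the JVM alive.
    ExecutorService pool = Executors.newSingleThreadExecutor(new ThreadFactory() {
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r, "startup-guard");
        t.setDaemon(true);
        return t;
      }
    });
    try {
      Future<?> result = pool.submit(task);
      result.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      return false;
    } catch (InterruptedException e) {
      // Preserve the caller's interrupt status and report the task as unfinished.
      Thread.currentThread().interrupt();
      return false;
    } catch (ExecutionException e) {
      // The task itself failed; surface its cause rather than hiding it.
      throw new RuntimeException("startup failed", e.getCause());
    } finally {
      // Interrupts the worker if it is still running.
      pool.shutdownNow();
    }
  }
}
```

In a test like the one above, the `new MiniMRCluster(...)` call could be wrapped in a `Callable` and the test failed when `finishesWithin` returns false. Whether a half-started cluster cleans up properly after being interrupted is something I have not checked.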

Can someone try killing TestBlockFixer and running TestFileArgs on the machine that's running Hudson?
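
If it hangs there as well, one way to test the stray-jar theory would be to print where the relevant classes are actually loaded from, once after `ant clean` and once after `ant test-patch`, and compare. A minimal sketch using only standard Java APIs follows; `ClassOrigin` is a hypothetical helper name, and the class names passed to it are up to whoever runs it.

```java
import java.security.CodeSource;

// Hypothetical diagnostic (not part of Hadoop): reports the jar or directory a class came from.
public class ClassOrigin {

  /** Returns the code source location of a class, or a note when none is recorded. */
  public static String locate(Class<?> cls) {
    CodeSource src = cls.getProtectionDomain().getCodeSource();
    if (src == null || src.getLocation() == null) {
      // Classes loaded by the bootstrap loader carry no code source.
      return "(bootstrap or unknown)";
    }
    return src.getLocation().toString();
  }

  public static void main(String[] args) throws ClassNotFoundException {
    // Each argument is a fully qualified class name to look up on the current classpath.
    for (String name : args) {
      System.out.println(name + " -> " + locate(Class.forName(name)));
    }
  }
}
```

Run with the same classpath the tests use and pass, for example, `org.apache.hadoop.mapred.MiniMRCluster` as an argument. A different location between the two runs would point at the jar that `ant test-patch` brings in.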

> TestBlockFixer hangs initializing MiniMRCluster
> -----------------------------------------------
>
>                 Key: MAPREDUCE-2283
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2283
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/raid
>    Affects Versions: 0.23.0
>            Reporter: Nigel Daley
>            Priority: Blocker
>             Fix For: 0.22.0
>
>
> TestBlockFixer (a raid contrib test) is hanging the precommit testing on Hudson

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

