hadoop-hdfs-issues mailing list archives

From "Philip Zeyliger (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-621) Exposing MiniDFS and MiniMR clusters as a single process command-line
Date Tue, 15 Sep 2009 22:13:57 GMT

    [ https://issues.apache.org/jira/browse/HDFS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755735#action_12755735 ]

Philip Zeyliger commented on HDFS-621:


I totally agree that LocalJobRunner should be maximally useful.  That's great for testing

Let's say I have a python class that knows how to interact with HDFS and MR.  It knows how
to look at files, start jobs, etc.  I call out to hadoop binaries to interact with HDFS, and
I want to capture all the details that occur when I talk to my real cluster.  For this, if
I were in Java, I'd spin up a Mini* cluster.  Since I'm not in Java, I resort to spinning
up a subprocess.  I could also mock everything out, but at the end of the day, I want an integration
test, and I really don't want to run it against a cluster that has to be set up externally:
I'd rather the cluster be spun up and shut down by my test itself.
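
A minimal sketch of that pattern in Python, for illustration only.  The command line
is the one quoted in the issue description below; the helper name running_cluster, the
stop_timeout parameter, and the assumption that the test runs from the Hadoop install
directory are all hypothetical.  The sketch only covers starting and killing the
subprocess; how a test learns that the cluster is ready to accept requests is not
covered here.

```python
import contextlib
import subprocess

# Command quoted in the issue description. Running it from the Hadoop
# install directory is an assumption of this sketch.
MINICLUSTER_CMD = ["bin/hadoop", "jar", "hadoop-hdfs-hdfswithmr-test.jar", "minicluster"]


@contextlib.contextmanager
def running_cluster(cmd=MINICLUSTER_CMD, stop_timeout=30):
    """Start cmd as a child process and make sure it is gone on exit.

    Hypothetical helper: it does not wait for the cluster to become ready,
    it only ties the lifetime of the subprocess to the with-block.
    """
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        if proc.poll() is None:  # still running, so shut it down
            proc.terminate()
            try:
                proc.wait(timeout=stop_timeout)
            except subprocess.TimeoutExpired:
                proc.kill()  # did not exit in time, force it
                proc.wait()
```

A test would wrap its body in "with running_cluster():" and call out to the hadoop
binaries inside the block.  The cluster process is then stopped even if the test fails.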

I'm happy to throw this into contrib/ if you feel strongly about it.  I figure it'd be useful to
other folks.

-- Philip

> Exposing MiniDFS and MiniMR clusters as a single process command-line
> ---------------------------------------------------------------------
>                 Key: HDFS-621
>                 URL: https://issues.apache.org/jira/browse/HDFS-621
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: test, tools
>            Reporter: Philip Zeyliger
>            Assignee: Philip Zeyliger
>            Priority: Minor
>         Attachments: HDFS-621.patch
> It's hard to test non-Java programs that rely on significant mapreduce functionality.
> The patch I'm proposing shortly will let you just type "bin/hadoop jar hadoop-hdfs-hdfswithmr-test.jar
> minicluster" to start a cluster (internally, it's using Mini{MR,HDFS}Cluster) with a specified
> number of daemons, etc.  A test that checks how some external process interacts with Hadoop
> might start minicluster as a subprocess, run through its thing, and then simply kill the java
> I've been using just such a system for a couple of weeks, and I like it.  It's significantly
> easier than developing a lot of scripts to start a pseudo-distributed cluster, and then clean
> up after it.  I figure others might find it useful as well.
> I'm at a bit of a loss as to where to put it in 0.21.  hdfs-with-mr tests have all the
> required libraries, so I've put it there.  I could conceivably split this into "minimr" and
> "minihdfs", but it's specifically the fact that they're configured to talk to each other that
> I like about having them together.  And one JVM is better than two for my test programs.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
