hadoop-hdfs-issues mailing list archives

From "Philip Zeyliger (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-621) Exposing MiniDFS and MiniMR clusters as a single process command-line
Date Tue, 15 Sep 2009 22:13:57 GMT

    [ https://issues.apache.org/jira/browse/HDFS-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755735#action_12755735 ]

Philip Zeyliger commented on HDFS-621:
--------------------------------------

Owen,

I totally agree that LocalJobRunner should be maximally useful.  That's great for testing
jobs.

Let's say I have a python class that knows how to interact with HDFS and MR.  It knows how
to look at files, start jobs, etc.  I call out to hadoop binaries to interact with HDFS, and
I want to capture all the details that occur when I talk to my real cluster.  For this, if
I were in Java, I'd spin up a Mini* cluster.  Since I'm not in Java, I resort to spinning
up a subprocess.  I could also mock everything out, but at the end of the day, I want an integration
test, and I really don't want to run it against a cluster that has to be set up externally:
I'd rather the cluster be spun up and shut down by my test itself.

I'm happy to throw this into contrib/ if you feel strongly about it.  I figure it'd be useful to
other folks.

-- Philip

> Exposing MiniDFS and MiniMR clusters as a single process command-line
> ---------------------------------------------------------------------
>
>                 Key: HDFS-621
>                 URL: https://issues.apache.org/jira/browse/HDFS-621
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: test, tools
>            Reporter: Philip Zeyliger
>            Assignee: Philip Zeyliger
>            Priority: Minor
>         Attachments: HDFS-621.patch
>
>
> It's hard to test non-Java programs that rely on significant mapreduce functionality.
> The patch I'm proposing shortly will let you just type
> "bin/hadoop jar hadoop-hdfs-hdfswithmr-test.jar minicluster" to start a cluster
> (internally, it's using Mini{MR,HDFS}Cluster) with a specified number of daemons, etc.
> A test that checks how some external process interacts with Hadoop might start
> minicluster as a subprocess, run through its thing, and then simply kill the java
> subprocess.
>
> I've been using just such a system for a couple of weeks, and I like it.  It's significantly
> easier than developing a lot of scripts to start a pseudo-distributed cluster, and then
> clean up after it.  I figure others might find it useful as well.
>
> I'm at a bit of a loss as to where to put it in 0.21.  hdfs-with-mr tests have all the
> required libraries, so I've put it there.  I could conceivably split this into "minimr"
> and "minihdfs", but it's specifically the fact that they're configured to talk to each
> other that I like about having them together.  And one JVM is better than two for my
> test programs.
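
The start-as-a-subprocess, run, then kill pattern described above might look roughly like this in a Python test. This is only a sketch under assumptions: the helper name running_subprocess and its shutdown_timeout parameter are hypothetical and not part of the patch, and the command is the one quoted in the issue description, whose jar name and working directory depend on the build.

```python
import contextlib
import subprocess

# Command quoted in the issue description. The jar name and the directory it
# is run from depend on the build, so treat this as a placeholder.
MINICLUSTER_CMD = [
    "bin/hadoop", "jar", "hadoop-hdfs-hdfswithmr-test.jar", "minicluster",
]


@contextlib.contextmanager
def running_subprocess(cmd, shutdown_timeout=10.0):
    """Start cmd as a child process and make sure it is gone on exit.

    Hypothetical helper: it only manages the process lifetime and knows
    nothing about Hadoop itself.
    """
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        if proc.poll() is None:
            # Ask politely first, then force the process down if it lingers.
            proc.terminate()
            try:
                proc.wait(timeout=shutdown_timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
```

A test would then wrap its body in "with running_subprocess(MINICLUSTER_CMD) as cluster:" and talk to the cluster inside the block. How the minicluster signals that it is ready to accept connections is not stated in this message, so a real test would still need its own readiness check before running.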

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

