hadoop-pig-dev mailing list archives

From "Xu Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-72) Porting Pig unit tests to use MiniDFSCluster and MiniMRCluster on the local machine
Date Thu, 21 Feb 2008 05:32:43 GMT

    [ https://issues.apache.org/jira/browse/PIG-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570947#action_12570947 ]

Xu Zhang commented on PIG-72:
-----------------------------

*We might need to make a choice here.*  

I tried my previous implementation of the unit test framework again.  It does not set up the mini clusters in setUp() and shut them down in tearDown() for each test method.  Instead, it uses a singleton that exists for the duration of a test case class's execution (which, BTW, behaves more like a real cluster that is always physically present :-)).  Here is an example of how the singleton is used:

{code}
public class TestWhatEver extends TestCase {
    private String initString = "mapreduce";
    private MiniClusterBuilder cluster = MiniClusterBuilder.buildCluster();

    @Test
    public void testCase1() throws Exception { 
        PigServer pig = new PigServer(initString); 

        // Do something with the pig server, such as registering and executing Pig
        // queries. The queries will be executed with the local cluster.
    }

    @Test
    public void testCase2() throws Exception { 
        PigServer pig = new PigServer(initString); 

        // Do something with the pig server, such as registering and executing Pig
        // queries. The queries will be executed with the local cluster.
    }

    // More test cases if needed
}
{code}

With this implementation, all existing Pig unit tests pass without any error, and the total execution time is around 11 minutes on my machine.

So I would like your opinion on which implementation to use.  The major concern people had with the previous implementation is that it uses finalize() to shut down the DFS and MapReduce clusters.  But because Java guarantees that all finalizers are run on leftover objects when the Java virtual machine exits, the finalize() method as used in this implementation should not be an issue.  I say this because, as far as I understand, each JUnit test case class is executed in a separate JVM.  So it is to our advantage (efficiency, running the tests on a local cluster that more closely resembles a real one, and less chance of race conditions) to start the cluster when the test case class is loaded and shut it down when the JVM for that class exits.

FWIW, the test reports from this implementation verify that the cluster is set up only once per test case class.  They also verify that all test case classes use the same set of resources (such as ports) for the DFS and MapReduce clusters, which means the clusters are shut down cleanly after each test case class.
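For concreteness, here is a minimal sketch of what such a singleton could look like.  The class and method names (MiniClusterBuilder, buildCluster()) come from the patch, but everything inside is a placeholder: the real implementation would start Hadoop's MiniDFSCluster and MiniMRCluster in the constructor and shut them down in finalize().

```java
// Minimal sketch only -- the cluster start/stop bodies are placeholders
// for the real MiniDFSCluster/MiniMRCluster calls in the patch.
class MiniClusterBuilder {
    // Created once, when the class is first used; lives until the JVM exits.
    private static final MiniClusterBuilder INSTANCE = new MiniClusterBuilder();

    private boolean running;

    private MiniClusterBuilder() {
        // Placeholder: start MiniDFSCluster and MiniMRCluster here and
        // record their ports in the configuration that PigServer reads.
        running = true;
    }

    /** Every test case class calls this; they all share the same instance. */
    static MiniClusterBuilder buildCluster() {
        return INSTANCE;
    }

    boolean isRunning() {
        return running;
    }

    @Override
    protected void finalize() {
        // Placeholder: shut down the mini DFS and MR clusters here.
        running = false;
    }
}
```

Since INSTANCE is a static final field, the cluster starts the first time any test class touches MiniClusterBuilder, and every test case class running in the same JVM shares it.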

Thoughts?

> Porting Pig unit tests to use MiniDFSCluster and MiniMRCluster on the local machine
> -----------------------------------------------------------------------------------
>
>                 Key: PIG-72
>                 URL: https://issues.apache.org/jira/browse/PIG-72
>             Project: Pig
>          Issue Type: Test
>          Components: tools
>            Reporter: Xu Zhang
>         Attachments: hadoop-0.15.3-dev-test-utils.jar, PortPigUnitTestToMiniClusters.patch, TEST-org.apache.pig.test.TestAlgebraicEval.txt
>
>
> We need to port the Pig unit tests to use MiniDFSCluster and MiniMRCluster, so that the tests can be executed with the DFS and MR threads on the local machine.  This feature will eliminate the need to set up a real distributed Hadoop cluster before running the unit tests, as everything will now be carried out with the (mini) cluster on the user's local machine.
> One prerequisite for using this feature is a Hadoop jar that contains the class files for MiniDFSCluster, MiniMRCluster, and other supporting components.  I have been able to generate such a jar with a special target I added to Hadoop's build.xml, and I have also logged a Hadoop JIRA requesting that this target become a permanent part of that build file.  If possible, we can simply replace hadoop15.jar on the SVN source tree with this jar, and then users will never need to worry about its availability.  Please find such a Hadoop jar file in the attachment.
> To use the feature in unit tests, the user just needs to call MiniClusterBuilder.buildCluster() before a PigServer instance is created with the string "mapreduce" as the parameter to its constructor.  Here is an example of how the MiniClusterBuilder is used in a test case class:
>         public class TestWhatEver extends TestCase {
>             private String initString = "mapreduce";
>             private MiniClusterBuilder cluster = MiniClusterBuilder.buildCluster();
>
>             @Test
>             public void testGroupCountWithMultipleFields() throws Exception {
>                 PigServer pig = new PigServer(initString);
>                 // Do something with the pig server, such as registering and executing Pig
>                 // queries. The queries will be executed with the local cluster.
>             }
>
>             // More test cases if needed
>         }
> To run the unit tests with the local cluster, issue the command "ant test" from the top directory of the source tree.  Notice that you no longer need to specify the location of the hadoop-site.xml file with the command line option "-Djunit.hadoop.conf=<dir>".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

