hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1404) PigUnit - Pig script testing simplified.
Date Fri, 07 May 2010 22:26:50 GMT

    [ https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865340#action_12865340 ]

Alan Gates commented on PIG-1404:
---------------------------------

This looks really cool.  All the examples of how to use it are very nice.  I have a few questions:

# It looks like commons.lang.StringUtils can be pulled from maven, so we'll want to add that
to the ivy files.
# I don't understand what the purpose of the re-implementation of GruntParser is.  Could you
explain that a bit?
# (This one is for other pig developers) Is Piggybank the right place for this or should we
put it under test?  I think this will be really useful for Pig users in setting up automated
tests of their Pig Latin scripts.  Should we support it outright rather than put it in piggybank
and risk having it go unmaintained?

> PigUnit - Pig script testing simplified. 
> -----------------------------------------
>
>                 Key: PIG-1404
>                 URL: https://issues.apache.org/jira/browse/PIG-1404
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Romain Rigaux
>             Fix For: 0.8.0
>
>         Attachments: commons-lang-2.4.jar, PIG-1404.patch
>
>
> The goal is to provide a simple xUnit framework that enables our Pig scripts to be easily:
>   - unit tested
>   - regression tested
>   - quickly prototyped
> No cluster set up is required.
> For example:
> TestCase
> {code}
>   @Test
>   public void testTop3Queries() {
>     String[] args = {
>         "n=3",        
>         };
>     test = new PigTest("top_queries.pig", args);
>     String[] input = {
>         "yahoo\t10",
>         "twitter\t7",
>         "facebook\t10",
>         "yahoo\t15",
>         "facebook\t5",
>         ....
>     };
>     String[] output = {
>         "(yahoo,25L)",
>         "(facebook,15L)",
>         "(twitter,7L)",
>     };
>     test.assertOutput("data", input, "queries_limit", output);
>   }
> {code}
> top_queries.pig
> {code}
> data =
>     LOAD '$input'
>     AS (query:CHARARRAY, count:INT);
>      
>     ... 
>     
> queries_sum = 
>     FOREACH queries_group 
>     GENERATE 
>         group AS query, 
>         SUM(queries.count) AS count;
>         
>     ...
>             
> queries_limit = LIMIT queries_ordered $n;
> STORE queries_limit INTO '$output';
> {code}
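
To make the expected output in the quoted test easier to check by hand, here is a minimal plain-Java sketch of what the script appears to compute: sum the counts per query, order by the sum, and keep the first n rows. It is not PigUnit or Pig code. The class and method names are hypothetical, the descending order is inferred from the expected output (the grouping and ordering steps are elided in the script), and only the five input rows actually shown are used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; TopQueriesSketch and topN are hypothetical names,
// not part of the PIG-1404 patch.
class TopQueriesSketch {

    // Sums the count per query, orders by descending sum (assumed), keeps the
    // first n rows, and formats each row like the expected-output strings
    // in the quoted test, e.g. "(yahoo,25L)".
    static List<String> topN(String[] input, int n) {
        // LinkedHashMap keeps first-seen order, so ties stay deterministic.
        Map<String, Long> sums = new LinkedHashMap<>();
        for (String line : input) {
            String[] fields = line.split("\t");
            sums.merge(fields[0], Long.parseLong(fields[1]), Long::sum);
        }

        List<Map.Entry<String, Long>> rows = new ArrayList<>(sums.entrySet());
        rows.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> row : rows.subList(0, Math.min(n, rows.size()))) {
            out.add("(" + row.getKey() + "," + row.getValue() + "L)");
        }
        return out;
    }

    public static void main(String[] args) {
        // Only the rows visible in the quoted test; the elided rows are not guessed.
        String[] input = {
            "yahoo\t10",
            "twitter\t7",
            "facebook\t10",
            "yahoo\t15",
            "facebook\t5",
        };
        for (String row : topN(input, 3)) {
            System.out.println(row);
        }
    }
}
```

With the five visible rows and n=3, the sketch yields the same three tuples the test lists as its expected output.
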
> There are 3 modes:
> * LOCAL (if the "pigunit.exectype.local" property is present)
> * MAPREDUCE (uses the cluster specified in the classpath, same as HADOOP_CONF_DIR)
> ** automatic mini cluster (the default; the HADOOP_CONF_DIR to have in the classpath will be ~/pigtest/conf)
> ** pointing to an existing cluster (if the "pigunit.exectype.cluster" property is present)
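
The mode selection described in the list above could be sketched roughly as follows. This is one reading of the description, not the code in the patch: the enum, class, and method names are hypothetical, and the precedence when both properties are set (local wins here) is an assumption, since the description does not say.

```java
import java.util.Properties;

// Hypothetical sketch; ExecModeSketch, ExecMode and resolve are not names
// taken from the PIG-1404 patch.
class ExecModeSketch {

    enum ExecMode { LOCAL, MINI_CLUSTER, EXISTING_CLUSTER }

    // Picks a mode from the two property names given in the description.
    // Only the presence of a property matters; its value is ignored.
    static ExecMode resolve(Properties props) {
        if (props.getProperty("pigunit.exectype.local") != null) {
            return ExecMode.LOCAL; // assumption: local takes precedence
        }
        if (props.getProperty("pigunit.exectype.cluster") != null) {
            return ExecMode.EXISTING_CLUSTER;
        }
        // Neither property present: the automatic mini cluster is the default.
        return ExecMode.MINI_CLUSTER;
    }

    public static void main(String[] args) {
        // Try it with e.g. java -Dpigunit.exectype.local=true ExecModeSketch
        System.out.println(resolve(System.getProperties()));
    }
}
```
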
> For now, it would be nice to see how this idea could be integrated into Piggybank and whether PigParser/PigServer could improve their interfaces in order to keep PigUnit simple.
> Other components based on PigUnit could be built later:
>   - standalone MiniCluster
>   - notion of workspaces for each test
>   - standalone utility that reads test configuration and generates a test report...
> This is a first prototype; it is open to suggestions and can definitely benefit from feedback.
> How to test, in pig_trunk:
> {code}
> Apply patch
> $pig_trunk ant compile-test
> $pig_trunk ant
> $pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=999999
> {code}
> (it takes 15 min in MAPREDUCE minicluster mode; the tests will need to be split in the future between 'unit' and 'integration')
> Many examples are in:
> {code}
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
> {code}
> When used standalone, do not forget to put commons-lang-2.4.jar and the HADOOP_CONF_DIR of your cluster in your CLASSPATH.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

