hadoop-pig-dev mailing list archives

From "Romain Rigaux (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1404) PigUnit - Pig script testing simplified.
Date Fri, 28 May 2010 01:22:37 GMT

    [ https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872751#action_12872751 ]

Romain Rigaux commented on PIG-1404:
------------------------------------

Thank you for having a look at it, Alan.

Overall it is good, but I have been working on a few small simplifications that can be useful
when testing scripts with many tests and big cache archives.

In addition, it might be worth the effort to add some Web documentation about it.

I can update the patch next week.

> PigUnit - Pig script testing simplified. 
> -----------------------------------------
>
>                 Key: PIG-1404
>                 URL: https://issues.apache.org/jira/browse/PIG-1404
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Romain Rigaux
>            Assignee: Romain Rigaux
>             Fix For: 0.8.0
>
>         Attachments: commons-lang-2.4.jar, PIG-1404-2.patch, PIG-1404.patch
>
>
> The goal is to provide a simple xUnit framework that enables our Pig scripts to be easily:
>   - unit tested
>   - regression tested
>   - quickly prototyped
> No cluster set up is required.
> For example:
> TestCase
> {code}
>   @Test
>   public void testTop3Queries() {
>     String[] args = {
>         "n=3",        
>         };
>     test = new PigTest("top_queries.pig", args);
>     String[] input = {
>         "yahoo\t10",
>         "twitter\t7",
>         "facebook\t10",
>         "yahoo\t15",
>         "facebook\t5",
>         ....
>     };
>     String[] output = {
>         "(yahoo,25L)",
>         "(facebook,15L)",
>         "(twitter,7L)",
>     };
>     test.assertOutput("data", input, "queries_limit", output);
>   }
> {code}
> top_queries.pig
> {code}
> data =
>     LOAD '$input'
>     AS (query:CHARARRAY, count:INT);
>      
>     ... 
>     
> queries_sum = 
>     FOREACH queries_group 
>     GENERATE 
>         group AS query, 
>         SUM(queries.count) AS count;
>         
>     ...
>             
> queries_limit = LIMIT queries_ordered $n;
> STORE queries_limit INTO '$output';
> {code}
> There are 3 modes:
> * LOCAL (if the "pigunit.exectype.local" property is present)
> * MAPREDUCE (uses the cluster specified in the classpath, same as HADOOP_CONF_DIR)
> ** automatic mini cluster (the default; the HADOOP_CONF_DIR to have in the classpath will be: ~/pigtest/conf)
> ** pointing to an existing cluster (if the "pigunit.exectype.cluster" property is present)
> For now, it would be nice to see how this idea could be integrated in Piggybank and if PigParser/PigServer could improve their interfaces in order to make PigUnit simple.
> Other components based on PigUnit could be built later:
>   - standalone MiniCluster
>   - notion of workspaces for each test
>   - standalone utility that reads test configuration and generates a test report...
> It is a first prototype, open to suggestions, and can definitely benefit from feedback.
> How to test, in pig_trunk:
> {code}
> Apply patch
> $pig_trunk ant compile-test
> $pig_trunk ant
> $pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=999999
> {code}
> (it takes 15 min in MAPREDUCE minicluster mode; the tests will need to be split between 'unit' and 'integration' in the future)
> Many examples are in:
> {code}
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
> {code}
> When used standalone, do not forget to put commons-lang-2.4.jar and the HADOOP_CONF_DIR of your cluster on your CLASSPATH.
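
As a rough illustration of the mode selection described in the issue, here is a minimal Java sketch. Only the two property names and the "mini cluster is the default" rule come from the description. The class, enum, and method names are hypothetical, and checking the local property before the cluster property is an assumption. It is not PigUnit's actual implementation.

```java
import java.util.Properties;

/** Hypothetical sketch of the mode lookup; not taken from the PIG-1404 patch. */
public class ExecModeSketch {

    /** The three modes listed in the issue description. */
    public enum Mode { LOCAL, MINI_CLUSTER, EXISTING_CLUSTER }

    /**
     * Picks a mode from the presence of the two properties named in the issue.
     * Assumption: if both are set, the local property wins.
     */
    public static Mode resolve(Properties props) {
        if (props.containsKey("pigunit.exectype.local")) {
            return Mode.LOCAL;
        }
        if (props.containsKey("pigunit.exectype.cluster")) {
            return Mode.EXISTING_CLUSTER;
        }
        // Neither property present: the description says the automatic mini cluster is the default.
        return Mode.MINI_CLUSTER;
    }

    public static void main(String[] args) {
        // System properties include anything passed as -Dname=value on the java command line.
        System.out.println(resolve(System.getProperties()));
    }
}
```

Under these assumptions, running the class with `-Dpigunit.exectype.local=true` would select LOCAL, and running it with neither property would fall back to the mini cluster.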

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

