hadoop-pig-dev mailing list archives

From "Romain Rigaux (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1404) PigUnit - Pig script testing simplified.
Date Mon, 03 May 2010 23:06:56 GMT
PigUnit - Pig script testing simplified. 

                 Key: PIG-1404
                 URL: https://issues.apache.org/jira/browse/PIG-1404
             Project: Pig
          Issue Type: New Feature
            Reporter: Romain Rigaux
             Fix For: 0.8.0

The goal is to provide a simple xUnit framework that enables our Pig scripts to be easily:
  - unit tested
  - regression tested
  - quickly prototyped

For example:

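  public void testTop3Queries() throws Exception {
    String[] args = {
        // parameter substitutions such as the value of $n (elided in the archived message)
    };
    PigTest test = new PigTest("top_queries.pig", args);

    String[] input = {
        // input lines for the "data" alias (elided in the archived message)
    };

    String[] output = {
        // expected tuples for the "queries_limit" alias (elided in the archived message)
    };

    // Feed input in place of "data", run the script, and check "queries_limit" against output.
    test.assertOutput("data", input, "queries_limit", output);
  }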

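top_queries.pig:

data =
    LOAD '$input'
    AS (query:CHARARRAY, count:INT);
queries_group =
    GROUP data
    BY query;
queries_sum =
    FOREACH queries_group
    GENERATE
        group AS query,
        SUM(data.count) AS count;
queries_ordered =
    ORDER queries_sum
    BY count DESC;
queries_limit = LIMIT queries_ordered $n;

STORE queries_limit INTO '$output';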
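To make the result expected from top_queries.pig concrete, the following plain-Java sketch runs the same GROUP / SUM / ORDER / LIMIT pipeline on in-memory lines. It is not part of PigUnit: the class TopQueriesSketch, the method topQueries, the tab-separated "query<TAB>count" input format and the sample data are all hypothetical, chosen only for illustration. Rows are rendered to resemble Pig tuples, roughly the shape of the strings an expected-output array would contain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopQueriesSketch {

    /**
     * Hypothetical in-memory equivalent of top_queries.pig: GROUP BY query,
     * SUM(count), ORDER BY count DESC, LIMIT n. Each input line is assumed to be
     * "query<TAB>count", matching the (query:CHARARRAY, count:INT) schema.
     */
    public static List<String> topQueries(String[] input, int n) {
        // GROUP + SUM: accumulate the counts of each distinct query.
        // Pig's SUM over an int column yields a long, so sums are kept as Long.
        Map<String, Long> sums = new LinkedHashMap<>();
        for (String line : input) {
            String[] fields = line.split("\t");
            sums.merge(fields[0], Long.parseLong(fields[1].trim()), Long::sum);
        }

        // ORDER BY count DESC (List.sort is stable, so ties keep first-seen order).
        List<Map.Entry<String, Long>> ordered = new ArrayList<>(sums.entrySet());
        ordered.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        // LIMIT n, rendering each row like a Pig tuple, e.g. "(yahoo,25)".
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, ordered.size()); i++) {
            Map.Entry<String, Long> row = ordered.get(i);
            result.add("(" + row.getKey() + "," + row.getValue() + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the original message.
        String[] input = {"yahoo\t10", "twitter\t7", "facebook\t10", "yahoo\t15", "facebook\t5"};
        System.out.println(topQueries(input, 3));
        // prints [(yahoo,25), (facebook,15), (twitter,7)]
    }
}
```

One caveat on the design: Pig's ORDER BY does not promise any particular order among tied counts, while this sketch's stable sort keeps first-seen order, so test data with ties may not line up with a fixed expected-output array.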

There are 3 modes:
   - LOCAL (if the "pigunit.exectype.local" property is present)
   - MAPREDUCE (uses the cluster specified in the classpath, as with HADOOP_CONF_DIR):
     - automatic mini cluster (default)
     - an existing cluster (if the "pigunit.exectype.cluster" property is present)
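The mode rules above amount to a small decision procedure. The sketch below assumes that "present" means the property is set at all (its value is ignored), that it is read from Java system properties (for example via -Dpigunit.exectype.local), and that LOCAL takes precedence when both properties are set. The class, enum and method names are hypothetical and are not PigUnit's API.

```java
import java.util.Properties;

public class PigUnitModeSketch {

    /** The three execution modes described in the proposal (hypothetical names). */
    enum Mode { LOCAL, MAPREDUCE_MINI_CLUSTER, MAPREDUCE_EXISTING_CLUSTER }

    /**
     * Hypothetical helper: only the presence of each property matters, not its value.
     * LOCAL is checked first, so it wins if both properties are set (an assumption).
     */
    static Mode selectMode(Properties props) {
        if (props.getProperty("pigunit.exectype.local") != null) {
            return Mode.LOCAL;
        }
        if (props.getProperty("pigunit.exectype.cluster") != null) {
            // Existing cluster, configured from the classpath (as with HADOOP_CONF_DIR).
            return Mode.MAPREDUCE_EXISTING_CLUSTER;
        }
        // Default: start an automatic mini cluster.
        return Mode.MAPREDUCE_MINI_CLUSTER;
    }

    public static void main(String[] args) {
        // For example: java -Dpigunit.exectype.local PigUnitModeSketch  (prints LOCAL)
        System.out.println(selectMode(System.getProperties()));
    }
}
```

Passing a Properties object rather than reading System.getProperties() inside the helper keeps the rule easy to test without touching JVM-wide state.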

For now, it would be good to see how this idea could be integrated into Piggybank, and whether PigParser/PigServer
could improve their interfaces to make PigUnit simpler.

Other components based on PigUnit could be built later:
  - standalone MiniCluster
  - notion of workspaces for each test
  - standalone utility that reads test configuration and generates a test report...

This is a first prototype; it is open to suggestions and would definitely benefit from feedback.

How to test, in pig_trunk:
  - apply the patch
  - $pig_trunk ant
  - $pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=999999

(The tests take 15 minutes in MAPREDUCE mode with the mini cluster; in the future they will need to be split
into 'unit' and 'integration' tests.)

Many examples are in:

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
