Message-ID: <25667997.160861280940618829.JavaMail.jira@thor>
Date: Wed, 4 Aug 2010 12:50:18 -0400 (EDT)
From: "Ashutosh Chauhan (JIRA)"
To: pig-dev@hadoop.apache.org
Subject: [jira] Commented: (PIG-1404) PigUnit - Pig script testing simplified.
In-Reply-To: <237272.24971272928016581.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895318#action_12895318 ]

Ashutosh Chauhan commented on PIG-1404:
---------------------------------------

bq. 3.
(This one is for other pig developers) Is Piggybank the right place for this or should we put it under test? I think this will be really useful for Pig users in setting up automated tests of their Pig Latin scripts. Should we support it outright rather than put it in piggybank and risk having it go unmaintained?

I think it deserves to be put under test. Having written a few end-to-end test cases for Pig in JUnit, I can see it's really useful for Pig itself. Its usefulness for Pig users is pretty obvious.

> PigUnit - Pig script testing simplified.
> -----------------------------------------
>
>                 Key: PIG-1404
>                 URL: https://issues.apache.org/jira/browse/PIG-1404
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Romain Rigaux
>            Assignee: Romain Rigaux
>             Fix For: 0.8.0
>
>         Attachments: commons-lang-2.4.jar, PIG-1404-2.patch, PIG-1404-3-doc.patch, PIG-1404-3.patch, PIG-1404-4-doc.patch, PIG-1404-4.patch, PIG-1404.patch
>
>
> The goal is to provide a simple xUnit framework that enables our Pig scripts to be easily:
>   - unit tested
>   - regression tested
>   - quickly prototyped
> No cluster setup is required.
> For example:
> TestCase
> {code}
>   @Test
>   public void testTop3Queries() throws Exception {
>     // Script parameters, passed as name=value pairs.
>     String[] args = {
>         "n=3",
>         };
>     PigTest test = new PigTest("top_queries.pig", args);
>     // Tab-separated rows that replace the 'data' alias.
>     String[] input = {
>         "yahoo\t10",
>         "twitter\t7",
>         "facebook\t10",
>         "yahoo\t15",
>         "facebook\t5",
>         ....
>     };
>     // Expected tuples for the 'queries_limit' alias.
>     String[] output = {
>         "(yahoo,25L)",
>         "(facebook,15L)",
>         "(twitter,7L)",
>     };
>     test.assertOutput("data", input, "queries_limit", output);
>   }
> {code}
> top_queries.pig
> {code}
> data =
>     LOAD '$input'
>     AS (query:CHARARRAY, count:INT);
>
>     ...
>
> queries_sum =
>     FOREACH queries_group
>     GENERATE
>         group AS query,
>         SUM(queries.count) AS count;
>
>     ...
>
> queries_limit = LIMIT queries_ordered $n;
> STORE queries_limit INTO '$output';
> {code}
> There are 3 modes:
> * LOCAL (if the "pigunit.exectype.local" property is present)
> * MAPREDUCE (uses the cluster specified in the classpath, same as HADOOP_CONF_DIR)
> ** automatic mini cluster (the default; the HADOOP_CONF_DIR to have in the classpath will be ~/pigtest/conf)
> ** pointing to an existing cluster (if the "pigunit.exectype.cluster" property is present)
> For now, it would be nice to see how this idea could be integrated in Piggybank and whether PigParser/PigServer could improve their interfaces in order to make PigUnit simple.
> Other components based on PigUnit could be built later:
>   - standalone MiniCluster
>   - notion of workspaces for each test
>   - standalone utility that reads test configuration and generates a test report...
> It is a first prototype, open to suggestions, and can definitely benefit from feedback.
> How to test, in pig_trunk:
> {code}
> Apply patch
> $pig_trunk ant compile-test
> $pig_trunk ant
> $pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=999999
> {code}
> (it takes 15 min in the MAPREDUCE mini cluster; tests will need to be split in the future between 'unit' and 'integration')
> Many examples are in:
> {code}
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
> {code}
> When used standalone, do not forget to put commons-lang-2.4.jar and the HADOOP_CONF_DIR of your cluster on your CLASSPATH.
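
The three execution modes listed in the issue description can be pictured as a simple check on system properties. The sketch below is illustrative only and is not PigUnit's code: the class, enum, and method names are hypothetical, and the precedence applied when both properties are set is an assumption. Only the two property names and the mini-cluster default come from the description.

```java
import java.util.Properties;

// Hypothetical sketch of the mode selection described in the issue;
// none of these names are taken from the PigUnit patch.
public class PigUnitModeSketch {

    /** The three modes listed in the issue description. */
    public enum Mode { LOCAL, MAPREDUCE_MINI_CLUSTER, MAPREDUCE_EXISTING_CLUSTER }

    /**
     * Picks a mode from the given properties. Only the presence of a
     * property matters, not its value.
     */
    public static Mode resolve(Properties props) {
        // Assumption: LOCAL wins if both properties are present.
        if (props.containsKey("pigunit.exectype.local")) {
            return Mode.LOCAL;
        }
        if (props.containsKey("pigunit.exectype.cluster")) {
            return Mode.MAPREDUCE_EXISTING_CLUSTER;
        }
        // Per the description, the automatic mini cluster is the default.
        return Mode.MAPREDUCE_MINI_CLUSTER;
    }

    public static void main(String[] args) {
        // Reads JVM system properties, e.g. those set with -Dname=value.
        System.out.println(resolve(System.getProperties()));
    }
}
```

Under these assumptions, running the class with `-Dpigunit.exectype.local=true` would select the local mode, and running it with neither property set would fall back to the mini cluster.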