From: "Romain Rigaux (JIRA)"
To: pig-dev@hadoop.apache.org
Date: Tue, 11 May 2010 15:33:42 -0400 (EDT)
Subject: [jira] Commented: (PIG-1404) PigUnit - Pig script testing simplified.
[ https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866291#action_12866291 ]

Romain Rigaux commented on PIG-1404:
------------------------------------

1. It looks like commons.lang.StringUtils can be pulled from Maven, so we'll want to add that to the ivy files.

There is already the Apache StringUtils in the hive.jar of Piggybank; I included commons-lang.jar in case people missed it. Is it OK to add commons-lang.jar to the ivy configuration of Pig even if it is only used by Piggybank?

2. I don't understand what the purpose of the re-implementation of GruntParser is. Could you explain that a bit?

The plan of the Pig script needs to be modified in several cases, for example:

* removing a DUMP or STORE that interferes with the execution of the script
* using a customized array of text data as input instead of another data file, cf. _TestPigTest#testTextInput()_:
{code}
A = LOAD 'input_data';
-->
A = LOAD 'text_data_saved_into_a_file';
{code}
* testing a subset of the Pig script, which requires overriding the first alias, cf.
_TestPigTest#testSubset()_:
{code}
queries = FOREACH data GENERATE LOWER(query) AS query, count AS count;
-->
queries = LOAD 'XX.tmp' AS (query:CHARARRAY, count:INTEGER);
{code}

In order to do this, I did not find a simpler solution than parsing the script and overriding the processPig(String cmd) method of the PigParser. It takes a list of aliases to override:
{code}
overrides = {
  STORE: "",
  queries_limit: queries_limit = LIMIT queries_ordered 5;
}
{code}
Then during the parsing it replaces the aliases with the new values:
{code}
for each command in the PigScript:
  if the command has an alias to override:
    command = overrides.get(alias)
{code}
Is there a simpler way to replace some parts of the plan?

> PigUnit - Pig script testing simplified.
> -----------------------------------------
>
>                 Key: PIG-1404
>                 URL: https://issues.apache.org/jira/browse/PIG-1404
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Romain Rigaux
>             Fix For: 0.8.0
>
>         Attachments: commons-lang-2.4.jar, PIG-1404.patch, PIG-1404.patch
>
>
> The goal is to provide a simple xUnit framework that enables our Pig scripts to be easily:
> - unit tested
> - regression tested
> - quickly prototyped
> No cluster set up is required.
> For example:
> TestCase
> {code}
> @Test
> public void testTop3Queries() {
>   String[] args = {
>     "n=3",
>   };
>   test = new PigTest("top_queries.pig", args);
>   String[] input = {
>     "yahoo\t10",
>     "twitter\t7",
>     "facebook\t10",
>     "yahoo\t15",
>     "facebook\t5",
>     ....
>   };
>   String[] output = {
>     "(yahoo,25L)",
>     "(facebook,15L)",
>     "(twitter,7L)",
>   };
>   test.assertOutput("data", input, "queries_limit", output);
> }
> {code}
> top_queries.pig
> {code}
> data =
>     LOAD '$input'
>     AS (query:CHARARRAY, count:INT);
>
> ...
>
> queries_sum =
>     FOREACH queries_group
>     GENERATE
>         group AS query,
>         SUM(queries.count) AS count;
>
> ...
>
> queries_limit = LIMIT queries_ordered $n;
> STORE queries_limit INTO '$output';
> {code}
> There are 3 modes:
> * LOCAL (if the "pigunit.exectype.local" property is present)
> * MAPREDUCE (uses the cluster specified in the classpath, same as HADOOP_CONF_DIR)
> ** automatic mini cluster (the default; the HADOOP_CONF_DIR to have in the classpath will be: ~/pigtest/conf)
> ** pointing to an existing cluster (if the "pigunit.exectype.cluster" property is present)
> For now, it would be nice to see how this idea could be integrated in Piggybank and whether PigParser/PigServer could improve their interfaces in order to make PigUnit simple.
> Other components based on PigUnit could be built later:
> - standalone MiniCluster
> - notion of workspaces for each test
> - standalone utility that reads test configuration and generates a test report...
> This is a first prototype, open to suggestions, and it can definitely take advantage of feedback.
> How to test, in pig_trunk:
> {code}
> Apply patch
> $pig_trunk ant compile-test
> $pig_trunk ant
> $pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=999999
> {code}
> (it takes 15 min in the MAPREDUCE minicluster; tests will need to be split in the future between 'unit' and 'integration')
> Many examples are in:
> {code}
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
> {code}
> When used standalone, do not forget to add commons-lang-2.4.jar and the HADOOP_CONF_DIR of your cluster to your CLASSPATH.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
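The alias-override pass discussed in the comment above can be sketched in plain Java. This is a hypothetical, simplified illustration, not the PigUnit API: the class and method names are invented, and real PigUnit hooks into the parser via processPig(String cmd) rather than iterating over an array of statements. Each statement whose alias (the name before '=') appears in the overrides map is replaced by the mapped text, and an entry mapped to an empty string (such as STORE) drops the statement entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of alias-based statement substitution, in the spirit
// of the overrides map shown in the comment. Not the actual PigUnit code.
public class AliasOverrideSketch {

    // Rewrite a Pig script, one statement per array entry.
    static String rewrite(String[] script, Map<String, String> overrides) {
        StringBuilder out = new StringBuilder();
        for (String command : script) {
            String trimmed = command.trim();
            String key;
            int eq = trimmed.indexOf('=');
            if (trimmed.startsWith("STORE") || trimmed.startsWith("DUMP")) {
                // STORE/DUMP have no alias; match on the keyword itself.
                key = trimmed.split("\\s+", 2)[0];
            } else if (eq > 0) {
                // Ordinary statement: the alias is the name before '='.
                key = trimmed.substring(0, eq).trim();
            } else {
                key = trimmed;
            }
            String replacement =
                overrides.containsKey(key) ? overrides.get(key) : command;
            if (!replacement.isEmpty()) {   // empty override removes the statement
                out.append(replacement).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] script = {
            "data = LOAD '$input' AS (query:CHARARRAY, count:INT);",
            "queries_limit = LIMIT queries_ordered $n;",
            "STORE queries_limit INTO '$output';",
        };
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("STORE", "");   // drop STORE so it does not interfere
        overrides.put("queries_limit",
                      "queries_limit = LIMIT queries_ordered 5;");
        System.out.print(rewrite(script, overrides));
    }
}
```

Run standalone, this prints the rewritten script: the LOAD line untouched, the queries_limit statement replaced by the LIMIT 5 version, and the STORE line removed.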