hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Viraj Bhat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1211) Pig script runs half way after which it reports syntax error
Date Fri, 23 Apr 2010 23:30:50 GMT

    [ https://issues.apache.org/jira/browse/PIG-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860419#action_12860419
] 

Viraj Bhat commented on PIG-1211:
---------------------------------

Ashutosh, I feel that the user may not be interested in running his script first using explain
finding his syntax error and then again running it again to get his results.  
They expect Pig to tell them all the errors upfront before submitting a M/R job.

Explain was not designed for checking syntax error in scripts. 

I believe that if you have a dump statement, explain -script will cause the script to run.

Is it not possible for Pig to find out that there is an error with "store" syntax? 

Viraj

> Pig script runs half way after which it reports syntax error
> ------------------------------------------------------------
>
>                 Key: PIG-1211
>                 URL: https://issues.apache.org/jira/browse/PIG-1211
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>             Fix For: 0.8.0
>
>
> I have a Pig script which is structured in the following way
> {code}
> register cp.jar
> dataset = load '/data/dataset/' using PigStorage('\u0001') as (col1, col2, col3, col4,
col5);
> filtered_dataset = filter dataset by (col1 == 1);
> proj_filtered_dataset = foreach filtered_dataset generate col2, col3;
> rmf $output1;
> store proj_filtered_dataset into '$output1' using PigStorage();
> second_stream = foreach filtered_dataset  generate col2, col4, col5;
> group_second_stream = group second_stream by col4;
> output2 = foreach group_second_stream {
>  a =  second_stream.col2
>  b =   distinct second_stream.col5;
>  c = order b by $0;
>  generate 1 as key, group as keyword, MYUDF(c, 100) as finalcalc;
> }
> rmf  $output2;
> --syntax error here
> store output2 to '$output2' using PigStorage();
> {code}
> I run this script using the Multi-query option, it runs successfully till the first store
but later fails with a syntax error. 
> The usage of HDFS option, "rmf" causes the first store to execute. 
> The only option the I have is to run an explain before running his script 
> grunt> explain -script myscript.pig -out explain.out
> or moving the rmf statements to the top of the script
> Here are some questions:
> a) Can we have an option to do something like "checkscript" instead of explain to get
the same syntax error?  In this way I can ensure that I do not run for 3-4 hours before encountering
a syntax error
> b) Can pig not figure out a way to re-order the rmf statements since all the store directories
are variables
> Thanks
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message