hadoop-pig-dev mailing list archives

From "George Mavromatis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-770) Parsing errors with FOREACH a GENERATE FLATTEN(urlContents) AS
Date Sat, 18 Apr 2009 02:22:14 GMT

    [ https://issues.apache.org/jira/browse/PIG-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700413#action_12700413 ]

George Mavromatis commented on PIG-770:
---------------------------------------

I omitted the cogroup statement, sorry, which is:

a = COGROUP urlContents BY url INNER, siteUrls BY url INNER;

Note that in my sample, siteUrls::url is chararray, not bytearray, and that urlContents is
loaded using BinStorage().

Can you reproduce with the above changes?
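
For reference, a minimal sketch of the full reproduction, assembled only from the statements in this comment and the issue description quoted below. It assumes siteUrls::url is declared chararray, as noted above, and that $input and $siteUrls are supplied as parameters at launch; it has not been verified against a Pig 0.2.0 install.

```pig
-- Sketch assembled from the statements quoted in this thread.
-- $input and $siteUrls are parameters supplied at launch (e.g. via -param).
urlContents = LOAD '$input' USING BinStorage() AS (url:chararray, pg:bytearray);

-- Assumption: url is declared chararray here, per the note above
-- (the issue description below shows it as bytearray).
siteUrls = LOAD '$siteUrls' AS (site:chararray, score:double, expanded_site:chararray, url:chararray);

a = COGROUP urlContents BY url INNER, siteUrls BY url INNER;

-- Variant reported to work:
urlContentsByUrl = FOREACH a GENERATE FLATTEN(urlContents) AS (url:chararray, pg:chararray),
                                      FLATTEN(siteUrls.(site, expanded_site));

-- Variant reported to fail with ERROR 1000 (swap in for the statement above):
-- urlContentsByUrl = FOREACH a GENERATE FLATTEN(urlContents) AS (url:chararray, pg:chararray),
--                                       FLATTEN(siteUrls.site);
```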

> Parsing errors with FOREACH a GENERATE FLATTEN(urlContents) AS
> --------------------------------------------------------------
>
>                 Key: PIG-770
>                 URL: https://issues.apache.org/jira/browse/PIG-770
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.2.0
>            Reporter: George Mavromatis
>
> Loading the following two relations as:
> urlContents = LOAD '$input' USING BinStorage() AS (url:chararray, pg:bytearray);
> siteUrls = LOAD '$siteUrls' AS (site:chararray, score:double, expanded_site:chararray, url:bytearray);
> then the following:
> urlContentsByUrl = FOREACH a GENERATE FLATTEN(urlContents) AS (url:chararray, pg:chararray),
>                                       FLATTEN(siteUrls.(site, expanded_site));
> works as expected.
> But all of the following variants fail with an error message that does not make sense (to me):
> urlContentsByUrl = FOREACH a GENERATE FLATTEN(urlContents) AS (url:chararray, pg:chararray),
>                                      FLATTEN(siteUrls.site);
> 2009-04-17 23:18:02,064 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: siteUrls::site in {url: chararray,pg: chararray,site: chararray}
> urlContentsByUrl = FOREACH a GENERATE FLATTEN(urlContents) AS (url:chararray, pg:chararray),
>                                      FLATTEN(siteUrls.(site));
> 2009-04-17 23:19:27,669 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: siteUrls::site in {url: chararray,pg: chararray,site: chararray}
> urlContentsByUrl = FOREACH a GENERATE FLATTEN(urlContents) AS (url:chararray, pg:chararray),
>                                       FLATTEN(siteUrls.(site,expanded_site)) AS (site:chararray,expanded_site:chararray);
> 2009-04-17 23:23:33,483 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: siteUrls::site in {url: chararray,pg: chararray,site: chararray,expanded_site: chararray}
> Even if I am not using AS correctly with FLATTEN, either all or none of the above should parse, so either way this is a parsing bug.
> Note that the Pig Latin spec page has no formal description of the FLATTEN operation and no example where it is used with GENERATE, AS, and a bag of more than one tuple, so I cannot know whether the syntax above is supported; I can only try and guess. Should I file a separate ticket for that?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

