hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (PIG-768) Schema of a relation reported by DESCRIBE and allowed operations on the relation are not compatible
Date Mon, 13 Sep 2010 22:15:33 GMT

     [ https://issues.apache.org/jira/browse/PIG-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates resolved PIG-768.
----------------------------

    Resolution: Not A Problem

This is the way Pig is supposed to work.  If the loader or the user does not tell it what
type a column is, it assumes the column is a bytearray.  If the script later acts as if the
column is a certain type (for example, by applying the map dereference operator #), then
Pig assumes it really is of that type and inserts a cast.
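
A minimal sketch of that behavior, assuming a hypothetical input file 'data.txt' and field
names m and n:

    -- no type is declared for m, so Pig assumes bytearray
    a = LOAD 'data.txt' AS (m, n);
    -- dereferencing m as a map makes Pig treat it as map[] and
    -- insert an implicit cast from bytearray before the lookup
    b = FOREACH a GENERATE m#'Url';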

You are right that the loader would do better to return the field as a bytearray and only
cast it later, when Pig asks it to.  However, since casting a value to its own type works,
what the loader does also works out.
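
For example (again with hypothetical names), whether s arrives as a raw bytearray or as an
already-built map, an explicit cast to map[] succeeds, since casting a map to map[] is a
no-op:

    -- a no-op if the loader already built s as a map, a real
    -- conversion if it handed back a bytearray
    urlList = FOREACH urlMap GENERATE (map[])s;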

> Schema of a relation reported by DESCRIBE and allowed operations on the relation are not compatible
> ---------------------------------------------------------------------------------------------------
>
>                 Key: PIG-768
>                 URL: https://issues.apache.org/jira/browse/PIG-768
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: George Mavromatis
>             Fix For: 0.9.0
>
>
> The DESCRIBE command in the following script prints:
> {s: bytearray, pg: bytearray, wm: bytearray}
> However, the script later treats the s field of urlMap as a map instead of a bytearray, as shown in s#'Url'.
> Pig does not complain about this contradiction, and at execution time the s field is treated as a map, although it was reported as a bytearray at parse time.
> Pig should either not report s as a bytearray or exit with a parsing error.
> Note that all of the above operations happen before the query executes on the cluster.
> register WebDataProcessing.jar; 
> register opencrawl.jar; 
> urlMap = LOAD '$input' USING opencrawl.pigudf.WebDataLoader() AS (s, pg, wm);
> DESCRIBE urlMap;
> -- in fact the loader in the WebDataProcessing.jar populates s and pg as s:map[], pg:bag{t1:(contents:bytearray)}
> -- and defines that in determineSchema() but pig describe ignores it!
> urlMap2 = LIMIT urlMap 20;
> urlList2 = FOREACH urlMap2 GENERATE s#'Url', pg;
> DESCRIBE urlList2;
> STORE urlList2 INTO 'output2' USING BinStorage();

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

