pig-dev mailing list archives

From "Mike Dillon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1718) Cannot directly cast output of UDF
Date Mon, 15 Nov 2010 22:36:14 GMT

    [ https://issues.apache.org/jira/browse/PIG-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932238#action_12932238 ]

Mike Dillon commented on PIG-1718:
----------------------------------

Thanks for the update, Santhosh. Is the semantics cleanup targeted for a particular release
or milestone? If so, it would be great if this JIRA issue could either be included in that
milestone, marked as depending on an upstream issue, or closed as a duplicate.

> Cannot directly cast output of UDF
> ----------------------------------
>
>                 Key: PIG-1718
>                 URL: https://issues.apache.org/jira/browse/PIG-1718
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0
>         Environment: Macbook Pro 6.2, Ubuntu 10.04 AMD64, CDH3 beta 3
>            Reporter: Mike Dillon
>            Priority: Minor
>
> I'm in the process of writing a suite of UDFs to deal with nested JSON data inside of
> Pig. In one case, I created a UDF of type EvalFunc<String> and wanted to use it like
> so:
> {code}
> RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
> IN = foreach RAW generate id, ExtractString(json, 'count') as count:int;
> {code}
> When I do this, I get the following error:
> {quote}
> ERROR 1022: Type mismatch merging schema prefix. Field Schema: chararray. Other Field Schema: count: int
> {quote}
> I can work around it by adding another projection with just a cast (as below), but I'd
> prefer it if the first form just worked.
> {code}
> RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
> MID = foreach RAW generate id, ExtractString(json, 'count') as count;
> IN = foreach MID generate id, (int)count;
> {code}
> I'd prefer not to have to write a separate ExtractInteger extends EvalFunc<Integer> if I can
> avoid it. In our case, it gets even more cumbersome because we want to have something like
> ExtractStringTuple extends EvalFunc<Tuple> that returns a tuple of strings without parsing
> the JSON over and over again:
> {code}
> RAW = load 'input.tsv' using PigStorage as ( id: int, json: chararray );
> IN = foreach RAW generate id, ExtractStringTuple(json, 'name', 'count', 'mean') as (name, count:int, mean:double);
> {code}
> As indicated, I have tested this with Pig 0.7.0. My apologies if this is already fixed
> in 0.8, since I was not able to test with a newer version.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

