pig-dev mailing list archives

From "Cheolsoo Park (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4227) Streaming Python UDF handles bag outputs incorrectly
Date Wed, 15 Oct 2014 18:08:34 GMT

    [ https://issues.apache.org/jira/browse/PIG-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172704#comment-14172704 ]

Cheolsoo Park commented on PIG-4227:
------------------------------------

Yes, you're right.

> Streaming Python UDF handles bag outputs incorrectly
> ----------------------------------------------------
>
>                 Key: PIG-4227
>                 URL: https://issues.apache.org/jira/browse/PIG-4227
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-4227-1.patch
>
>
> I have a UDF that generates different output when run as Jython and as streaming Python.
> {code:title=jython}
> {([[BBC Worldwide]])}
> {code} 
> {code:title=streaming python}
> {(BC Worldwid)}
> {code}
> The problem is that streaming Python encodes bag output incorrectly. For this particular
> example, it serializes the output string as follows:
> {code}
> |{_[[BBC Worldwide]]|}_
> {code}
> where '|' and '_' wrap the bag delimiters '{' and '}', i.e. '{' => '|{_' and '}' => '|}_'.
> But this is wrong because a bag must contain tuples, not chararrays. The correct encoding
> is as follows:
> {code}
> |{_|(_[[BBC Worldwide]]|)_|}_
> {code}
> where '|' and '_' wrap the tuple delimiters '(' and ')' as well as the bag delimiters.
> The incorrect encoding, which omits the tuple delimiters, results in truncated output.
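
The two encodings described above can be reproduced with a small sketch. This is not Pig's actual serializer: the helper names (wrap, encode) and the plain ',' field separator are assumptions made for illustration, and only the '|' ... '_' wrapping of delimiters comes from the issue text. The last line shows one possible explanation of the truncation, namely a reader that expects a three-character wrapped tuple delimiter at each end of a bag element; that is a guess, not something the issue states.

```python
# Sketch only: helper names are hypothetical, not taken from Pig's source.
PRE, POST = "|", "_"   # wrapper characters described in the issue
FIELD_SEP = ","        # assumption: the issue does not show a multi-field separator


def wrap(delim):
    """Wrap a structural delimiter, e.g. '{' -> '|{_'."""
    return PRE + delim + POST


def encode(value):
    """Encode a list as a bag, a tuple as a tuple, anything else as text."""
    if isinstance(value, list):
        return wrap("{") + FIELD_SEP.join(encode(v) for v in value) + wrap("}")
    if isinstance(value, tuple):
        return wrap("(") + FIELD_SEP.join(encode(v) for v in value) + wrap(")")
    return str(value)


field = "[[BBC Worldwide]]"
buggy = encode([field])      # bag holding a bare chararray (the reported bug)
fixed = encode([(field,)])   # bag holding a one-field tuple (the expected form)

print(buggy)  # |{_[[BBC Worldwide]]|}_
print(fixed)  # |{_|(_[[BBC Worldwide]]|)_|}_

# Guess at the truncation: dropping three characters from each end of the
# bare string, as if they were wrapped tuple delimiters, gives the bad output.
print(field[3:-3])  # BC Worldwid
```

The fixed form differs from the buggy one only by the '|(_' and '|)_' markers around the element, which matches the corrected encoding quoted in the description.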



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
