pig-dev mailing list archives

From "Mark Wagner (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-3082) outputSchema of a UDF allows two usages when describing a Tuple schema
Date Wed, 30 Mar 2016 21:06:25 GMT

     [ https://issues.apache.org/jira/browse/PIG-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Wagner updated PIG-3082:
    Hadoop Flags: Incompatible change

> outputSchema of a UDF allows two usages when describing a Tuple schema
> ----------------------------------------------------------------------
>                 Key: PIG-3082
>                 URL: https://issues.apache.org/jira/browse/PIG-3082
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Julien Le Dem
>            Assignee: Jonathan Coveney
>             Fix For: 0.12.0
>         Attachments: PIG-3082-0.patch, PIG-3082-1.patch
>
> When defining an EvalFunc that returns a Tuple, there are two ways to implement outputSchema():
> - The right way: return a schema containing a single Field that carries the type and schema
> of the UDF's return value.
> - The unreliable way: return a schema containing more than one field, which Pig interprets
> as a tuple schema even though nothing specifies the type (the type lives in the Field class).
> This is particularly deceptive when the output schema is derived from the input schema and
> the output Tuple sometimes contains only one field. Pig interprets the output schema as a
> tuple only if it has more than one field, so the UDF sometimes works and sometimes does not.
>
> We should at least issue a warning (for backward compatibility), if not throw an exception
> outright, when the output schema contains more than one Field.
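
The ambiguity described above can be illustrated with a small self-contained model. This is only a sketch: Type, Field, Schema, explicitTuple, implicitTuple and readAsTuple are hypothetical stand-ins written for this example, not Pig's actual Schema classes, and readAsTuple merely encodes the rule as the issue describes it (more than one field is taken to mean a tuple).

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT Pig's schema classes.
final class OutputSchemaSketch {
    enum Type { INT, CHARARRAY, TUPLE }

    // A field carries a type and, for tuples, a nested schema.
    static final class Field {
        final String alias;
        final Type type;
        final Schema inner; // null unless type == TUPLE

        Field(String alias, Type type, Schema inner) {
            this.alias = alias;
            this.type = type;
            this.inner = inner;
        }

        static Field scalar(String alias, Type type) {
            return new Field(alias, type, null);
        }
    }

    static final class Schema {
        final List<Field> fields;

        Schema(List<Field> fields) {
            this.fields = fields;
        }
    }

    // "Right way": one TUPLE-typed field that wraps the inner fields.
    static Schema explicitTuple(List<Field> inner) {
        return new Schema(Arrays.asList(new Field("t", Type.TUPLE, new Schema(inner))));
    }

    // "Unreliable way": the inner fields returned directly, with no TUPLE marker.
    static Schema implicitTuple(List<Field> inner) {
        return new Schema(inner);
    }

    // Assumed model of the rule described in the issue: a schema with more than
    // one field is guessed to be a tuple; otherwise only the field's type decides.
    static boolean readAsTuple(Schema s) {
        if (s.fields.size() > 1) {
            return true;
        }
        return s.fields.size() == 1 && s.fields.get(0).type == Type.TUPLE;
    }

    public static void main(String[] args) {
        List<Field> one = Arrays.asList(Field.scalar("a", Type.INT));
        List<Field> two = Arrays.asList(Field.scalar("a", Type.INT),
                                        Field.scalar("b", Type.CHARARRAY));

        System.out.println(readAsTuple(explicitTuple(one))); // true
        System.out.println(readAsTuple(explicitTuple(two))); // true
        System.out.println(readAsTuple(implicitTuple(two))); // true
        System.out.println(readAsTuple(implicitTuple(one))); // false
    }
}
```

In this model the explicit form is read as a tuple regardless of how many inner fields it has, while the implicit form changes meaning when the inner field count drops to one. That is the case the issue calls out for UDFs whose output schema is derived from the input schema.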

This message was sent by Atlassian JIRA