hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-354) Change to default outputSchema for UDFs
Date Mon, 04 Aug 2008 22:08:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619722#action_12619722 ]

Alan Gates commented on PIG-354:

I don't think we want to convert data to chararray by default for input to UDFs, for
several reasons:

1. It's expensive.
2. It mangles any data that isn't UTF-8.
3. It is a fair amount of work for users to provide type-specific implementations of their
UDFs, so I suspect most won't.
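
To illustrate the second point, here is a minimal sketch in plain Java (it does not use any Pig classes; the class name `Utf8RoundTrip` and the sample bytes are made up for illustration). It decodes a byte array as UTF-8 text and encodes it back, which is roughly what a blanket bytearray-to-chararray conversion would amount to, and checks whether the original bytes survive.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Pig.
public class Utf8RoundTrip {
    // Decode the bytes as UTF-8 text, then encode the text back to bytes.
    // Malformed sequences are replaced during decoding, so the result can
    // differ from the input.
    static byte[] roundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x41}; // not valid UTF-8
        byte[] ascii = "pig".getBytes(StandardCharsets.UTF_8);

        System.out.println("binary survives: " + Arrays.equals(binary, roundTrip(binary)));
        System.out.println("ascii survives:  " + Arrays.equals(ascii, roundTrip(ascii)));
    }
}
```

Valid UTF-8 input comes back unchanged, while the binary input does not, and the original bytes cannot be recovered from the resulting string.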

By contrast, on the outbound side I agree that chararray is the right default, for two reasons:

1. It's very easy to determine what type the UDF is returning, either from a declared schema
or by Pig reflecting on the return type. Only when the user gives no schema and the
return type is a tuple or bag (so we have no idea what is inside that tuple or bag) will
we be forcing data to strings.

2. In general, Pig does not assume any particular representation of data in byte arrays. That's
why we make the load function provide casts. So if we took this unknown data from UDFs to
be byte arrays, we'd have no idea how to convert it to anything else. Conversions from strings,
on the other hand, are well understood.
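
The reflection idea in the first point could look roughly like the sketch below. This is not Pig's actual implementation: the UDF stand-ins (`Upper`, `Length`, `Split`), the `inferPigType` helper, and the `tuple(chararray)` label are all hypothetical, and a plain `List` stands in for a tuple whose contents are unknown. It only shows how a scalar return type can be read off the `exec` method, with chararray as the fallback when nothing more is known.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Pig code. Real Pig UDFs extend EvalFunc.
public class ReturnTypeSketch {
    // Stand-ins for user-written UDFs that declare no output schema.
    public static class Upper {
        public String exec(String in) { return in.toUpperCase(); }
    }
    public static class Length {
        public Integer exec(String in) { return in.length(); }
    }
    public static class Split {
        // A List stands in for a tuple whose field types are unknown.
        public List<Object> exec(String in) {
            return new ArrayList<Object>(Arrays.asList(in.split(",")));
        }
    }

    // Map the reflected return type of exec(String) to a Pig type name.
    static String inferPigType(Class<?> udf) throws NoSuchMethodException {
        Class<?> ret = udf.getMethod("exec", String.class).getReturnType();
        if (ret == String.class) return "chararray";
        if (ret == Integer.class) return "int";
        if (ret == Long.class) return "long";
        if (ret == Double.class) return "double";
        if (List.class.isAssignableFrom(ret)) {
            // Contents unknown: treat the fields as strings.
            return "tuple(chararray)";
        }
        return "chararray"; // proposed default when nothing else is known
    }

    public static void main(String[] args) throws Exception {
        System.out.println(inferPigType(Upper.class));
        System.out.println(inferPigType(Length.class));
        System.out.println(inferPigType(Split.class));
    }
}
```

Only the container case ends up forcing data to strings; scalar return types are recovered by reflection.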

> Change to default outputSchema for UDFs
> ---------------------------------------
>                 Key: PIG-354
>                 URL: https://issues.apache.org/jira/browse/PIG-354
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Priority: Critical
>             Fix For: types_branch
> Currently, if UDF writer does not specify outputSchema the default is bytearray which
> is not what you would want most of the time. Making chararray a default would make things
> backward compatible.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
