hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-312) Casting a byte array that contains a double value to an int results in a null pointer
Date Thu, 17 Jul 2008 17:05:31 GMT

     [ https://issues.apache.org/jira/browse/PIG-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-312:

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

intcast.patch checked in.

> Casting a byte array that contains a double value to an int results in a null pointer
> -------------------------------------------------------------------------------------
>                 Key: PIG-312
>                 URL: https://issues.apache.org/jira/browse/PIG-312
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>             Fix For: types_branch
>         Attachments: intcast.patch
> {code}
> a = load 'myfile' as (name, age, gpa);                                              
> c = foreach a generate age * 10, (int)gpa * 2;                                      
> store c into 'outfile';
> {code}
> The values in gpa are doubles.  The issue is that they are read as byte arrays, and when
> the user tries to cast them to an int, the system casts directly from byte array to int,
> which results in a null.  First, it should result in a zero, not a null (unless the
> underlying value is null).  Second, we have to clarify the semantics here.  gpa was never
> declared to be a double, so casting directly from bytearray to int is a reasonable thing
> to do.  But users may not see it that way.  Do we want to cast numbers to double first
> and then to the requested type to avoid this?  Or should we force users to write this as
> (int)(double)gpa * 2 so we know to cast to double first and then to int?  In the interest
> of speed (especially considering the rarity of doubles in most data) I'd vote for the
> latter.
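
For illustration only, the two options discussed in the description can be sketched in Python. This is not Pig's cast code (Pig is implemented in Java). The function names direct_int_cast and int_via_double are hypothetical, and returning None on a parse failure is an assumption chosen to mirror the null reported in this issue.

```python
from typing import Optional


def direct_int_cast(raw: Optional[bytes]) -> Optional[int]:
    """Hypothetical direct bytearray -> int cast.

    Parses the text as an integer literal. Text such as "3.9" is not a
    valid integer literal, so the cast fails and yields None (assumed
    here to mirror the null described in the report).
    """
    if raw is None:
        return None
    try:
        return int(raw.decode("utf-8").strip())
    except ValueError:
        return None


def int_via_double(raw: Optional[bytes]) -> Optional[int]:
    """Hypothetical two-step cast, analogous to (int)(double)gpa.

    Parses the text as a floating-point number first, then truncates
    toward zero to get an int.
    """
    if raw is None:
        return None
    try:
        return int(float(raw.decode("utf-8").strip()))
    except (ValueError, OverflowError):
        # Unparseable text, NaN, or infinity has no int value.
        return None
```

Under these assumptions, direct_int_cast(b"3.9") returns None, while int_via_double(b"3.9") returns 3. The second version parses every value as a floating-point number even when the data holds plain integers, which is the extra cost the description weighs when it favors making users write the (double) cast explicitly.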

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
