pig-dev mailing list archives

From "David Arthur (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2611) HBaseStorage not casting correctly
Date Fri, 23 Mar 2012 15:33:28 GMT

    [ https://issues.apache.org/jira/browse/PIG-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236680#comment-13236680 ]

David Arthur commented on PIG-2611:

It works fine when using {{PigStorage}} without the explicit cast. I think I've tracked down
the culprit to https://github.com/apache/pig/blob/branch-0.9/src/org/apache/pig/backend/hadoop/hbase/HBaseStorage.java#L566-568

Then in {{objToBytes}}, tuple values are cast based on the type from the schema, but these
values haven't gone through "pig casting" yet (something in {{CastUtils}}, maybe?).
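To make the failure mode concrete, here is a minimal standalone sketch of a serializer that trusts the declared schema type and casts the boxed value directly, next to one that coerces the value first. It is not HBaseStorage's actual code: the class, the two methods and the INTEGER/LONG type codes are hypothetical stand-ins, and coercing through `Number` is only one possible way to do the "pig casting" step before serialization.

```java
import java.nio.ByteBuffer;

public class SchemaCastSketch {
    // Hypothetical type codes standing in for the schema's declared field types.
    static final byte INTEGER = 10;
    static final byte LONG = 15;

    // Naive serializer: trusts the declared type and casts the boxed value directly.
    // A Long declared as "int" fails here with ClassCastException.
    static byte[] naiveToBytes(Object value, byte schemaType) {
        switch (schemaType) {
            case INTEGER:
                return ByteBuffer.allocate(4).putInt((Integer) value).array();
            case LONG:
                return ByteBuffer.allocate(8).putLong((Long) value).array();
            default:
                throw new IllegalArgumentException("unsupported type " + schemaType);
        }
    }

    // Coercing serializer: converts any Number to the declared type before writing it.
    static byte[] coercingToBytes(Object value, byte schemaType) {
        Number n = (Number) value;
        switch (schemaType) {
            case INTEGER:
                return ByteBuffer.allocate(4).putInt(n.intValue()).array();
            case LONG:
                return ByteBuffer.allocate(8).putLong(n.longValue()).array();
            default:
                throw new IllegalArgumentException("unsupported type " + schemaType);
        }
    }

    public static void main(String[] args) {
        Object count = Long.valueOf(42L); // COUNT yields a long, even if the schema says int
        try {
            naiveToBytes(count, INTEGER);
            System.out.println("naive: ok");
        } catch (ClassCastException e) {
            System.out.println("naive: ClassCastException"); // this branch runs
        }
        byte[] b = coercingToBytes(count, INTEGER);
        // prints "coercing: 4 bytes, value 42"
        System.out.println("coercing: " + b.length + " bytes, value " + ByteBuffer.wrap(b).getInt());
    }
}
```

Note that `Number.intValue()` narrows silently, so a long outside the int range would be truncated rather than rejected; an explicit `(int)$4` in the script makes that narrowing visible at the Pig level instead.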
> HBaseStorage not casting correctly
> ----------------------------------
>                 Key: PIG-2611
>                 URL: https://issues.apache.org/jira/browse/PIG-2611
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.2
>         Environment: Ubuntu 11.10, Hadoop 0.20.2, HBase 0.92.0
>            Reporter: David Arthur
>            Priority: Minor
>              Labels: cast, hbase
> When loading data into HBase with HBaseStorage, there is unexpected behavior regarding record schema and casting.
> Here is the relevant code snippet:
> {code}
> B = group A by (time_tuple, some_scalar);
> C = foreach B {
> 	-- UDF to generate id (bytearray)
> 	generate id, flatten(group.$0), COUNT(A);
> }
> {code}
> At this point the schema for C is unknown, so I declare a schema with a foreach statement
> {code}
> D = foreach C generate $0 as id:bytearray, $1 as year:int, $2 as month:int, $3 as date:int, $4 as count:int;
> {code}
> Even though I've declared C.$4 as an int, it is still a long (from the COUNT). When I go to insert into HBase I get a ClassCastException, since the schema (int) does not match the actual tuple value (long). I can fix this by explicitly casting when I declare the schema.
> {code}
> D = foreach C generate $0 as id:bytearray, $1 as year:int, $2 as month:int, $3 as date:int, (int)$4 as count:int;
> {code}
> Is this expected behavior? If not, is this an HBaseStorage issue, i.e. not honoring the schema before it goes off casting things?
> Cheers,
> David

This message is automatically generated by JIRA.

