hadoop-hive-dev mailing list archives

From "Arvind Prabhakar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1287) Struct datatype should not use field names for type equivalence.
Date Mon, 29 Mar 2010 21:12:28 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851097#action_12851097
] 

Arvind Prabhakar commented on HIVE-1287:
----------------------------------------

Thanks for your comment Zheng.

I can see how the {{CAST}} would work, but I believe that we need stronger type-checking semantics.
Traditionally, a {{CAST}} is used to bypass compile-time checks. While this is a very powerful
concept, it can lead to data corruption if not used with caution.

An alternative to the {{CAST}} approach would be to use compile-time type checking without
regard to the field names. This is similar to function signatures in, say, Java, where it does
not matter what the parameter names are, as long as the arguments are supplied in the correct order.
This can be achieved by treating each field name as an alias for the datatype of that field.

For example, the columns defined as {{struct < a : string >}} and {{struct < b : string >}}
are type-equivalent because they are both of the type {{struct < ? : string >}}.
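
To make the idea concrete, here is a minimal sketch of name-insensitive type equivalence. It is not Hive's implementation: the {{Primitive}} and {{Struct}} classes and the {{erase_names}} and {{type_equivalent}} functions are hypothetical names invented for illustration, and Python is used only for brevity. The sketch assumes that field order and field datatypes still matter, and that only the names are ignored.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Primitive:
    # Hypothetical stand-in for a primitive type such as "string" or "int".
    name: str


@dataclass(frozen=True)
class Struct:
    # Ordered (field_name, field_type) pairs, in declaration order.
    fields: Tuple[Tuple[str, "HiveType"], ...]


HiveType = Union[Primitive, Struct]


def erase_names(t):
    """Return the positional shape of a type with struct field names dropped."""
    if isinstance(t, Struct):
        # Keep field order and field types; discard the names.
        return ("struct", tuple(erase_names(ft) for _, ft in t.fields))
    return ("primitive", t.name)


def type_equivalent(a, b):
    """Two types are equivalent when their name-erased shapes are identical."""
    return erase_names(a) == erase_names(b)


# struct<x:string> and struct<y:string> differ only in the field name.
foo = Struct((("x", Primitive("string")),))
bar = Struct((("y", Primitive("string")),))
# struct<x:int> has the same field name as foo but a different datatype.
baz = Struct((("x", Primitive("int")),))
```

Under these assumptions, {{type_equivalent(foo, bar)}} holds even though the two declarations are not identical, while {{type_equivalent(foo, baz)}} does not. Nested structs are compared recursively, and swapping the order of two differently typed fields breaks equivalence, which mirrors the function-signature analogy above.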


> Struct datatype should not use field names for type equivalence.
> ----------------------------------------------------------------
>
>                 Key: HIVE-1287
>                 URL: https://issues.apache.org/jira/browse/HIVE-1287
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Mac OS X (10.6.2) Java SE 6 ( 1.6.0_17)
>            Reporter: Arvind Prabhakar
>
> The field names for {{Struct}} types are currently being matched when testing type equivalence.
> This is readily seen by running the following example:
> {noformat}
> hive> create table source ( foo struct < x : string > );
> OK
> Time taken: 3.094 seconds
> hive> load data local inpath '/path/to/sample/data.txt' overwrite into table source;
> Copying data from file:/path/to/sample/data.txt
> Loading data to table source
> OK
> Time taken: 0.593 seconds
> hive> create table sink ( bar struct < y : string >);
> OK
> Time taken: 0.11 seconds
> hive> insert overwrite table sink select foo from source;
> FAILED: Error in semantic analysis: line 1:23 Cannot insert into target table 
> because column number/types are different sink: Cannot convert column 0 
> from struct<x:string> to struct<y:string>.
> {noformat}
> Since {{source.foo}} and {{sink.bar}} have the same definition apart from their field names,
> data movement between the two should be allowed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

