hive-dev mailing list archives

From "Travis Crawford (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2941) Hive should expand nested structs when setting the table schema from thrift structs
Date Thu, 12 Apr 2012 17:45:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252649#comment-13252649 ]

Travis Crawford commented on HIVE-2941:
---------------------------------------

Here are some additional details about the issue. Consider the following create table statement.
The table's columns are not specified explicitly; they are discovered by reflecting on the
{{Person}} class.

{code}
hive> create external table travis_test.person_test 
    >   partitioned by (part_dt string)
    >   row format serde "com.twitter.elephantbird.hive.serde.ThriftSerDe"
    >     with serdeproperties ("serialization.class"="com.twitter.elephantbird.examples.thrift.Person")
    >   stored as
    >     inputformat "com.twitter.elephantbird.mapred.input.HiveMultiInputFormat"
    >     outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
{code}

The current behavior does not expand nested structures; it lists the class name of each nested
struct as the field type, so users browsing the schema do not see the full table definition.

{code}
hive> describe extended person_test;
OK
name	com.twitter.elephantbird.examples.thrift.Name	from deserializer
id	int	from deserializer
email	string	from deserializer
phones	array<com.twitter.elephantbird.examples.thrift.PhoneNumber>	from deserializer
part_dt	string	
{code}

This patch expands nested structures, showing the full table schema. Here's an example of
what the table looks like with the patch:

{code}
hive> describe extended person_test;
OK
name	struct<first_name:string,last_name:string>	from deserializer
id	int	from deserializer
email	string	from deserializer
phones	array<struct<number:string,type:struct<value:int>>>	from deserializer
part_dt	string	
{code}
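
To illustrate the kind of expansion shown above, here is a minimal, self-contained sketch. It is not the attached patch and does not use Hive's API: {{NestedTypeExpander}}, its {{register}}/{{expand}} methods, and the in-memory registry are hypothetical stand-ins for the information a deserializer would provide. The {{PhoneType}} class name is also an assumption, since the output above only shows its expanded form ({{struct<value:int>}}).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative only: recursively replaces struct class names with their
// expanded "struct<...>" type strings. Not Hive code.
final class NestedTypeExpander {
    // Hypothetical registry: struct class name -> ordered (field name -> declared type).
    private final Map<String, Map<String, String>> structs = new LinkedHashMap<>();

    // Registers a struct from alternating field-name / field-type arguments.
    void register(String className, String... nameTypePairs) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameTypePairs.length; i += 2) {
            fields.put(nameTypePairs[i], nameTypePairs[i + 1]);
        }
        structs.put(className, fields);
    }

    // Expands a type name. Only "array<...>" and registered structs are handled;
    // anything else (primitives, unknown names) is returned unchanged.
    // Note: there is no cycle detection, so self-referencing structs would recurse forever.
    String expand(String type) {
        if (type.startsWith("array<") && type.endsWith(">")) {
            String element = type.substring("array<".length(), type.length() - 1);
            return "array<" + expand(element) + ">";
        }
        Map<String, String> fields = structs.get(type);
        if (fields == null) {
            return type;
        }
        StringJoiner joined = new StringJoiner(",", "struct<", ">");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            joined.add(field.getKey() + ":" + expand(field.getValue()));
        }
        return joined.toString();
    }

    // Builds a registry mirroring the Person example from this comment.
    static NestedTypeExpander personExample() {
        String pkg = "com.twitter.elephantbird.examples.thrift.";
        NestedTypeExpander expander = new NestedTypeExpander();
        expander.register(pkg + "Name", "first_name", "string", "last_name", "string");
        // PhoneType is an assumed class name for the "type" field.
        expander.register(pkg + "PhoneType", "value", "int");
        expander.register(pkg + "PhoneNumber", "number", "string", "type", pkg + "PhoneType");
        return expander;
    }

    public static void main(String[] args) {
        String pkg = "com.twitter.elephantbird.examples.thrift.";
        NestedTypeExpander expander = personExample();
        // prints struct<first_name:string,last_name:string>
        System.out.println(expander.expand(pkg + "Name"));
        // prints array<struct<number:string,type:struct<value:int>>>
        System.out.println(expander.expand("array<" + pkg + "PhoneNumber>"));
    }
}
```

Primitive types such as {{int}} and {{string}} pass through unchanged, which matches the {{id}} and {{email}} columns in both listings.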

In both cases, the table storage descriptor is unchanged - both list the columns as {{cols:[]}}.

I believe the reflected table schema should be copied into the partition storage descriptor
when adding a new partition, but that could be a separate change.
                
> Hive should expand nested structs when setting the table schema from thrift structs
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2941
>                 URL: https://issues.apache.org/jira/browse/HIVE-2941
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Travis Crawford
>            Assignee: Travis Crawford
>         Attachments: HIVE-2941.D2721.1.patch
>
>
> When setting a table serde, the deserializer is queried for its schema, which is used to set the metastore table schema. The current implementation uses the class name stored in the field as the field type.
> By storing the class name as the field type, users cannot see the contents of a struct with "describe tblname". Applications that query HiveMetaStore for the table schema (specifically HCatalog in this case) see an unknown field type, rather than a struct containing known field types.
> Hive should store the expanded schema in the metastore so users browsing the schema see expanded fields, and applications querying metastore see familiar types.
> DETAILS
> Set the table serde to something like this. This serde uses the built-in {{ThriftStructObjectInspector}}.
> {code}
> alter table foo_test
>   set serde "com.twitter.elephantbird.hive.serde.ThriftSerDe"
>   with serdeproperties ("serialization.class"="com.foo.Foo");
> {code}
> This causes a call to {{MetaStoreUtils.getFieldsFromDeserializer}} which returns a list of fields and their schemas. However, currently it does not handle nested structs, and if {{com.foo.Foo}} above contains a field {{com.foo.Bar}}, the class name {{com.foo.Bar}} would appear as the field type. Instead, nested structs should be expanded.


        
