hive-dev mailing list archives

From "Jakob Homan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-2171) Allow custom serdes to set field comments
Date Wed, 25 May 2011 23:06:47 GMT

     [ https://issues.apache.org/jira/browse/HIVE-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HIVE-2171:
------------------------------

    Attachment: HIVE-2171.patch

Patch:
* Adds a comment field to the StructField interface and provides a reasonable implementation
in each of its implementing classes.
* Adds overloaded versions of each of the struct-based ObjectInspector factories to allow
the comments to be set.
* Adjusts MetastoreUtils to check whether the field's comment is null. If it is, the previous
behavior is kept; otherwise the comment is used.
* Adds a new unit test for MetastoreUtils.  For this, mockito was added as a dependency.  Right
now it looks like Hive's Ivy conf isn't set up to include only some jars in the package.
If this patch goes in, I'll open another JIRA to make sure mockito and the other test-related
jars aren't bundled into packages that don't need them.
* Refactors the TestStandardObjectInspectors test to test both with and without comments.
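
The null-check fallback described for MetastoreUtils can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the patch: Field, SimpleField, commentFor and describe are hypothetical names that stand in for Hive's StructField and the metastore utility logic, and the default text is taken from the describe output quoted in this message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldCommentSketch {
    // Placeholder shown when a field has no comment (as seen in the describe output).
    public static final String DEFAULT_COMMENT = "from deserializer";

    /** Hypothetical stand-in for a struct field that may carry a comment. */
    public interface Field {
        String getFieldName();
        String getTypeName();
        String getFieldComment(); // may be null when the serde supplies no comment
    }

    /** Simple immutable implementation used only for this sketch. */
    public static final class SimpleField implements Field {
        private final String name;
        private final String type;
        private final String comment;

        public SimpleField(String name, String type, String comment) {
            this.name = name;
            this.type = type;
            this.comment = comment;
        }

        public String getFieldName() { return name; }
        public String getTypeName() { return type; }
        public String getFieldComment() { return comment; }
    }

    /** Null comment keeps the previous behavior; otherwise the serde's comment wins. */
    public static String commentFor(Field field) {
        String comment = field.getFieldComment();
        return comment == null ? DEFAULT_COMMENT : comment;
    }

    /** Builds tab-separated name/type/comment rows, similar in shape to describe output. */
    public static List<String> describe(List<Field> fields) {
        List<String> rows = new ArrayList<>();
        for (Field f : fields) {
            rows.add(f.getFieldName() + "\t" + f.getTypeName() + "\t" + commentFor(f));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Field> fields = Arrays.<Field>asList(
            new SimpleField("string1", "string", "this field is string1"),
            new SimpleField("x", "int", null));
        for (String row : describe(fields)) {
            System.out.println(row);
        }
    }
}
```

Keeping the comment nullable means existing serdes that never set one continue to get the old placeholder without any change on their side.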

After this patch, a serde that wants to specify comments can do so and have them show up in
the table description. For example, a table kst created by an implementation of SerDe that
has an example field of each type can now set the field comments (each field has its own
comment, all equally boring: "this field is BLAH"):
{noformat}hive> describe kst;
OK
string1 string  this field is string1
string2 string  this field is string2
int1    int     this field is int1
boolean1        boolean this field is boolean1
long1   bigint  this field is long1
float1  float   this field is float1
double1 double  this field is double1
inner_record1   struct<int_in_inner_record1:int,string_in_inner_record1:string> this field is inner_record1
enum1   string  this field is enum1
array1  array<string>   this field is array1
map1    map<string,string>      this field is map1
union1  uniontype<float,boolean,string> this field is union1
fixed1  array<tinyint>  this field is fixed1
null1   void    this field is null1
unionnullint    int     this field is UnionNullInt
bytes1  array<tinyint>  this field is bytes1
ds      string
Time taken: 0.286 seconds{noformat}

One thing I noticed is that these field comments on structs should extend to substructures,
and with this new patch they do for custom serdes:
{noformat}hive> describe kst.inner_record1;
OK
int_in_inner_record1    int     this field is int_in_inner_record1
string_in_inner_record1 string  this field is string_in_inner_record1
Time taken: 0.113 seconds{noformat}

However, this doesn't work correctly with built-in serdes:

{noformat}hive> create table test_table(a STRUCT<z:string COMMENT 'comment for z',x:int> COMMENT 'comment for a');
OK
Time taken: 2.565 seconds
hive> describe test_table;
OK
a	struct<z:string,x:int>	comment for a
Time taken: 0.139 seconds
hive> describe test_table.a;
OK
z	string	from deserializer
x	int	from deserializer
Time taken: 0.096 seconds
hive> describe test_table.a.z;
OK
z	string	from deserializer
Time taken: 0.089 seconds
hive>{noformat}

The comment for field z is lost, replaced by the boilerplate text "from deserializer" and
can't be retrieved from the CLI.  I'll open a JIRA for this.

This is my first Hive patch, so please check to see if I missed anything.

> Allow custom serdes to set field comments
> -----------------------------------------
>
>                 Key: HIVE-2171
>                 URL: https://issues.apache.org/jira/browse/HIVE-2171
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>             Fix For: 0.7.1
>
>         Attachments: HIVE-2171.patch
>
>
> Currently, while serde implementations can set a field's name, they can't set its comment.
> These are set in the metastore utils to {{(from deserializer)}}.  For those serdes that can
> provide meaningful comments for a field, they should be propagated to the table description.
> These serde-provided comments could be prepended to "(from deserializer)" if others feel
> that's a meaningful distinction.  This change involves updating {{StructField}} to support
> a (possibly null) comment field and then propagating this change out to the myriad places
> {{StructField}} is thrown around.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
