hive-dev mailing list archives

From vihangk1 <>
Subject [GitHub] hive pull request #310: HIVE-17580 : Remove dependency of get_fields_with_en...
Date Thu, 22 Feb 2018 23:43:15 GMT
GitHub user vihangk1 opened a pull request:

    HIVE-17580 : Remove dependency of get_fields_with_environment_context API to serde

    This version of the patch moves TypeInfo and its subclasses to standalone-metastore. The
motivation is that the metastore needs TypeInfo-like classes to store metadata about types,
which Hive implements with TypeInfos. The metastore needs this information because tables such
as Avro tables can define their schema externally, either through a URL to a file containing
the schema or through a string value of the schema added as a table property. In such cases
the metastore needs to parse this information and convert it into FieldSchema. Before this
patch, the String -> FieldSchema conversion was done by the SerDes, using the ObjectInspectors
and the TypeInfos obtained from them. This patch bypasses much of that and removes the
dependency on the SerDes by converting String -> TypeInfo -> FieldSchema directly.
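
    As an illustration of the String -> TypeInfo -> FieldSchema path described above, here is
a minimal, self-contained sketch. It does not use the Hive classes: SketchTypeInfo,
SketchFieldSchema, and SketchTypeParser are hypothetical stand-ins, and the parser assumes
lowercase type strings and handles only primitive names, array, map, and struct (no
uniontype, no parameterized types such as decimal(10,2)).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a TypeInfo tree; NOT the Hive TypeInfo class.
final class SketchTypeInfo {
    final String category;                                // PRIMITIVE, LIST, MAP or STRUCT
    final String name;                                    // the type keyword or primitive name
    final List<String> fieldNames = new ArrayList<>();    // used only by STRUCT
    final List<SketchTypeInfo> children = new ArrayList<>();

    SketchTypeInfo(String category, String name) {
        this.category = category;
        this.name = name;
    }

    // Rebuilds the canonical type string from the tree.
    String getTypeName() {
        if (category.equals("PRIMITIVE")) {
            return name;
        }
        StringBuilder sb = new StringBuilder(name).append('<');
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) sb.append(',');
            if (category.equals("STRUCT")) sb.append(fieldNames.get(i)).append(':');
            sb.append(children.get(i).getTypeName());
        }
        return sb.append('>').toString();
    }
}

// Hypothetical stand-in for a (column name, type string) pair.
final class SketchFieldSchema {
    final String name;
    final String type;
    SketchFieldSchema(String name, String type) { this.name = name; this.type = type; }
}

// Small recursive-descent parser for type strings.
final class SketchTypeParser {
    private final String s;
    private int pos;

    private SketchTypeParser(String s) { this.s = s; }

    static SketchTypeInfo parse(String typeString) {
        SketchTypeParser p = new SketchTypeParser(typeString.replaceAll("\\s+", ""));
        SketchTypeInfo t = p.parseType();
        if (p.pos != p.s.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return t;
    }

    // String -> TypeInfo -> FieldSchema, with no SerDe or ObjectInspector involved.
    static SketchFieldSchema toFieldSchema(String column, String typeString) {
        return new SketchFieldSchema(column, parse(typeString).getTypeName());
    }

    private SketchTypeInfo parseType() {
        String word = readWord();
        switch (word) {
            case "array": {
                SketchTypeInfo t = new SketchTypeInfo("LIST", word);
                expect('<'); t.children.add(parseType()); expect('>');
                return t;
            }
            case "map": {
                SketchTypeInfo t = new SketchTypeInfo("MAP", word);
                expect('<'); t.children.add(parseType());
                expect(','); t.children.add(parseType()); expect('>');
                return t;
            }
            case "struct": {
                SketchTypeInfo t = new SketchTypeInfo("STRUCT", word);
                expect('<');
                do {
                    t.fieldNames.add(readWord()); expect(':'); t.children.add(parseType());
                } while (tryConsume(','));
                expect('>');
                return t;
            }
            default:
                return new SketchTypeInfo("PRIMITIVE", word);
        }
    }

    private String readWord() {
        int start = pos;
        while (pos < s.length() && (Character.isLetterOrDigit(s.charAt(pos)) || s.charAt(pos) == '_')) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a name at " + pos);
        return s.substring(start, pos);
    }

    private boolean tryConsume(char c) {
        if (pos < s.length() && s.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void expect(char c) {
        if (!tryConsume(c)) throw new IllegalArgumentException("expected '" + c + "' at " + pos);
    }
}
```

    For example, SketchTypeParser.toFieldSchema("tags", "map<string, array<int>>") yields a
field named tags whose type string is map<string,array<int>>.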
    To achieve this, and also to reduce duplicate code and keep the design clean, this
patch moves TypeInfo, its subclasses (ListTypeInfo, MapTypeInfo, StructTypeInfo, UnionTypeInfo),
and TypeInfoParser to standalone-metastore. In the case of PrimitiveTypeInfo, the Hive code
has added a lot more than just type metadata. Specifically, PrimitiveTypeEntry and
PrimitiveCategory are type implementation details that cannot be moved to standalone-metastore,
not to mention that bringing in PrimitiveTypeEntry would bring a whole lot of dependent code
with it. To work around this issue, a new class called MetastorePrimitiveTypeInfo is introduced
in standalone-metastore. This class contains only the information that the metastore needs
from PrimitiveTypeInfo, and PrimitiveTypeInfo extends MetastorePrimitiveTypeInfo. This greatly
reduces the scope of the changes. PrimitiveTypeInfo now contains the implementation details
of Hive's primitive types. Moving TypeInfo to standalone-metastore also needs the Category
enum, which unfortunately was defined in ObjectInspector. There is no way around this, so
this patch had to move Category from ObjectInspector to TypeInfo. Most of the file changes
are due to this move.
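
    The split between MetastorePrimitiveTypeInfo and PrimitiveTypeInfo described above can be
pictured roughly as follows. This is only a sketch of the layering: the Sketch* names, the
fields, and the enum values are assumptions chosen for illustration, not the classes in the
patch.

```java
// Hypothetical sketch of the layering; the real classes in the patch may differ.

// Metastore side: carries only the type metadata the metastore needs.
class SketchMetastorePrimitiveTypeInfo {
    private final String typeName;

    SketchMetastorePrimitiveTypeInfo(String typeName) {
        this.typeName = typeName;
    }

    String getTypeName() {
        return typeName;
    }
}

// Hive side: an implementation detail that stays out of the metastore module.
enum SketchPrimitiveCategory { INT, STRING, BOOLEAN }

// Hive side: extends the metastore class and adds the engine-specific details.
class SketchPrimitiveTypeInfo extends SketchMetastorePrimitiveTypeInfo {
    private final SketchPrimitiveCategory category;

    SketchPrimitiveTypeInfo(String typeName, SketchPrimitiveCategory category) {
        super(typeName);
        this.category = category;
    }

    SketchPrimitiveCategory getPrimitiveCategory() {
        return category;
    }
}
```

    Code that only needs the type name can accept the metastore-side class, while Hive code
keeps working with the subclass and its extra details.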
    Moving TypeInfoFactory was also very disruptive, so an interface called ITypeInfoFactory
is created in the metastore, and both the metastore and Hive implement it. The Avro storage
schema reader can now use the TypeInfoToSchema and SchemaToTypeInfo util classes (also moved
to the metastore) through the ITypeInfoFactory interface.
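
    To illustrate the factory-interface idea, here is a small sketch in which one shared
utility builds type objects through an interface that has a metastore-side and a Hive-side
implementation. All names and signatures here are hypothetical and are not the
ITypeInfoFactory API from the patch. The Avro-to-Hive primitive name table follows the
mapping commonly documented for Hive's Avro support and is included only as an example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical type classes: a metastore-side base and a Hive-side subclass.
class SketchBaseType {
    final String typeName;
    SketchBaseType(String typeName) { this.typeName = typeName; }
}

class SketchEngineType extends SketchBaseType {
    SketchEngineType(String typeName) { super(typeName); }
}

// Hypothetical factory interface; the method name is an assumption.
interface SketchTypeInfoFactory {
    SketchBaseType getPrimitiveTypeInfo(String typeName);
}

// Metastore-side implementation: returns the plain metadata class.
class SketchMetastoreFactory implements SketchTypeInfoFactory {
    public SketchBaseType getPrimitiveTypeInfo(String typeName) {
        return new SketchBaseType(typeName);
    }
}

// Hive-side implementation: returns the richer subclass.
class SketchHiveFactory implements SketchTypeInfoFactory {
    public SketchBaseType getPrimitiveTypeInfo(String typeName) {
        return new SketchEngineType(typeName);
    }
}

// Shared utility: depends only on the interface, so it can live in the metastore
// module and still be reused by Hive with its own factory.
final class SketchAvroTypeMapper {
    private static final Map<String, String> AVRO_TO_HIVE = new HashMap<>();
    static {
        AVRO_TO_HIVE.put("int", "int");
        AVRO_TO_HIVE.put("long", "bigint");
        AVRO_TO_HIVE.put("float", "float");
        AVRO_TO_HIVE.put("double", "double");
        AVRO_TO_HIVE.put("boolean", "boolean");
        AVRO_TO_HIVE.put("string", "string");
        AVRO_TO_HIVE.put("bytes", "binary");
    }

    private final SketchTypeInfoFactory factory;

    SketchAvroTypeMapper(SketchTypeInfoFactory factory) {
        this.factory = factory;
    }

    SketchBaseType fromAvroPrimitive(String avroType) {
        String hiveName = AVRO_TO_HIVE.get(avroType);
        if (hiveName == null) {
            throw new IllegalArgumentException("unsupported Avro primitive: " + avroType);
        }
        return factory.getPrimitiveTypeInfo(hiveName);
    }
}
```

    The same SketchAvroTypeMapper code produces SketchBaseType objects when given the
metastore factory and SketchEngineType objects when given the Hive factory, which is the
kind of reuse the interface is meant to allow.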

You can merge this pull request into a Git repository by running:

    $ git pull vihangk1_HIVE-17580v4

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #310

commit 2bdf6e18132f99f8998eceed8af0b77865fd85d4
Author: Vihang Karajgaonkar <vihang@...>
Date:   2018-02-22T21:10:03Z

    Moved TypeInfo to standalone-metastore

commit 756a394280d0a940b7dbcca05805a62978c4d8b2
Author: Vihang Karajgaonkar <vihang@...>
Date:   2018-02-22T22:35:50Z

    Introduce Avro storage schema reader


