hive-issues mailing list archives

From "Vihang Karajgaonkar (JIRA)" <>
Subject [jira] [Commented] (HIVE-17714) move custom SerDe schema considerations into metastore from QL
Date Mon, 13 Nov 2017 20:03:00 GMT


Vihang Karajgaonkar commented on HIVE-17714:

Thanks [~alangates] for the response. I have some questions regarding your suggestions:

bq. [and I suspect TypeInfo and ObjectInspector will have to come too] to a new module in
storage-api. This avoids the need for ORC and any other storage format to pick it up. 
I will try bringing in TypeInfo and ObjectInspector too. What are the specific advantages
of doing that? Also, I didn't quite understand what you meant by "avoids the need for ORC and
any other storage format to pick it up". Can you please elaborate?

bq. This will result in a single module that the metastore (and anyone else who wants to use
Hive serdes) can use without having to pick up all of Hive.
This assumes that SerDe implementations do not bring along other dependencies such as hive-common.
I am not sure yet, but I think it is very likely that these SerDes will have more dependencies,
so it may not be as simple as adding hive-serde.jar to the standalone-metastore classpath. I already
see that hive-serde depends on hive-common, hive-service-rpc and hive-shims, so I am not sure we
will be able to create a standalone serde jar for the metastore.

> move custom SerDe schema considerations into metastore from QL
> --------------------------------------------------------------
>                 Key: HIVE-17714
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Alan Gates
> Columns in metastore for tables that use external schema don't have the type information
> (since HIVE-11985) and may be entirely inconsistent (since forever, due to issues like HIVE-17713;
> or for SerDes that allow a URL for the schema, due to a change in the underlying file).
> Currently, if you trace the usage of ConfVars.SERDESUSINGMETASTOREFORSCHEMA and follow it to
> MetaStoreUtils.getFieldsFromDeserializer, you'd see that the code in QL handles this in Hive.
> So, for the most part metastore just returns whatever is stored for columns in the database.
> One exception appears to be get_fields_with_environment_context, which is interesting...
> so getTable will return incorrect columns (potentially), but get_fields/get_schema will return
> correct ones from SerDe as far as I can tell.
> As part of separating the metastore, we should make sure all the APIs return the correct
> schema for the columns; it's not a good idea to have everyone reimplement getFieldsFromDeserializer.
> Note: this should also remove a flag introduced in HIVE-17731
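
For illustration, the decision described in the issue (return the stored columns for SerDes that keep their schema in the metastore, otherwise ask the SerDe) could be sketched roughly as below. This is a minimal sketch under stated assumptions, not Hive's actual code: `Column`, `StoredTable`, `resolveColumns`, the `schemaFromSerDe` callback and the `example.*` class names are all hypothetical stand-ins, and the real logic lives around `MetaStoreUtils.getFieldsFromDeserializer`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Illustrative sketch only: every type and name here is hypothetical,
// not Hive's FieldSchema, Table, or MetaStoreUtils.
class SchemaResolutionSketch {

    /** Minimal stand-in for a column definition (name plus type string). */
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }

        @Override
        public String toString() {
            return name + ":" + type;
        }
    }

    /** Minimal stand-in for a table as persisted in the metastore database. */
    static final class StoredTable {
        final String serdeClass;
        final List<Column> storedColumns;

        StoredTable(String serdeClass, List<Column> storedColumns) {
            this.serdeClass = serdeClass;
            this.storedColumns = storedColumns;
        }
    }

    /**
     * Returns the stored columns when the table's SerDe is in the
     * "uses metastore for schema" set; otherwise defers to the SerDe.
     */
    static List<Column> resolveColumns(
            StoredTable table,
            Set<String> serdesUsingMetastoreForSchema,
            Function<StoredTable, List<Column>> schemaFromSerDe) {
        if (serdesUsingMetastoreForSchema.contains(table.serdeClass)) {
            return table.storedColumns;
        }
        return schemaFromSerDe.apply(table);
    }

    public static void main(String[] args) {
        // Hypothetical set of SerDes whose schema is kept in the metastore.
        Set<String> allowList = new HashSet<>(Arrays.asList("example.NativeSerDe"));

        // Pretend the SerDe derives its schema from an external definition.
        Function<StoredTable, List<Column>> fromSerDe =
                t -> Arrays.asList(new Column("id", "bigint"), new Column("name", "string"));

        StoredTable nativeTable = new StoredTable(
                "example.NativeSerDe", Arrays.asList(new Column("id", "int")));
        StoredTable externalTable = new StoredTable(
                "example.ExternalSchemaSerDe", Collections.emptyList());

        System.out.println(resolveColumns(nativeTable, allowList, fromSerDe));
        System.out.println(resolveColumns(externalTable, allowList, fromSerDe));
    }
}
```

If every metastore API that returns columns went through a single routine of this shape, callers would not each need to reimplement the SerDe lookup, which is the concern raised in the description.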

This message was sent by Atlassian JIRA
