hive-issues mailing list archives

From "Vihang Karajgaonkar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15995) Syncing metastore table with serde schema
Date Mon, 26 Mar 2018 21:56:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414623#comment-16414623 ]

Vihang Karajgaonkar commented on HIVE-15995:
--------------------------------------------

Hi [~szita], thanks for making the changes. Is the q.out file correct? For example, I see the following
sequence of statements in the qfile:

{noformat}
--case: partial partition spec
ALTER TABLE avro_extschema_url_parted SET
 TBLPROPERTIES ('avro.schema.url'='${system:test.tmp.dir}/grad2.avsc');
ALTER TABLE avro_extschema_url_parted PARTITION (p1=2018) UPDATE COLUMNS;
ALTER TABLE avro_extschema_url_parted UNSET TBLPROPERTIES ('avro.schema.url');

DESCRIBE avro_extschema_url_parted;
DESCRIBE avro_extschema_url_parted PARTITION (p1=2017, p2=11);
DESCRIBE avro_extschema_url_parted PARTITION (p1=2018, p2=2);
DESCRIBE avro_extschema_url_parted PARTITION (p1=2018, p2=3);
{noformat}


Shouldn't the DESCRIBE commands return the schema based on grad2.avsc for the (p1=2018, p2=2) and
(p1=2018, p2=3) partitions, since UPDATE COLUMNS was run with the partial spec (p1=2018)? Am I
misunderstanding something? Sorry for the back and forth, but I just wanted to confirm that
the q.out shows the expected behavior.
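
To make the question concrete, here is a minimal Python sketch of the expectation, assuming a partial partition spec applies to every partition whose values agree with it. It is a toy model and not Hive code; the helper name matches_partial_spec is hypothetical.

```python
# Toy model only: not Hive code. matches_partial_spec is a hypothetical helper.

def matches_partial_spec(partition, partial_spec):
    """Return True if every key in partial_spec has the same value in partition."""
    return all(partition.get(key) == value for key, value in partial_spec.items())

# The three partitions that the qfile describes.
partitions = [
    {"p1": 2017, "p2": 11},
    {"p1": 2018, "p2": 2},
    {"p1": 2018, "p2": 3},
]

# The partial spec used in: ALTER TABLE ... PARTITION (p1=2018) UPDATE COLUMNS
partial_spec = {"p1": 2018}

# Partitions that would be expected to pick up the grad2.avsc columns.
selected = [p for p in partitions if matches_partial_spec(p, partial_spec)]
print(selected)
```

Under that assumption, both p1=2018 partitions are selected and the p1=2017 partition is left alone, which is why the two later DESCRIBE outputs look like they should reflect grad2.avsc.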

> Syncing metastore table with serde schema
> -----------------------------------------
>
>                 Key: HIVE-15995
>                 URL: https://issues.apache.org/jira/browse/HIVE-15995
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 1.2.1, 2.1.0, 3.0.0
>            Reporter: Michal Ferlinski
>            Assignee: Adam Szita
>            Priority: Major
>         Attachments: HIVE-15995.1.patch, HIVE-15995.2.patch, HIVE-15995.3.patch, HIVE-15995.4.patch,
HIVE-15995.5.patch, HIVE-15995.patch, cx1.avsc, cx2.avsc
>
>
> Hive enables table schema evolution via properties. For Avro, for example, we can alter the 'avro.schema.url'
property to update the table schema to the next version. Updating properties, however, doesn't affect
the column list stored in the metastore DB, so the table is not at the newest version when returned
from the metastore API. This is a problem for tools working with the metastore (e.g. Presto).
> To solve this issue, I suggest introducing a new DDL statement that syncs the metastore columns
with those from the serde:
> {code}
> ALTER TABLE user_test1 UPDATE COLUMNS
> {code}
> Note that this is a format-independent solution.
> To reproduce, follow the instructions below:
> - Create a table based on Avro schema version 1 (cx1.avsc)
> {code}
> CREATE EXTERNAL TABLE user_test1
>   PARTITIONED BY (dt string)
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION
>   '/tmp/schema-evolution/user_test1'
>   TBLPROPERTIES ('avro.schema.url'='/tmp/schema-evolution/cx1.avsc');
> {code}
> - Update schema to version 2 (cx2.avsc)
> {code}
> ALTER TABLE user_test1 SET TBLPROPERTIES ('avro.schema.url' = '/tmp/schema-evolution/cx2.avsc');
> {code}
> - Print serde columns (top info) and metastore columns (Detailed Table Information):
> {code}
> DESCRIBE EXTENDED user_test1
> {code}
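
The drift described in the issue can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Hive's implementation: the two schemas are hypothetical stand-ins for cx1.avsc and cx2.avsc (their real contents are not shown in this message), and ToyTable, serde_columns, set_schema and update_columns are invented names. It also assumes the metastore column list is captured once when the table is created.

```python
import json

# Hypothetical stand-ins for cx1.avsc and cx2.avsc; the real files are not shown here.
SCHEMA_V1 = json.loads("""
{"type": "record", "name": "user", "fields": [
  {"name": "id", "type": "long"}
]}
""")
SCHEMA_V2 = json.loads("""
{"type": "record", "name": "user", "fields": [
  {"name": "id", "type": "long"},
  {"name": "email", "type": "string"}
]}
""")


def serde_columns(avro_schema):
    """Columns as the serde would derive them from the current schema."""
    return [(field["name"], field["type"]) for field in avro_schema["fields"]]


class ToyTable:
    def __init__(self, schema):
        self.schema = schema                             # stands in for 'avro.schema.url'
        self.metastore_columns = serde_columns(schema)   # assumed to be stored at creation

    def set_schema(self, schema):
        # Models ALTER TABLE ... SET TBLPROPERTIES: only the property changes.
        self.schema = schema

    def update_columns(self):
        # Models the proposed ALTER TABLE ... UPDATE COLUMNS.
        self.metastore_columns = serde_columns(self.schema)


table = ToyTable(SCHEMA_V1)
table.set_schema(SCHEMA_V2)
print("serde:    ", serde_columns(table.schema))
print("metastore:", table.metastore_columns)   # still the version 1 columns

table.update_columns()
print("metastore:", table.metastore_columns)   # now in sync with the serde
```

In this model the two column lists disagree after the property change and agree again only after the explicit sync, which is the gap the proposed statement is meant to close.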



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
