hive-issues mailing list archives

From "Chaoyu Tang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-15530) Optimize the column stats update logic in table alteration
Date Mon, 09 Jan 2017 21:19:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812855#comment-15812855 ]

Chaoyu Tang edited comment on HIVE-15530 at 1/9/17 9:19 PM:
------------------------------------------------------------

+1, yes, you are right. Currently, for a renamed column, its entries in the related tables should
be updated as well. Ideally, I think renaming a column should not require recalculating its
stats, just as renaming its table or database does not. But that can be a separate issue.
For an ALTER TABLE that only changes a column's position, we might not need to update its stats,
right? I am not sure how common a case like "ALTER TABLE test_change CHANGE a a STRING
AFTER b;" (which positions column a after b) is.


was (Author: ctang.ma):
+1, yes, you are right. For a renamed column, its entries in the related tables should be updated
as well.
But for an ALTER TABLE that only changes the column position, should we update its stats? I am
not sure how common a case like "ALTER TABLE test_change CHANGE a a STRING AFTER b;" (which
positions column a after b) is.

> Optimize the column stats update logic in table alteration
> ----------------------------------------------------------
>
>                 Key: HIVE-15530
>                 URL: https://issues.apache.org/jira/browse/HIVE-15530
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Yibing Shi
>            Assignee: Yibing Shi
>         Attachments: HIVE-15530.1.patch, HIVE-15530.2.patch, HIVE-15530.3.patch, HIVE-15530.4.patch
>
>
> Currently, when a table is altered, HMS tries to update the column statistics for the table
> if any of the following conditions is true:
> # database name is changed
> # table name is changed
> # old columns and new columns are not the same
> As a result, when a column is added to a table, Hive also tries to update column statistics,
> which is not necessary. We can loosen the last condition by checking whether any of the existing
> columns has changed. If none has, we don't have to update the stats info.
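The loosened rule in the description can be sketched in Java as follows. This is a hypothetical illustration, not the HIVE-15530 patch and not the real HMS API: the `Column` record and the `needsColumnStatsUpdate` method are invented for this example, names and types are assumed to be already normalized to lower case, and a position-only change is treated as not requiring a stats update, which is the open question in the comment rather than settled behavior.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the loosened check; not Hive's actual implementation.
public class ColumnStatsCheck {

    // Hypothetical stand-in for a metastore column definition (name + type),
    // assumed to be already lower-cased.
    record Column(String name, String type) {}

    // Returns true if column stats would need updating for an ALTER TABLE.
    static boolean needsColumnStatsUpdate(String oldDb, String newDb,
                                          String oldTable, String newTable,
                                          List<Column> oldCols, List<Column> newCols) {
        // Conditions 1 and 2: the database or the table was renamed.
        if (!oldDb.equals(newDb) || !oldTable.equals(newTable)) {
            return true;
        }
        // Index the new columns by name so the comparison ignores column order.
        Map<String, String> newTypes = new HashMap<>();
        for (Column c : newCols) {
            newTypes.put(c.name(), c.type());
        }
        // Loosened condition 3: an existing column was dropped, renamed or retyped.
        // Appended columns and position-only changes do not trigger an update.
        for (Column c : oldCols) {
            if (!Objects.equals(newTypes.get(c.name()), c.type())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Column> before = List.of(new Column("a", "string"), new Column("b", "int"));
        // ALTER TABLE test_change ADD COLUMNS (c DOUBLE): appends a column
        List<Column> added = List.of(new Column("a", "string"), new Column("b", "int"),
                                     new Column("c", "double"));
        // ALTER TABLE test_change CHANGE a a STRING AFTER b: position-only change
        List<Column> moved = List.of(new Column("b", "int"), new Column("a", "string"));
        // ALTER TABLE test_change CHANGE a a2 STRING: renames column a
        List<Column> renamed = List.of(new Column("a2", "string"), new Column("b", "int"));

        System.out.println(needsColumnStatsUpdate("db", "db", "test_change", "test_change", before, added));   // false
        System.out.println(needsColumnStatsUpdate("db", "db", "test_change", "test_change", before, moved));   // false
        System.out.println(needsColumnStatsUpdate("db", "db", "test_change", "test_change", before, renamed)); // true
        System.out.println(needsColumnStatsUpdate("db", "db", "test_change", "test_change2", before, before)); // true
    }
}
```

Indexing the new columns by name is what makes the check insensitive to column order; a strict positional comparison of the two lists would instead flag the "AFTER b" case.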



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
