spark-issues mailing list archives

From "Andrew Otto (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
Date Mon, 09 Apr 2018 19:32:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431099#comment-16431099 ]

Andrew Otto edited comment on SPARK-23890 at 4/9/18 7:31 PM:
-------------------------------------------------------------

Hah! As a temporary workaround, we are [instantiating a JDBC connection to Hive|https://gerrit.wikimedia.org/r/#/c/425084/2/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/DataFrameToHive.scala] to
get around Spark 2's restriction...halp!  Don't make us do this!  :)

 

 


was (Author: ottomata):
Hah! As a temporary workaround, we are [instantiating a JDBC connection to Hive|http://example.com/] to
get around Spark 2's restriction...halp!  Don't make us do this!  :)

 

 

> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --------------------------------------------------------------
>
>                 Key: SPARK-23890
>                 URL: https://issues.apache.org/jira/browse/SPARK-23890
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Andrew Otto
>            Priority: Major
>
> As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE CHANGE COLUMN
> commands to Hive.  This restriction was loosened in [https://github.com/apache/spark/pull/12714]
> to allow for those commands if they only change the column comment.
> Wikimedia has been evolving Parquet-backed Hive tables with data originally from JSON
> events by adding newly found columns to the Hive table schema, via a Spark job we call 'Refine'.
> We do this by recursively merging an input DataFrame schema with a Hive table DataFrame schema,
> finding new fields, and then issuing an ALTER TABLE statement to add the columns.  However,
> because we allow for nested data types in the incoming JSON data, we make extensive use of
> struct type fields.  In order to add newly detected fields in a nested data type, we must
> alter the struct column and append the nested struct field.  This requires a CHANGE COLUMN
> that alters the column type.  In reality, the 'type' of the column is not changing; a new
> field is just being added to the struct, but to SQL this looks like a type change.
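
The merge-then-ALTER flow described in this paragraph could be sketched roughly as follows. This is not the Refine job's code and does not use Spark's API: schemas are modeled as plain Python dicts (field name mapped to a Hive type string, or to a nested dict for a struct), and the names `merge_schemas`, `hive_type`, and `alter_statements` are hypothetical. Conflicting primitive types are simply left as the table has them.

```python
# Simplified model (assumption): a schema is a dict of field name -> type,
# where a type is a Hive type string or a nested dict representing a struct.

def merge_schemas(table, incoming):
    """Return a copy of `table` with fields found only in `incoming` appended."""
    merged = dict(table)
    for name, new_type in incoming.items():
        old_type = merged.get(name)
        if old_type is None:
            # Field is new at this level: append it.
            merged[name] = new_type
        elif isinstance(old_type, dict) and isinstance(new_type, dict):
            # Both sides are structs: merge recursively.
            merged[name] = merge_schemas(old_type, new_type)
        # Otherwise keep the table's existing type (conflicts are not handled).
    return merged


def hive_type(t):
    """Render a modeled type as a Hive type string."""
    if isinstance(t, dict):
        inner = ",".join(f"{n}:{hive_type(s)}" for n, s in t.items())
        return f"struct<{inner}>"
    return t


def alter_statements(table_name, table, incoming):
    """Build the DDL needed to evolve `table` so it covers `incoming`."""
    merged = merge_schemas(table, incoming)
    stmts = []
    # Brand-new top-level columns can be added with ADD COLUMNS.
    new_cols = [(n, t) for n, t in merged.items() if n not in table]
    if new_cols:
        cols = ", ".join(f"{n} {hive_type(t)}" for n, t in new_cols)
        stmts.append(f"ALTER TABLE {table_name} ADD COLUMNS ({cols})")
    # Structs that gained nested fields need CHANGE COLUMN with the new type.
    for n, t in merged.items():
        if n in table and t != table[n]:
            stmts.append(
                f"ALTER TABLE {table_name} CHANGE COLUMN {n} {n} {hive_type(t)}"
            )
    return stmts
```

With a table schema of `{"id": "bigint", "meta": {"host": "string"}}` and incoming data that adds `meta.dt` and a top-level `tag`, the second statement produced is the CHANGE COLUMN that Spark 2 refuses to send.
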
> We were about to upgrade to Spark 2, but this new restriction on the SQL DDL that can be sent
> to Hive will block us.  I believe this is fixable by adding an exception in [command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
> to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and destination type
> are both struct types, and the destination type only adds new fields.
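
The condition proposed here (both types are structs, and the destination only adds fields) might look something like the check below. It is a sketch under assumptions, not the code in ddl.scala and not Spark's API: types are modeled as Hive type strings or nested dicts for structs, the name `only_adds_fields` is hypothetical, fields are matched by name without checking their order, and nested structs are allowed to grow in the same way.

```python
# Simplified model (assumption): a type is a Hive type string, or a dict of
# field name -> type representing a struct.

def only_adds_fields(old, new):
    """True if `old` and `new` are both structs and `new` keeps every field
    of `old` unchanged, apart from nested structs that only gain fields."""
    if not (isinstance(old, dict) and isinstance(new, dict)):
        # At least one side is not a struct: not an allowed change.
        return False
    for name, old_type in old.items():
        if name not in new:
            # A field was dropped or renamed.
            return False
        new_type = new[name]
        if old_type == new_type:
            continue
        # The field's type differs: only acceptable if it is itself a struct
        # that only gained fields.
        if not only_adds_fields(old_type, new_type):
            return False
    return True
```

Under this rule, `struct<a:int>` to `struct<a:int,b:string>` would be accepted, while `struct<a:int>` to `struct<a:bigint,b:string>` would still be rejected because an existing field changed type.
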



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
