spark-issues mailing list archives

From "Michael Armbrust (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-6242) Support replace (drop) column for parquet table
Date Sat, 21 Mar 2015 21:56:38 GMT

     [ https://issues.apache.org/jira/browse/SPARK-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-6242:
------------------------------------
    Target Version/s: 1.4.0

> Support replace (drop) column for parquet table
> -----------------------------------------------
>
>                 Key: SPARK-6242
>                 URL: https://issues.apache.org/jira/browse/SPARK-6242
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: chirag aggarwal
>            Assignee: Cheng Lian
>
> SPARK-5528 provides an easy way to add a column to Parquet tables. It does this by using
> Parquet's native capability of merging the schemas from all the part-files and
> _common_metadata files.
> Dropping a column from a Parquet table, however, still does not work. The merged schema
> still shows the dropped column, but the column is no longer present in the metastore. The
> schemas obtained from the two sources therefore do not match, and any subsequent query on
> the table fails.
> Instead of checking for an exact match between the two schemas, Spark should only check
> whether the schema obtained from the metastore is a subset of the merged Parquet schema. If
> this check passes, queries should be allowed to refer to the columns present in the metastore.
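
The proposed check can be illustrated with a small standalone sketch. This is not Spark's implementation: it models a schema as a flat mapping from column name to type name, and the helpers `merge_schemas`, `is_subset` and `queryable_columns` are hypothetical names invented for this example. It only shows why an exact comparison fails after a column is dropped while a subset comparison passes.

```python
from typing import Dict, List

# Simplified model (an assumption): a schema is a flat map of column name -> type name.
Schema = Dict[str, str]


def merge_schemas(part_schemas: List[Schema]) -> Schema:
    """Union the columns of all part-file schemas, rejecting type conflicts."""
    merged: Schema = {}
    for schema in part_schemas:
        for name, dtype in schema.items():
            if name in merged and merged[name] != dtype:
                raise ValueError(f"conflicting types for column {name!r}")
            merged[name] = dtype
    return merged


def is_subset(metastore: Schema, merged: Schema) -> bool:
    """True if every metastore column is in the merged schema with the same type."""
    return all(merged.get(name) == dtype for name, dtype in metastore.items())


def queryable_columns(metastore: Schema, merged: Schema) -> List[str]:
    """Return the columns a query may reference, or raise if the check fails."""
    if not is_subset(metastore, merged):
        raise ValueError("metastore schema is not a subset of the merged Parquet schema")
    return list(metastore)


# Older part-files still carry column "c", which was dropped from the metastore.
parts = [{"a": "int", "b": "string", "c": "double"}, {"a": "int", "b": "string"}]
merged = merge_schemas(parts)
metastore = {"a": "int", "b": "string"}

print(merged == metastore)                   # the exact-match comparison
print(queryable_columns(metastore, merged))  # the subset comparison
```

In this model the dropped column stays in the merged schema but is never exposed to queries, because only the metastore's columns are returned.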
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
