spark-issues mailing list archives

From "Apache Spark (JIRA)" <>
Subject [jira] [Assigned] (SPARK-23418) DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema
Date Tue, 13 Feb 2018 22:12:00 GMT


Apache Spark reassigned SPARK-23418:

    Assignee:     (was: Apache Spark)

> DataSourceV2 should not allow userSpecifiedSchema without ReadSupportWithSchema
> -------------------------------------------------------------------------------
>                 Key: SPARK-23418
>                 URL:
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ryan Blue
>            Priority: Major
> DataSourceV2 currently does not reject user-specified schemas when a source does not implement ReadSupportWithSchema. This is confusing behavior. Here's a quote from a discussion on SPARK-23203:
> {quote}I think this will cause confusion when source schemas change. Also, I can't think of a situation where it is a good idea to pass a schema that is ignored.
> Here's an example of how this will be confusing: think of a job that supplies a schema identical to the table's schema and runs fine, so it goes into production. What happens when the table's schema changes? If someone adds a column to the table, then the job will start failing and report that the source doesn't support user-supplied schemas, even though it had previously worked just fine with a user-supplied schema. In addition, the change to the table is actually compatible with the old job because the new column will be removed by a projection.
> To fix this situation, it may be tempting to use the user-supplied schema as an initial projection. But that doesn't make sense because we don't need two projection mechanisms. If we used this as a second way to project, it would be confusing that you can't actually leave out columns (at least for CSV) and it would be odd that using this path you can coerce types, which should usually be done by Spark.
> I think it is best not to allow a user-supplied schema when it isn't supported by a source.
> {quote}
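A minimal sketch of the two behaviours in plain Python rather than Spark's API. The classes `Source` and `SchemaAwareSource`, the exception `SchemaRejected`, and the functions `load_lenient` and `load_strict` are hypothetical stand-ins (loosely modelling ReadSupport and ReadSupportWithSchema), and a schema is reduced to a tuple of column names. `load_lenient` follows the current behaviour as the quoted scenario describes it, accepting a user schema from an unsupporting source only when it equals the source's own schema; `load_strict` follows the proposal and rejects any user schema the source cannot apply.

```python
class SchemaRejected(Exception):
    """Hypothetical error raised when a user-specified schema cannot be honoured."""


class Source:
    """Hypothetical source that reports its own schema (loosely models ReadSupport)."""

    def __init__(self, columns):
        self.columns = tuple(columns)

    def create_reader(self):
        # The "reader" here just exposes the source's actual schema.
        return self.columns


class SchemaAwareSource(Source):
    """Hypothetical source that can apply a user schema (loosely models ReadSupportWithSchema)."""

    def create_reader_with_schema(self, user_columns):
        return tuple(user_columns)


def load_lenient(source, user_columns=None):
    """Current behaviour per the issue: an unsupported user schema passes only if it matches."""
    if user_columns is not None and isinstance(source, SchemaAwareSource):
        return source.create_reader_with_schema(user_columns)
    actual = source.create_reader()
    if user_columns is not None and tuple(user_columns) != actual:
        raise SchemaRejected("source does not support user-specified schemas")
    return actual


def load_strict(source, user_columns=None):
    """Proposed behaviour: reject any user schema unless the source supports one."""
    if user_columns is None:
        return source.create_reader()
    if isinstance(source, SchemaAwareSource):
        return source.create_reader_with_schema(user_columns)
    raise SchemaRejected("source does not support user-specified schemas")


if __name__ == "__main__":
    table = Source(["id", "name"])
    job_schema = ["id", "name"]

    # The job supplies a schema identical to the table's, so the lenient check passes.
    print(load_lenient(table, job_schema))

    # Someone adds a compatible column; the same job now fails at load time.
    table.columns = ("id", "name", "email")
    try:
        load_lenient(table, job_schema)
    except SchemaRejected as err:
        print("lenient:", err)

    # Under the strict rule the job would have been rejected on its very first run.
    try:
        load_strict(Source(["id", "name"]), job_schema)
    except SchemaRejected as err:
        print("strict:", err)
```

The point of the strict variant is that the failure happens when the job is first written rather than later, when an unrelated and compatible table change breaks a job that has been running in production.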

This message was sent by Atlassian JIRA
