spark-issues mailing list archives

From "Reynold Xin (JIRA)" <>
Subject [jira] [Commented] (SPARK-15689) Data source API v2
Date Tue, 29 Aug 2017 13:48:01 GMT


Reynold Xin commented on SPARK-15689:

That seems like an issue orthogonal to the API described here. Also, I don't think we should
break the old API; v2 can be added alongside v1.

The Hive schema issue is caused by Spark ignoring the Hive schema, because the Hive metastore
(at least in the past) was buggy and did not always accept valid data types specified in
Spark.

> Data source API v2
> ------------------
>                 Key: SPARK-15689
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Reynold Xin
>              Labels: SPIP, releasenotes
>         Attachments: SPIP Data Source API V2.pdf
> This ticket tracks progress in creating v2 of the data source API. The new API should focus on:
> 1. Have a small surface so it is easy to freeze and maintain compatibility for a long time. Ideally, this API should survive architectural rewrites and user-facing API revamps of Spark.
> 2. Have a well-defined column batch interface for high performance. Convenience methods should exist to convert row-oriented formats into column batches for data source developers.
> 3. Still support filter push down, similar to the existing API.
> 4. Nice-to-have: support additional common operators, including limit and sampling.
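Item 3 (filter push down) can be illustrated with a minimal Python sketch, assuming a hypothetical in-memory source. `Filter`, `InMemorySource`, `push_filters`, `scan`, and `execute` are illustrative names only, not Spark's API. The source accepts the filters it can evaluate itself and hands the rest back, so the engine re-applies only the filters the source declined.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Operators the engine knows how to evaluate (hypothetical set).
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "startswith": lambda a, b: str(a).startswith(b),
}


@dataclass(frozen=True)
class Filter:
    """Hypothetical predicate of the form `column <op> value`."""
    column: str
    op: str
    value: Any

    def matches(self, row: Row) -> bool:
        return OPS[self.op](row[self.column], self.value)


class InMemorySource:
    """Hypothetical source that can evaluate only simple comparisons itself."""
    SUPPORTED_OPS = {"=", ">", "<"}

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self._pushed: List[Filter] = []

    def push_filters(self, filters: List[Filter]) -> List[Filter]:
        """Keep the filters this source supports; return the rest to the engine."""
        self._pushed = [f for f in filters if f.op in self.SUPPORTED_OPS]
        return [f for f in filters if f.op not in self.SUPPORTED_OPS]

    def scan(self) -> List[Row]:
        # Pushed filters are applied while reading, so fewer rows leave the source.
        return [r for r in self._rows if all(f.matches(r) for f in self._pushed)]


def execute(source: InMemorySource, filters: List[Filter]) -> List[Row]:
    remaining = source.push_filters(filters)
    # The engine re-applies only the filters the source declined.
    return [r for r in source.scan() if all(f.matches(r) for f in remaining)]
```

With rows for "alice" (34), "bob" (19), and "anna" (41), `execute(src, [Filter("age", ">", 20), Filter("name", "startswith", "a")])` pushes the `>` filter into the source and leaves `startswith` to the engine, returning the rows for alice and anna.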
> Note that 1 and 2 are both problems the current data source API (v1) suffers from. The current data source API has a wide surface with dependencies on DataFrame/SQLContext, which makes the data source API's compatibility depend on the upper-level API. The current data source API is also row-oriented only and has to go through an expensive conversion from external data types to internal data types.
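The row-to-column-batch convenience described in item 2 could look roughly like the following Python sketch. `ColumnBatch` and `rows_to_batches` are hypothetical names, and plain lists stand in for the typed, contiguous buffers a high-performance implementation would use; the sketch only shows how row tuples are transposed into fixed-size column batches.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List, Sequence, Tuple


@dataclass
class ColumnBatch:
    """Hypothetical column batch: one list of values per column, all the same length."""
    columns: Dict[str, List[Any]]
    num_rows: int

    def column(self, name: str) -> List[Any]:
        return self.columns[name]


def rows_to_batches(
    rows: Iterable[Tuple[Any, ...]], schema: Sequence[str], batch_size: int
) -> Iterator[ColumnBatch]:
    """Transpose row tuples into column batches of at most `batch_size` rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buffers: List[List[Any]] = [[] for _ in schema]
    count = 0
    for row in rows:
        if len(row) != len(schema):
            raise ValueError(f"row has {len(row)} fields, schema has {len(schema)}")
        # Append each field to its column's buffer.
        for buf, value in zip(buffers, row):
            buf.append(value)
        count += 1
        if count == batch_size:
            yield ColumnBatch(dict(zip(schema, buffers)), count)
            buffers = [[] for _ in schema]  # start fresh buffers for the next batch
            count = 0
    if count:
        # Emit the final, partially filled batch.
        yield ColumnBatch(dict(zip(schema, buffers)), count)
```

For example, `list(rows_to_batches([(1, "a"), (2, "b"), (3, "c")], ["id", "name"], 2))` yields two batches: the first holds ids `[1, 2]`, the second holds one row with name `["c"]`. A consumer can then operate on whole columns, for instance summing `batch.column("id")`, instead of unpacking one row at a time.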

This message was sent by Atlassian JIRA

