From: GitBox
To: reviews@spark.apache.org
Subject: [GitHub] [spark] rdblue commented on a change in pull request #26297: [SPARK-29665][SQL] refine the TableProvider interface
Message-ID: <157264871334.31354.9560360338197070835.gitbox@gitbox.apache.org>
Date: Fri, 01 Nov 2019 22:51:53 -0000

rdblue commented on a change in pull request #26297: [SPARK-29665][SQL] refine the TableProvider interface
URL: https://github.com/apache/spark/pull/26297#discussion_r341776246

##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableProvider.java
##########
@@ -36,26 +39,34 @@ public interface TableProvider {

   /**
-   * Return a {@link Table} instance to do read/write with user-specified options.
+   * Infer the schema of the table that is identified by the given options.
+   *
+   * @param options The options that can identify a table, e.g. file path, Kafka topic name, etc.
+   *                It's an immutable case-insensitive string-to-string map.
    *
-   * @param options the user-specified options that can identify a table, e.g. file path, Kafka
-   *                topic name, etc. It's an immutable case-insensitive string-to-string map.
    */
-  Table getTable(CaseInsensitiveStringMap options);
+  StructType inferSchema(CaseInsensitiveStringMap options);

   /**
-   * Return a {@link Table} instance to do read/write with user-specified schema and options.
-   *
-   * By default this method throws {@link UnsupportedOperationException}, implementations should
-   * override this method to handle user-specified schema.
-   *
-   * @param options the user-specified options that can identify a table, e.g. file path, Kafka
-   *                topic name, etc. It's an immutable case-insensitive string-to-string map.
-   * @param schema the user-specified schema.
-   * @throws UnsupportedOperationException
+   * Infer the partitioning of the table that is identified by the given options.
+   *
+   * @param schema The schema of the table.
+   * @param options The options that can identify a table, e.g. file path, Kafka topic name, etc.
+   *                It's an immutable case-insensitive string-to-string map.
+   */
+  Transform[] inferPartitioning(StructType schema, CaseInsensitiveStringMap options);

Review comment:
I don't think the schema is actually needed. Partitioning and schemas are mostly orthogonal. If anything, you could argue that identity partitions should be in the schema and that `inferSchema` could accept the result of `inferPartitioning`.

Also, none of the implementations actually use it besides the one that uses it to create a file index. It seems to me like this is more of a convenience for that implementation than something that is generally needed. Can we remove it from the API?
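
For illustration only, the shape suggested in the review comment might look roughly like the sketch below: partitioning is inferred from the options alone, and schema inference reuses that result instead of the other way around. Everything here is a hypothetical stand-in and not Spark's API. `ProviderSketch` and `PathProviderSketch` are invented names, a plain `Map<String, String>` replaces `CaseInsensitiveStringMap`, lists of column names replace `StructType` and `Transform[]`, and the `path` option and Hive-style `key=value` directory layout are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the interface under review; NOT Spark's TableProvider.
interface ProviderSketch {
    // Stand-in for inferSchema: returns column names only.
    List<String> inferSchema(Map<String, String> options);

    // Partitioning inferred from the options alone, with no schema parameter.
    List<String> inferPartitioning(Map<String, String> options);
}

// Toy provider that reads partition columns from "key=value" path segments.
final class PathProviderSketch implements ProviderSketch {
    @Override
    public List<String> inferPartitioning(Map<String, String> options) {
        List<String> columns = new ArrayList<>();
        String path = options.getOrDefault("path", ""); // assumed option name
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                columns.add(segment.substring(0, eq)); // keep the key, drop the value
            }
        }
        return columns;
    }

    @Override
    public List<String> inferSchema(Map<String, String> options) {
        // Data columns are hard-coded purely for illustration.
        List<String> schema = new ArrayList<>(List.of("id", "value"));
        // Identity partition columns are appended, so the schema depends on the
        // partitioning rather than the partitioning depending on the schema.
        schema.addAll(inferPartitioning(options));
        return schema;
    }
}

class PartitioningSketch {
    public static void main(String[] args) {
        ProviderSketch provider = new PathProviderSketch();
        Map<String, String> options = Map.of("path", "/data/year=2019/month=11");
        System.out.println(provider.inferPartitioning(options)); // [year, month]
        System.out.println(provider.inferSchema(options));       // [id, value, year, month]
    }
}
```

In this toy version, anything an implementation needs in order to discover partitions (such as a file index) would be built from the options inside the implementation, which is one way to read the point that the `schema` parameter is a convenience rather than a requirement.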