From: "ASF GitHub Bot (JIRA)"
To: issues@flink.apache.org
Reply-To: dev@flink.apache.org
Date: Thu, 4 Jan 2018 23:26:00 +0000 (UTC)
Subject: [jira] [Commented] (FLINK-8203) Make schema definition of DataStream/DataSet to Table conversion more flexible

[ https://issues.apache.org/jira/browse/FLINK-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312184#comment-16312184 ]

ASF GitHub Bot commented on FLINK-8203:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5132#discussion_r159757312

    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/api/TableEnvironment.scala ---
    @@ -768,6 +768,39 @@ abstract class TableEnvironment(val config: TableConfig) {
         frameworkConfig
       }

    +  /**
    +    * Reference input fields by name:
    +    * All fields in the schema definition are referenced by name
    +    * (and possibly renamed using an alias (as). In this mode, fields can be reordered and
    +    * projected out. Moreover, we can define proctime and rowtime attributes at arbitrary
    +    * positions using arbitrary names (except those that exist in the result schema).
    +    * This mode can be used for any input type, including POJOs.
    +    *
    +    * Reference input fields by position:
    +    * Field references must refer to existing fields in the input type (except for
    +    * renaming with alias (as)). In this mode, fields are simply renamed. Event-time attributes can
    +    * replace the field on their position in the input data (if it is of correct type) or be
    +    * appended at the end. Proctime attributes must be appended at the end. This mode can only be
    +    * used if the input type has a defined field order (tuple, case class, Row) and no of fields
    +    * references a field of the input type.
    +    */
    +  protected def isReferenceByPosition(t: TypeInformation[_], fields: Array[Expression]): Boolean = {
    +    if (t.isInstanceOf[PojoTypeInfo[_]]) {
    +      return false
    +    }
    +
    +    val inputNames = t match {
    +      case ct: CompositeType[_] => ct.getFieldNames
    +      case _ => return false // atomic types are references by name
    --- End diff --

    If atomic types are referenced by name, what's the name?
    Atomic types are referenced neither by position nor by name. Instead, we can reference the field because there is only one field in the input.

    Should we make this method only available for `CompositeType` by changing the type of `t` to `CompositeType`?

> Make schema definition of DataStream/DataSet to Table conversion more flexible
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-8203
>                 URL: https://issues.apache.org/jira/browse/FLINK-8203
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>    Affects Versions: 1.4.0, 1.5.0
>            Reporter: Fabian Hueske
>            Assignee: Timo Walther
>
> When converting or registering a {{DataStream}} or {{DataSet}} as {{Table}}, the schema of the table can be defined (by default, it is extracted from the {{TypeInformation}}).
> The schema needs to be manually specified to select (project) fields, rename fields, or define time attributes.
> Right now, there are several limitations on how the fields can be defined, and they also depend on the type of the {{DataStream}} / {{DataSet}}. Types with explicit field ordering (e.g., tuples, case classes, Row) require schema definition based on the position of fields. POJO types, which have no fixed order of fields, require fields to be referenced by name. Moreover, there are several restrictions on how time attributes can be defined, e.g., an event-time attribute must replace an existing field or be appended, and proctime attributes must be appended.
> I think we can make the schema definition more flexible and provide two modes:
> 1. Reference input fields by name: All fields in the schema definition are referenced by name (and possibly renamed using an alias ({{as}})). In this mode, fields can be reordered and projected out. Moreover, we can define proctime and eventtime attributes at arbitrary positions using arbitrary names (except those that exist in the result schema). This mode can be used for any input type, including POJOs. This mode is used if all field references exist in the input type.
> 2. Reference input fields by position: Field references might not refer to existing fields in the input type. In this mode, fields are simply renamed. Event-time attributes can replace the field at their position in the input data (if it is of the correct type) or be appended at the end. Proctime attributes must be appended at the end. This mode can only be used if the input type has a defined field order (tuple, case class, Row).
> We need to add more tests that check all combinations of input types and schema definition modes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
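
For readers following the thread, the mode decision discussed above can be sketched roughly as follows. This is only an illustration and not the code from the pull request: `InputType`, `Ordered`, `Pojo`, `Atomic`, `Mode` and `referenceMode` are hypothetical names that stand in for Flink's `TypeInformation` hierarchy, and only plain field references (no aliases or time attributes) are modelled. It assumes the rule stated in the Scaladoc of the diff (by-position mode only if none of the referenced names exists in the input) and treats atomic types as a separate single-field case, as the review comment suggests.

```scala
// Simplified, hypothetical model of the mode decision; these are NOT Flink classes.
object SchemaModeSketch {
  sealed trait InputType
  // Tuples, case classes, Row: fields have names and a fixed order.
  final case class Ordered(fieldNames: Seq[String]) extends InputType
  // POJOs: fields have names but no fixed order.
  final case class Pojo(fieldNames: Seq[String]) extends InputType
  // Atomic types: exactly one field, which has no name.
  case object Atomic extends InputType

  sealed trait Mode
  case object ByName extends Mode
  case object ByPosition extends Mode
  case object SingleField extends Mode

  // Decide how the plain field references in a schema definition are interpreted.
  def referenceMode(input: InputType, referenced: Seq[String]): Mode = input match {
    // No field order, so references can only be resolved by name.
    case Pojo(_) => ByName
    // Assumption taken from the Scaladoc: by position only if no referenced
    // name exists in the input; otherwise the references are resolved by name.
    case Ordered(names) =>
      if (referenced.exists(name => names.contains(name))) ByName else ByPosition
    // Neither by name nor by position: there is only one field to refer to.
    case Atomic => SingleField
  }
}
```

Under these assumptions, a two-field tuple with fields `f0` and `f1` referenced as `("a", "b")` is handled by position (a pure rename), while `("f1", "f0")` is handled by name (a reordering).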