Date: Thu, 22 Dec 2016 09:49:58 +0000 (UTC)
From: "Fabian Hueske (JIRA)"
To: issues@flink.apache.org
Subject: [jira] [Commented] (FLINK-5280) Extend TableSource to support nested data

[ https://issues.apache.org/jira/browse/FLINK-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769629#comment-15769629 ]

Fabian Hueske commented on FLINK-5280:
--------------------------------------

Hi [~ivan.mushketyk],

That's an interesting idea!
I think {{getFieldTypes()}} and {{getNumberOfFields()}} are truly redundant and might even cause problems if they are not consistent with {{getReturnType()}}. We could make them final, but that would change the API as well, so we might as well remove them. IMO, it makes sense to break the API here: it is not declared stable, and I don't think it is widely used.

The benefit of keeping {{getFieldNames()}} would be that users could still override the field names of the TypeInformation by overriding the method. However, if we do that, we would also need to add a {{getFieldIndicies()}} method that maps names to positions for proper POJO support. The question is whether it is worth keeping {{getFieldNames()}} and adding {{getFieldIndicies()}}. I think it makes sense to have these methods; this would be aligned with the {{BatchTableEnvironment.fromDataSet()}} methods. We could have default implementations for {{getFieldNames()}} and {{getFieldIndicies()}} that return {{null}}, and use {{TableEnvironment.getFieldInfo(TypeInformation)}} unless the methods are overridden, in which case the explicitly provided information is used. That would allow us to reuse existing code instead of duplicating it.

What do you think, [~ivan.mushketyk] and [~jark]?

> Extend TableSource to support nested data
> -----------------------------------------
>
>                 Key: FLINK-5280
>                 URL: https://issues.apache.org/jira/browse/FLINK-5280
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.2.0
>            Reporter: Fabian Hueske
>            Assignee: Ivan Mushketyk
>
> The {{TableSource}} interface currently only supports the definition of flat rows.
> However, there are several storage formats for nested data that should be supported, such as Avro, Json, Parquet, and Orc. The Table API and SQL can also natively handle nested rows.
> The {{TableSource}} interface and the code to register table sources in Calcite's schema need to be extended to support nested data.
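A minimal Java sketch of the null-default fallback proposed in the comment, under simplifying assumptions: SketchTableSource, SimpleRowType, FieldInfo and resolveFieldInfo are hypothetical names, not Flink classes, and resolveFieldInfo only approximates the role the comment assigns to TableEnvironment.getFieldInfo(TypeInformation). Both optional methods default to null. When getFieldNames() is not overridden, names come from the return type with positional indices; otherwise the explicit names and their getFieldIndicies() positions are used.

```java
import java.util.Arrays;

// Hypothetical sketch only: none of these types exist in Flink.
class FieldInfoSketch {

    // Stand-in for a composite TypeInformation that knows its own field names.
    static final class SimpleRowType {
        final String[] fieldNames;
        SimpleRowType(String... fieldNames) { this.fieldNames = fieldNames; }
    }

    // Hypothetical table-source interface: only getReturnType() is mandatory.
    interface SketchTableSource {
        SimpleRowType getReturnType();

        // null means "derive the names from getReturnType()".
        default String[] getFieldNames() { return null; }

        // null means "not provided"; required here once names are overridden.
        default int[] getFieldIndicies() { return null; }
    }

    // Resolved result: final field names and the type position each name maps to.
    static final class FieldInfo {
        final String[] names;
        final int[] indices;
        FieldInfo(String[] names, int[] indices) { this.names = names; this.indices = indices; }
    }

    // Plays the role of the shared resolution logic (assumed, simplified).
    static FieldInfo resolveFieldInfo(SketchTableSource source) {
        String[] typeNames = source.getReturnType().fieldNames;
        String[] names = source.getFieldNames();
        int[] indices = source.getFieldIndicies();

        if (names == null) {
            // Fallback: take names from the type and use positions 0..n-1.
            int[] positions = new int[typeNames.length];
            for (int i = 0; i < positions.length; i++) {
                positions[i] = i;
            }
            return new FieldInfo(typeNames.clone(), positions);
        }
        // Overridden names need one index per name to map them onto the type.
        if (indices == null || indices.length != names.length) {
            throw new IllegalArgumentException(
                "getFieldIndicies() must return one index per overridden field name");
        }
        for (int idx : indices) {
            if (idx < 0 || idx >= typeNames.length) {
                throw new IllegalArgumentException("field index out of range: " + idx);
            }
        }
        return new FieldInfo(names.clone(), indices.clone());
    }

    public static void main(String[] args) {
        SimpleRowType type = new SimpleRowType("id", "name");

        // Source that relies entirely on the defaults.
        SketchTableSource plain = () -> type;

        // Source that renames and reorders fields explicitly.
        SketchTableSource renamed = new SketchTableSource() {
            public SimpleRowType getReturnType() { return type; }
            public String[] getFieldNames() { return new String[] {"user_name", "user_id"}; }
            public int[] getFieldIndicies() { return new int[] {1, 0}; }
        };

        FieldInfo a = resolveFieldInfo(plain);
        FieldInfo b = resolveFieldInfo(renamed);
        System.out.println(Arrays.toString(a.names) + " " + Arrays.toString(a.indices));
        System.out.println(Arrays.toString(b.names) + " " + Arrays.toString(b.indices));
    }
}
```

Returning null rather than an empty array keeps "not overridden" distinguishable from "zero fields", which is what lets a single resolution path serve both the default and the explicitly configured case without duplicated logic.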