flink-issues mailing list archives

From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5280) Extend TableSource to support nested data
Date Thu, 22 Dec 2016 09:49:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769629#comment-15769629 ]

Fabian Hueske commented on FLINK-5280:

Hi [~ivan.mushketyk], 

That's an interesting idea! 

I think {{getFieldTypes()}} and {{getNumberOfFields()}} are truly redundant and might even
cause problems if they are not consistent with {{getReturnType()}}. We could make them final,
but that would change the API as well, so we might as well remove them. IMO, it makes sense to
break the API here: it is not declared stable, and I don't think it is widely used.
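A minimal sketch of the redundancy argument, in plain Java so it runs without Flink. {{FlatType}} and {{SimpleTableSource}} are hypothetical stand-ins, not Flink's {{TypeInformation}} or {{TableSource}}. If the field count and field types are derived from the return type instead of being declared separately, they cannot become inconsistent with it.

```java
import java.util.List;

// Hypothetical stand-in for a composite return type (NOT Flink's TypeInformation).
final class FlatType {
    final List<String> fieldNames;
    final List<Class<?>> fieldTypes;

    FlatType(List<String> fieldNames, List<Class<?>> fieldTypes) {
        if (fieldNames.size() != fieldTypes.size()) {
            throw new IllegalArgumentException("names and types must have the same length");
        }
        this.fieldNames = List.copyOf(fieldNames);
        this.fieldTypes = List.copyOf(fieldTypes);
    }
}

// Hypothetical source interface (NOT Flink's TableSource):
// implementors only declare the return type.
interface SimpleTableSource {
    FlatType getReturnType();

    // Derived from the return type, so it can never disagree with it.
    default int numberOfFields() {
        return getReturnType().fieldTypes.size();
    }

    // Also derived; there is no separate declaration that could drift.
    default List<Class<?>> fieldTypes() {
        return getReturnType().fieldTypes;
    }
}
```

With this shape, a source that returns {{new FlatType(List.of("name", "age"), List.of(String.class, Integer.class))}} reports two fields without declaring the count anywhere else.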

The benefit of keeping {{getFieldNames()}} would be that users could still override the field names
of the TypeInformation by overriding the method. However, if we do that, we would also need to add
a {{getFieldIndicies()}} method to map names to positions for proper POJO support.
The question is whether it is worth keeping {{getFieldNames()}} and adding {{getFieldIndicies()}}.
I think it makes sense to have these methods. It would be aligned with {{BatchTableEnvironment.fromDataSet()}}, which also lets users specify field names.
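To illustrate why names alone are not enough for POJOs, here is a hypothetical helper (not Flink API) that pairs user-chosen table field names with indices into the physical type's field order. It assumes that order is fixed and known, e.g. as exposed by the type information. Each table column then reads the physical field at its index.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (NOT Flink API): combines user-provided table field names
// with field indices that point into the physical type's field order.
final class FieldMapping {
    static Map<String, String> tableToPhysical(List<String> physicalFields,
                                               String[] tableNames,
                                               int[] fieldIndices) {
        if (tableNames.length != fieldIndices.length) {
            throw new IllegalArgumentException("names and indices must have the same length");
        }
        // LinkedHashMap keeps the table's column order.
        Map<String, String> mapping = new LinkedHashMap<>();
        for (int i = 0; i < tableNames.length; i++) {
            int idx = fieldIndices[i];
            if (idx < 0 || idx >= physicalFields.size()) {
                throw new IllegalArgumentException("Field index out of range: " + idx);
            }
            mapping.put(tableNames[i], physicalFields.get(idx));
        }
        return mapping;
    }
}
```

For a POJO whose fields are ordered as {{age}}, {{name}}, the names {{userName, userAge}} with indices {{1, 0}} map {{userName}} to {{name}} and {{userAge}} to {{age}}.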

We could have default implementations of {{getFieldNames()}} and {{getFieldIndicies()}} that
return {{null}}. If the methods are not overridden, we would use {{TableEnvironment.getFieldInfo(TypeInformation)}};
otherwise, we would use the explicitly provided information. That would allow us to reuse existing
code instead of duplicating it.
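A rough sketch of the proposed defaults, again with hypothetical types rather than Flink's actual interfaces. Both methods return {{null}} unless overridden, and a resolver either uses the explicit names and indices or falls back to names derived from the return type. That fallback stands in for what {{TableEnvironment.getFieldInfo(TypeInformation)}} would compute. Requiring both methods to be overridden together is an assumption of this sketch.

```java
import java.util.List;

// Hypothetical sketch of the proposal (NOT the actual Flink API).
interface NamedSource {
    // Stand-in for the field names derived from the source's TypeInformation.
    List<String> getReturnTypeFieldNames();

    // Defaults signal "not overridden".
    default String[] getFieldNames() { return null; }
    default int[] getFieldIndicies() { return null; }
}

final class FieldInfo {
    final String[] names;
    final int[] indices;

    FieldInfo(String[] names, int[] indices) {
        this.names = names;
        this.indices = indices;
    }

    // Uses the overridden methods if present; otherwise falls back to the
    // names of the return type, in their declared order.
    static FieldInfo resolve(NamedSource src) {
        String[] names = src.getFieldNames();
        int[] indices = src.getFieldIndicies();
        if ((names == null) != (indices == null)) {
            throw new IllegalStateException(
                "getFieldNames() and getFieldIndicies() must be overridden together");
        }
        if (names != null) {
            if (names.length != indices.length) {
                throw new IllegalStateException("names and indices must have the same length");
            }
            return new FieldInfo(names, indices);
        }
        String[] derivedNames = src.getReturnTypeFieldNames().toArray(new String[0]);
        int[] derivedIndices = new int[derivedNames.length];
        for (int i = 0; i < derivedIndices.length; i++) {
            derivedIndices[i] = i; // identity mapping onto the type's fields
        }
        return new FieldInfo(derivedNames, derivedIndices);
    }
}
```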

What do you think [~ivan.mushketyk] and [~jark]?

> Extend TableSource to support nested data
> -----------------------------------------
>                 Key: FLINK-5280
>                 URL: https://issues.apache.org/jira/browse/FLINK-5280
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.2.0
>            Reporter: Fabian Hueske
>            Assignee: Ivan Mushketyk
> The {{TableSource}} interface currently only supports the definition of flat rows.

> However, there are several storage formats for nested data that should be supported, such
> as Avro, JSON, Parquet, and ORC. The Table API and SQL can also natively handle nested rows.
> The {{TableSource}} interface and the code to register table sources in Calcite's schema
> need to be extended to support nested data.

This message was sent by Atlassian JIRA
