flink-issues mailing list archives

From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5280) Extend TableSource to support nested data
Date Mon, 19 Dec 2016 11:58:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760985#comment-15760985 ]

Fabian Hueske commented on FLINK-5280:

Sorry for my late response. I'll try to answer your questions and will comment on some of
your ideas:

* The order of fields in a {{PojoTypeInfo}} depends on the order in which the fields are returned
by Java's reflection interfaces. I think the order is lexicographic. But as [~jark] said, we
can get the indexes of the fields by {{PojoTypeInfo.getFieldIndex()}}. This will give us the
right indexes. 
* We do not need (and don't want to have) an Avro dependency in {{flink-table}}. The mapping
should happen in the {{TableSource}} which should be located in a connector Maven module.
* A Generic Avro record is a generic holder for data of any schema. The data of a generic
record object is interpreted using an Avro Schema. The Schema would give us the required field
names and types. Using the schema, we could construct a {{Row}} with possibly nested {{Row}}s
and move all data from a generic record into a nested {{Row}} object. 
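
To illustrate the last point, here is a minimal sketch in plain Java that deliberately avoids Avro and Flink dependencies. Everything in it is a hypothetical stand-in: a "schema" is an ordered map from field name to either a type name or a nested schema, a "record" is a map from field name to value, and a "row" is an {{Object[]}} that may contain nested {{Object[]}} rows. A real {{TableSource}} would presumably use the Avro Schema and Flink's {{Row}} instead.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins (not Avro or Flink types):
//   schema: ordered Map from field name to a type name (String) or a nested schema (Map)
//   record: Map from field name to value
//   row:    Object[] that may hold nested Object[] rows
class NestedRowSketch {

    /** Copies a record into a positional row, recursing into nested records. */
    @SuppressWarnings("unchecked")
    static Object[] toRow(Map<String, Object> schema, Map<String, Object> record) {
        Object[] row = new Object[schema.size()];
        int pos = 0;
        // The schema, not the record, decides field order and nesting.
        for (Map.Entry<String, Object> field : schema.entrySet()) {
            Object value = record.get(field.getKey());
            if (field.getValue() instanceof Map && value != null) {
                // Nested record: convert it with its own sub-schema.
                row[pos] = toRow((Map<String, Object>) field.getValue(),
                                 (Map<String, Object>) value);
            } else {
                // Flat field (or missing nested record): copy the value as-is.
                row[pos] = value;
            }
            pos++;
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> addressSchema = new LinkedHashMap<>();
        addressSchema.put("city", "string");
        addressSchema.put("zip", "int");

        Map<String, Object> userSchema = new LinkedHashMap<>();
        userSchema.put("name", "string");
        userSchema.put("address", addressSchema);

        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Berlin");
        address.put("zip", 10115);

        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "Alice");
        user.put("address", address);

        // Prints: [Alice, [Berlin, 10115]]
        System.out.println(Arrays.deepToString(toRow(userSchema, user)));
    }
}
```

Because the schema drives the traversal, the resulting row has a fixed field order regardless of how the record stores its values.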

* {{TableSource.getNumberOfFields()}} can be dropped. The question is whether this is important
enough to break the API. If we decide to touch the interface, I'm +1 to remove it.
* I'm not sure about requiring that a {{TableSource}} must return a {{Row}}. In case of a
Specific Avro record, we would need an additional step to copy the first-level Pojo fields
into a {{Row}}, which would need some reflection or code generation, instead of simply forwarding
the Avro object. We could still allow any kind of type information and use field names provided
by the {{TypeInformation}}. If the return type is a Pojo, we would use its field names. If
the return type is a Tuple, the fields would be named {{f0}}, {{f1}}, .... If this is not desired,
the {{TableSource}} could return {{Row}}s. If we want to rename fields, we have to use {{Row}}
as well.
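
The field-naming rules discussed above (Pojo field names, {{f0}}, {{f1}}, ... for Tuples, and index lookup by name) can be sketched in plain Java without Flink. The class and method names below are hypothetical and only illustrate the idea. The sketch also assumes that Pojo fields are ordered by name, as suspected in the first point; {{Class.getDeclaredFields()}} itself documents no particular order, so the names are sorted explicitly.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Flink: derives field names from a type.
class FieldNameSketch {

    // Example POJO; fields are deliberately declared out of alphabetical order.
    static class Person {
        public String name;
        public int age;
        public String city;
    }

    /** Field names of a POJO-like class, sorted by name (assumed ordering). */
    static List<String> pojoFieldNames(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Field f : clazz.getDeclaredFields()) {
            // Skip static and compiler-generated fields.
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                names.add(f.getName());
            }
        }
        // getDeclaredFields() guarantees no order, so sort for a stable result.
        Collections.sort(names);
        return names;
    }

    /** Positional names f0, f1, ... for a tuple of the given arity. */
    static List<String> tupleFieldNames(int arity) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < arity; i++) {
            names.add("f" + i);
        }
        return names;
    }

    /** Index of a field by name, or -1 if it does not exist. */
    static int fieldIndex(List<String> names, String name) {
        return names.indexOf(name);
    }

    public static void main(String[] args) {
        List<String> pojoNames = pojoFieldNames(Person.class);
        System.out.println(pojoNames);                     // [age, city, name]
        System.out.println(fieldIndex(pojoNames, "name")); // 2
        System.out.println(tupleFieldNames(3));            // [f0, f1, f2]
    }
}
```

Looking up indexes by name, rather than relying on declaration order, is what makes the mapping robust; in Flink this role would be played by {{PojoTypeInfo.getFieldIndex()}}.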

> Extend TableSource to support nested data
> -----------------------------------------
>                 Key: FLINK-5280
>                 URL: https://issues.apache.org/jira/browse/FLINK-5280
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.2.0
>            Reporter: Fabian Hueske
>            Assignee: Ivan Mushketyk
> The {{TableSource}} interface does currently only support the definition of flat rows.
> However, there are several storage formats for nested data that should be supported such
> as Avro, Json, Parquet, and Orc. The Table API and SQL can also natively handle nested rows.
> The {{TableSource}} interface and the code to register table sources in Calcite's schema
> need to be extended to support nested data.

This message was sent by Atlassian JIRA
