flink-dev mailing list archives

From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-1919) Add HCatOutputFormat for Tuple data types
Date Tue, 21 Apr 2015 13:40:59 GMT
Fabian Hueske created FLINK-1919:

             Summary: Add HCatOutputFormat for Tuple data types
                 Key: FLINK-1919
                 URL: https://issues.apache.org/jira/browse/FLINK-1919
             Project: Flink
          Issue Type: New Feature
          Components: Java API, Scala API
            Reporter: Fabian Hueske
            Priority: Minor

It would be good to have an OutputFormat that can write data to HCatalog tables.

The Hadoop `HCatOutputFormat` expects `HCatRecord` objects and writes them to HCatalog tables.
We can do the same in Flink by creating the `HCatRecord` objects in a Map function that precedes
a `HadoopOutputFormat` wrapping the Hadoop `HCatOutputFormat`.

Better support for Flink Tuples can be added by implementing a custom `HCatOutputFormat` that
also depends on the Hadoop `HCatOutputFormat` but internally converts Flink Tuples to `HCatRecord` objects.
This would also include checking whether the schema of the HCatalog table matches the Flink
tuple type. For data types other than tuples, the OutputFormat could either require a preceding
Map function that converts to `HCatRecord` objects or let users specify a MapFunction and invoke
it internally.
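
A rough sketch of the internal conversion and schema check could look like the following. It uses stand-in types only: a tuple is modelled as an `Object[]`, a record as a `List<Object>`, and `Column`, `checkSchema` and `toRecord` are hypothetical names for illustration, not Flink or HCatalog APIs. A real implementation would work with Flink Tuple types and `HCatRecord` instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch only. A "tuple" is an Object[] and a "record" is a
 * List<Object>; neither is a Flink or HCatalog class.
 */
final class TupleToRecordSketch {

    /** Hypothetical column description: a name plus the Java type expected for its values. */
    static final class Column {
        final String name;
        final Class<?> type;

        Column(String name, Class<?> type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Fails fast if the tuple arity or any field type does not match the table schema. */
    static void checkSchema(List<Column> schema, Class<?>[] tupleFieldTypes) {
        if (schema.size() != tupleFieldTypes.length) {
            throw new IllegalArgumentException(
                "Table has " + schema.size() + " columns but tuple has "
                    + tupleFieldTypes.length + " fields");
        }
        for (int i = 0; i < schema.size(); i++) {
            Column col = schema.get(i);
            if (!col.type.isAssignableFrom(tupleFieldTypes[i])) {
                throw new IllegalArgumentException(
                    "Column '" + col.name + "' expects " + col.type.getSimpleName()
                        + " but tuple field " + i + " is "
                        + tupleFieldTypes[i].getSimpleName());
            }
        }
    }

    /** Copies the tuple fields into a record by position; assumes checkSchema has passed. */
    static List<Object> toRecord(Object[] tuple) {
        return new ArrayList<>(Arrays.asList(tuple));
    }

    public static void main(String[] args) {
        List<Column> schema = Arrays.asList(
            new Column("id", Integer.class),
            new Column("name", String.class));

        // Validate once, before any record is written.
        checkSchema(schema, new Class<?>[] {Integer.class, String.class});

        // Convert each tuple positionally.
        System.out.println(toRecord(new Object[] {1, "flink"}));
    }
}
```

Running the check once when the format is configured, rather than per record, would report a mismatch before any data is written.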

We already have a Flink `HCatInputFormat` which does this in the reverse direction, i.e.,
it emits Flink Tuples from HCatalog tables.

This message was sent by Atlassian JIRA