flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1271) Extend HadoopOutputFormat and HadoopInputFormat to handle Void.class
Date Sun, 19 Apr 2015 10:30:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501785#comment-14501785 ]

ASF GitHub Bot commented on FLINK-1271:

GitHub user MohamedNadjibMAMI commented on the pull request:

    "If you want you can write a blog post about using Parquet with Flink." This would be a great
plus for the project. I hope it goes a bit deeper and makes it into the official documentation.

> Extend HadoopOutputFormat and HadoopInputFormat to handle Void.class 
> ---------------------------------------------------------------------
>                 Key: FLINK-1271
>                 URL: https://issues.apache.org/jira/browse/FLINK-1271
>             Project: Flink
>          Issue Type: Wish
>          Components: Hadoop Compatibility
>            Reporter: Felix Neutatz
>            Assignee: Felix Neutatz
>            Priority: Minor
>              Labels: Columnstore, HadoopInputFormat, HadoopOutputFormat, Parquet
>             Fix For: 0.9, 0.8.1
> Parquet, one of the most famous and efficient column store formats in Hadoop, uses Void.class as the key!
> At the moment, only keys that extend Writable are allowed.
> For example, we would need to be able to do something like:
> HadoopInputFormat hadoopInputFormat = new HadoopInputFormat(new ParquetThriftInputFormat(),
>     Void.class, AminoAcid.class, job);
> ParquetThriftInputFormat.addInputPath(job, new Path("newpath"));
> ParquetThriftInputFormat.setReadSupportClass(job, AminoAcid.class);
> // Create a Flink job with it
> DataSet<Tuple2<Void, AminoAcid>> data = env.createInput(hadoopInputFormat);
> Where AminoAcid is a generated Thrift class in this case.
> However, I figured out how to output Parquet files by creating a class which extends HadoopOutputFormat.
> Now we will have to discuss, what's the best approach to make the Parquet integration
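
For readers who want to see why the Writable restriction rules out Void keys, here is a minimal sketch in plain Java. It does not use the real Flink or Hadoop classes: `Writable`, `LongKey`, `Pair`, `StrictFormat` and `RelaxedFormat` are all hypothetical stand-ins invented for illustration, and the actual change made for FLINK-1271 may look different. The sketch only assumes that the wrapper's key type parameter is bounded by Writable, as the description above suggests.

```java
public class VoidKeySketch {

    /** Hypothetical stand-in for Hadoop's Writable; not the real interface. */
    interface Writable {}

    /** Hypothetical stand-in for a Hadoop key type such as LongWritable. */
    static final class LongKey implements Writable {
        final long value;
        LongKey(long value) { this.value = value; }
    }

    /** Minimal key/value pair, playing the role of a Tuple2 in this sketch. */
    static final class Pair<K, V> {
        final K key;
        final V value;
        Pair(K key, V value) { this.key = key; this.value = value; }
    }

    /** Key type bounded by Writable: Void cannot be used as K here. */
    static final class StrictFormat<K extends Writable, V> {
        final Class<K> keyClass;
        final Class<V> valueClass;
        StrictFormat(Class<K> keyClass, Class<V> valueClass) {
            this.keyClass = keyClass;
            this.valueClass = valueClass;
        }
    }

    /** Unbounded key type: Void.class is accepted as the key class. */
    static final class RelaxedFormat<K, V> {
        final Class<K> keyClass;
        final Class<V> valueClass;
        RelaxedFormat(Class<K> keyClass, Class<V> valueClass) {
            this.keyClass = keyClass;
            this.valueClass = valueClass;
        }
        /** Wraps one record; with K = Void the only possible key is null. */
        Pair<K, V> emit(K key, V value) {
            return new Pair<>(key, value);
        }
    }

    public static void main(String[] args) {
        // Compiles: LongKey satisfies the Writable bound.
        StrictFormat<LongKey, String> strict =
                new StrictFormat<>(LongKey.class, String.class);

        // Would NOT compile, because Void does not implement Writable:
        // StrictFormat<Void, String> broken =
        //         new StrictFormat<>(Void.class, String.class);

        // Compiles once the bound on the key type is dropped.
        RelaxedFormat<Void, String> relaxed =
                new RelaxedFormat<>(Void.class, String.class);
        Pair<Void, String> record = relaxed.emit(null, "some record");

        System.out.println(strict.keyClass.getSimpleName());
        System.out.println(relaxed.keyClass.getSimpleName());
        System.out.println(record.key + " / " + record.value);
    }
}
```

Because Void has no instances, every key in the relaxed variant is null, so any code that consumes such pairs has to tolerate null keys.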
