flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1271) Extend HadoopOutputFormat and HadoopInputFormat to handle Void.class
Date Wed, 07 Jan 2015 08:49:37 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267423#comment-14267423 ]

ASF GitHub Bot commented on FLINK-1271:
---------------------------------------

GitHub user FelixNeutatz opened a pull request:

    https://github.com/apache/flink/pull/287

    [FLINK-1271] Remove writable limitation

    This pull request removes the limitation that the Hadoop format's key and value types must implement Writable. This makes it possible to use Parquet.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/FelixNeutatz/incubator-flink RemoveWritableLimitation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #287
    
----
commit b5f399d933f0d0697c7b17752277ad4f751eb2c2
Author: FelixNeutatz <neutatz@googlemail.com>
Date:   2015-01-06T19:47:00Z

    [FLINK-1271] Remove Writable limitation from Hadoop Format

commit 43be886042cb145b0a4677e7e5528ea7eb1fedb0
Author: FelixNeutatz <neutatz@googlemail.com>
Date:   2015-01-06T19:58:45Z

    [FLINK-1271] clean format

commit 44da293b6a872e2a97665088af6b7088ef4befe6
Author: FelixNeutatz <neutatz@googlemail.com>
Date:   2015-01-06T20:09:52Z

    [FLINK-1271] clean format 2

commit 8001adb5272e3e2d866025445a4aa32fa82c6329
Author: FelixNeutatz <neutatz@googlemail.com>
Date:   2015-01-06T20:16:48Z

    [FLINK-1271] clean3

commit 6f634c6950901f63288113786e9f4afb4c32ca97
Author: FelixNeutatz <neutatz@googlemail.com>
Date:   2015-01-06T20:22:53Z

    [FLINK-1271] clean 4

commit e9d3b7bd6e578aafe14935fe0d3aa7daa8a4d311
Author: FelixNeutatz <neutatz@googlemail.com>
Date:   2015-01-06T20:27:43Z

    [FLINK-1271] clean 5

commit 0670c4cc967700cd7ba685e3e2085950bca26aa8
Author: FelixNeutatz <neutatz@googlemail.com>
Date:   2015-01-06T20:31:56Z

    [FLINK-1271] clean 5

commit fefb880f496043d7fc9cac896210740dfa18f57e
Author: FelixNeutatz <neutatz@googlemail.com>
Date:   2015-01-06T20:36:17Z

    [FLINK-1271] clean 7

commit 0eb74d2929dd4c5659a9f05064d32fbeb9bec5c8
Author: FelixNeutatz <neutatz@googlemail.com>
Date:   2015-01-06T20:54:04Z

    [FLINK-1271] clean up +1

commit b50324c751a52b31d8533a2d2116191715f504b0
Author: FelixNeutatz <neutatz@googlemail.com>
Date:   2015-01-06T20:58:16Z

    [FLINK-1271] clean up

----


> Extend HadoopOutputFormat and HadoopInputFormat to handle Void.class 
> ---------------------------------------------------------------------
>
>                 Key: FLINK-1271
>                 URL: https://issues.apache.org/jira/browse/FLINK-1271
>             Project: Flink
>          Issue Type: Wish
>          Components: Hadoop Compatibility
>            Reporter: Felix Neutatz
>            Assignee: Felix Neutatz
>            Priority: Minor
>              Labels: Columnstore, HadoopInputFormat, HadoopOutputFormat, Parquet
>             Fix For: 0.8
>
>
> Parquet, one of the most widely used and efficient column-store formats in the Hadoop ecosystem, uses Void.class as its key type!
> At the moment, only keys that extend Writable are allowed.
> For example, we would need to be able to do something like:
> HadoopInputFormat hadoopInputFormat = new HadoopInputFormat(new ParquetThriftInputFormat(), Void.class, AminoAcid.class, job);
> ParquetThriftInputFormat.addInputPath(job, new Path("newpath"));
> ParquetThriftInputFormat.setReadSupportClass(job, AminoAcid.class);
> // Create a Flink job with it
> DataSet<Tuple2<Void, AminoAcid>> data = env.createInput(hadoopInputFormat);
> Where AminoAcid is a generated Thrift class in this case.
> However, I figured out how to write Parquet output files by creating a class that extends HadoopOutputFormat.
> Now we have to discuss the best approach to make the Parquet integration happen.
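The type-bound problem described above can be illustrated with a minimal, self-contained sketch. `HadoopFormatWrapper` below is a hypothetical stand-in, not Flink's actual class: the point is only that once the `K extends Writable` bound is dropped from the generic signature, `Void` compiles as a key type, which is exactly what Parquet's input formats require.

```java
import java.util.AbstractMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Hadoop-format wrapper. Before the change the
// signature would have been <K extends Writable, V>, which rejects Void;
// with the unbounded <K, V> signature, Void keys are accepted.
class HadoopFormatWrapper<K, V> {
    private final List<Map.Entry<K, V>> records;

    HadoopFormatWrapper(List<Map.Entry<K, V>> records) {
        this.records = records;
    }

    List<Map.Entry<K, V>> read() {
        return records;
    }
}

public class Sketch {
    public static void main(String[] args) {
        // With no Writable bound, Void works as the key type; the key value
        // itself is simply null, as Parquet input formats emit it.
        HadoopFormatWrapper<Void, String> fmt = new HadoopFormatWrapper<>(
            List.of(new AbstractMap.SimpleEntry<>((Void) null, "AminoAcid record")));
        System.out.println(fmt.read().get(0).getValue());
    }
}
```

With the bounded signature, the declaration `HadoopFormatWrapper<Void, String>` would not compile, since `Void` does not implement `Writable`; removing the bound is the whole of the fix.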



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
