flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Strategies for reading structured file formats as POJO DataSets
Date Thu, 05 Mar 2015 09:58:04 GMT
Hi Elliot,

Right now there is no tooling support for reading CSV/TSV data into a POJO,
but there is a pull request open where a user contributes such a feature:
https://github.com/apache/flink/pull/426
So it's probably only a matter of days until it is available in master.
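
Judging from the pull request, the CsvReader would gain a pojoType() method that maps columns to POJO fields by name. Assuming it is merged roughly as proposed (the method name, the file path, and the Rating field names "category" and "points" below are taken from the proposal and may change), usage might look like this:

DataSet<Rating> ratings = env
    .readCsvFile("/path/to/ratings.tsv")
    .fieldDelimiter("\t")
    .pojoType(Rating.class, "category", "points");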

Your suggested approach of using a mapper is perfectly fine.
You can make it a bit easier by using env.readCsvFile(), which does the
parsing into the field types for you.
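
For example, here is a minimal sketch (assuming the data lives at "/path/to/ratings.tsv" and Rating has a (String, int) constructor; note that fieldDelimiter() takes a char in some Flink versions and a String in others):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// readCsvFile splits each line and converts the fields to the given types
DataSet<Tuple2<String, Integer>> parsed = env
    .readCsvFile("/path/to/ratings.tsv")
    .fieldDelimiter("\t")
    .types(String.class, Integer.class);

// All that is left is turning the tuples into POJOs
DataSet<Rating> ratings = parsed.map(t -> new Rating(t.f0, t.f1));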

Sorry that the feature is not available yet.

Please let us know if you have more questions regarding Flink.


Best,
Robert


On Thu, Mar 5, 2015 at 10:18 AM, Elliot West <teabot@gmail.com> wrote:

> Hello,
>
> As a new Flink user I wondered if there are any existing approaches or
> practices for reading file formats such as CSV, TSV, etc. as DataSets or
> POJOs? My current approach can be illustrated with a contrived example:
>
> // Simulating a TSV file DataSet
> DataSet<String> tsvRatings = env.fromElements("category-1\t10");
>
> // Mapping to a POJO
> DataSet<Rating> ratings = tsvRatings.map(line -> {
>   String[] elements = line.split("\t");
>   return new Rating(elements[0], Integer.parseInt(elements[1]));
> });
>
>
> While such a mapping could be implemented in a more general form, I'm keen
> to avoid wheel reinvention and therefore wonder if there are already good
> ways of doing this?
>
> Thanks - Elliot.
>
>
