flink-user mailing list archives

From Vinh June <hoangthevinh....@gmail.com>
Subject CSV input with unknown # of fields and Custom output format
Date Tue, 03 Feb 2015 21:31:49 GMT
Hi Flinkers,
I am totally new to Flink and Scala. I am trying to learn Flink in Scala for
a university project and have run into 2 problems. It would be great if you
guys could give me some advice.

Problem #1 is that I want to read CSV files with varying fields (different
names and numbers of fields), for example:
file 1: id, name, age
file 2: id, name, [unknown1], [unknown2]
expected result set: id, name, age, [unknown1], [unknown2]

Currently I read each file as an Array, then map the array to a common class
holding a Map[header, value] (since I need to know which value belongs to
which header).
With this method I ran into problem #2, the output format.
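
A minimal sketch of this header-to-value approach, in plain Scala without the
Flink parts. The object and function names are hypothetical, the first line of
each file is assumed to be the header, and the naive comma split assumes no
quoted fields:

```scala
object CsvHeaderUnion {
  // One record: header name -> value.
  type Row = Map[String, String]

  // Header names of a file, taken from its first line (assumption).
  def headers(lines: Seq[String]): Seq[String] =
    lines.headOption
      .map(_.split(",", -1).map(_.trim).toSeq)
      .getOrElse(Seq.empty)

  // Turn every data line into a Map[header, value].
  // Naive split on commas: quoted fields are NOT handled.
  def parse(lines: Seq[String]): Seq[Row] = {
    val names = headers(lines)
    lines.drop(1).map { line =>
      val values = line.split(",", -1).map(_.trim)
      names.zip(values).toMap
    }
  }

  // Union of all header names, in first-seen order.
  def unionHeaders(headersPerFile: Seq[Seq[String]]): Seq[String] =
    headersPerFile.flatten.distinct

  // Expand a row to the full header set; missing fields become "".
  def align(row: Row, allHeaders: Seq[String]): Seq[String] =
    allHeaders.map(h => row.getOrElse(h, ""))
}
```

With file 1 and file 2 from above, unionHeaders would give
id, name, age, [unknown1], [unknown2], and align would fill the fields a file
does not have with empty strings.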

#2 I would like to store the data in binary form, for example a
DataSet[MyClass] where MyClass has id: Long and array: Array[String], so that
I can read it back later. I found that FileOutputFormat might be the solution,
but I can't find any example of how to define one in Scala.
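
To make the question concrete, here is a rough sketch of the binary record
layout, in plain Scala on java.io streams. It is not a Flink FileOutputFormat
itself: the codec object and its write/read functions are hypothetical names,
and the assumption is that a custom format would do something like the write
step for each record it receives:

```scala
import java.io.{DataInputStream, DataOutputStream}

object MyClassCodec {
  case class MyClass(id: Long, array: Array[String])

  // Layout (an assumption): id, element count, then each string.
  // Note: writeUTF is limited to strings of at most 65535 encoded bytes.
  def write(out: DataOutputStream, r: MyClass): Unit = {
    out.writeLong(r.id)
    out.writeInt(r.array.length)
    r.array.foreach(s => out.writeUTF(s))
  }

  // Reads one record back in the same order it was written.
  def read(in: DataInputStream): MyClass = {
    val id = in.readLong()
    val n = in.readInt()
    MyClass(id, Array.fill(n)(in.readUTF()))
  }
}
```

Writing the element count before the strings is what lets the reader know
where one record ends and the next begins.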

View this message in context: http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/CSV-input-with-unknown-of-fields-and-Custom-output-format-tp670.html