flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: CSV input with unknown # of fields and Custom output format
Date Wed, 04 Feb 2015 10:35:48 GMT

I would go with the TypeSerializerInputFormat.

Here is a code sample (in Java; Scala should work the same way):


DataSet<MyType> dataSet = ...;

// write it out
dataSet.write(new TypeSerializerOutputFormat<MyType>(), "path");

// read it in
DataSet<MyType> read = env.readFile(
    new TypeSerializerInputFormat<MyType>(dataSet.getType()), "path");


The important thing to notice is that the TypeSerializerInputFormat needs
the TypeInformation of the type to read (to know how to deserialize it).

The simplest way to make sure that the type information used for reading
is the same as the one used for writing is to take it from the data set
that was written (by calling getType() on that data set).
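
To illustrate why the reader needs the same type information as the writer,
here is a small plain-Java sketch. It does not use Flink at all: the
Serializer interface, the MyType fields, and the writeOut/readIn helpers are
hypothetical stand-ins, not Flink's classes. The point is only that the
written bytes carry no description of their own layout, so the reader must be
handed a matching serializer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripSketch {

    // Hypothetical stand-in for a type-specific serializer (not Flink's API).
    interface Serializer<T> {
        void write(T value, DataOutputStream out) throws IOException;
        T read(DataInputStream in) throws IOException;
    }

    // Hypothetical record type playing the role of MyType.
    static final class MyType {
        final int id;
        final String name;

        MyType(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // The field order here is the only "schema"; it is not stored in the bytes.
    static final Serializer<MyType> MY_TYPE_SERIALIZER = new Serializer<MyType>() {
        @Override
        public void write(MyType value, DataOutputStream out) throws IOException {
            out.writeInt(value.id);
            out.writeUTF(value.name);
        }

        @Override
        public MyType read(DataInputStream in) throws IOException {
            return new MyType(in.readInt(), in.readUTF());
        }
    };

    // Write a value to raw bytes using the given serializer.
    static <T> byte[] writeOut(T value, Serializer<T> serializer) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            serializer.write(value, out);
        }
        return bytes.toByteArray();
    }

    // Read a value back; this only works with a serializer matching the writer's.
    static <T> T readIn(byte[] data, Serializer<T> serializer) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return serializer.read(in);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeOut(new MyType(7, "seven"), MY_TYPE_SERIALIZER);
        MyType back = readIn(data, MY_TYPE_SERIALIZER);
        System.out.println(back.id + " " + back.name);
    }
}
```

Reusing the same serializer object on both sides plays the same role here as
passing dataSet.getType() to the input format in the Flink snippet.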


On Wed, Feb 4, 2015 at 11:18 AM, Vinh June <hoangthevinh.htv@gmail.com> wrote:

> Hello Fabian and Stephan,
> Thank you both for your replies.
> @Fabian: Could you please write a dummy example using
> SerializedOutputFormat and SerializedInputFormat? I tried the following
> instruction:
> dataset.write(new SerializedOutputFormat[MyClass], dataPath)
> but it doesn't work and throws an error because type arguments [...] do not
> conform to class SerializedOutputFormat's type parameter bounds.
> I followed Stephan's advice to use TypeSerializerOutputFormat. It seems to
> work, but I do not know how to read the output back in.
> @Stephan: Writing the output with
> dataset.write(new TypeSerializerOutputFormat[MyClass], outputDataPath)
> works for me, but when I tried the analogous instruction to read it back:
> val readback = env.readFile[MyClass](new TypeSerializerInputFormat[MyClass], dataPath)
> it gives the error "unspecified value parameter x$1".
> When I tried to add a TypeSerializer, as IntelliJ suggests:
> val readback = env.readFile[MyClass](new TypeSerializerInputFormat[MyClass](TypeSerializer[MyClass]), dataPath)
> it says that TypeSerializer is not a value.
> I couldn't figure out a solution from the debug log. Is there a working
> example of using these formats in Scala?
