flink-user mailing list archives

From "Tzu-Li (Gordon) Tai" <tzuli...@apache.org>
Subject Re: Thrift object serialization
Date Tue, 16 May 2017 05:32:36 GMT
Hi Flavio!

I believe [1] has what you are looking for. Have you taken a look at that?


[1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/custom_serializers.html
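
For reference, a minimal sketch of the kind of registration that page describes. It assumes the chill-thrift and libthrift dependencies are on the classpath; the class name ThriftSerializerSetup and the helper registerThriftType are hypothetical names used only for illustration.

```java
import com.twitter.chill.thrift.TBaseSerializer;
import org.apache.flink.api.java.ExecutionEnvironment;

// Hypothetical helper class; only the addDefaultKryoSerializer call is Flink API.
public class ThriftSerializerSetup {

    // Tells Flink's Kryo fallback to serialize the given Thrift-generated class
    // with TBaseSerializer (from chill-thrift) instead of Kryo's default serializer.
    static void registerThriftType(ExecutionEnvironment env, Class<?> thriftClass) {
        env.getConfig().addDefaultKryoSerializer(thriftClass, TBaseSerializer.class);
    }

    public static void main(String[] args) {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // In the job from this thread the call would be:
        //   registerThriftType(env, MyThriftObj.class);
        // and it has to happen before env.createInput(...) builds the DataSet.
    }
}
```

With this in place the type would most likely still print as GenericType, but Kryo should then delegate to Thrift's own encoding for that class.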

On 15 May 2017 at 9:08:33 PM, Flavio Pompermaier (pompermaier@okkam.it) wrote:

Hi to all,
in my Flink job I create a DataSet<MyThriftObj> using HadoopInputFormat in this way:

HadoopInputFormat<Void, MyThriftObj> inputFormat = new HadoopInputFormat<>(
        new ParquetThriftInputFormat<MyThriftObj>(), Void.class, MyThriftObj.class, job);
FileInputFormat.addInputPath(job, new org.apache.hadoop.fs.Path(inputPath));
DataSet<Tuple2<Void, MyThriftObj>> ds = env.createInput(inputFormat);

Flink logs this message:
TypeExtractor - class MyThriftObj contains custom serialization methods we do not call.

Indeed, MyThriftObj has readObject/writeObject methods, and when I print the type of ds I get:
Java Tuple2<Void, GenericType<MyThriftObj>>
From my experience GenericType is a performance killer... what should I do to improve the reading/writing
of MyThriftObj?
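
The log message and the GenericType result above can be illustrated with a small sketch. It is a simplified stand-in, not Flink's TypeExtractor code, and it rests on the assumption that the extractor refuses to treat a class as a POJO once it finds a method named readObject or writeObject, so the class falls back to the generic (Kryo) path. CustomSerializationCheck, WithHooks, and PlainPojo are hypothetical names.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;

// Simplified illustration only; this is NOT Flink's TypeExtractor.
final class CustomSerializationCheck {

    // Returns true if the class or any of its superclasses declares a method
    // named readObject or writeObject (the Java serialization hooks).
    static boolean hasCustomSerializationMethods(Class<?> clazz) {
        for (Class<?> c = clazz; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Method m : c.getDeclaredMethods()) {
                String name = m.getName();
                if (name.equals("readObject") || name.equals("writeObject")) {
                    return true;
                }
            }
        }
        return false;
    }

    // Hypothetical stand-in for a Thrift-generated class such as MyThriftObj.
    static class WithHooks implements Serializable {
        public int id;

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
        }
    }

    // Hypothetical plain class without serialization hooks.
    static class PlainPojo {
        public int id;
        public String name;
    }

    public static void main(String[] args) {
        System.out.println(hasCustomSerializationMethods(WithHooks.class)); // prints "true"
        System.out.println(hasCustomSerializationMethods(PlainPojo.class)); // prints "false"
    }
}
```

Under that assumption, a class like WithHooks would be reported as GenericType, while a class like PlainPojo could be analyzed field by field.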


Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 1823908