flink-user mailing list archives

From: Flavio Pompermaier <pompermaier@okkam.it>
Subject: Re: POJO Dataset read and write
Date: Fri, 27 Nov 2015 15:29:12 GMT
I was expecting Parquet + Thrift to perform faster, but not by that much;
I just wanted to know whether my results were plausible. Thanks for now,
Fabian!

On Fri, Nov 27, 2015 at 4:22 PM, Fabian Hueske <fhueske@gmail.com> wrote:

> Parquet is much cleverer than the TypeSerializer and applies columnar
> storage and compression techniques.
> The TypeSerializerIOFs just use Flink's element-wise serializers to write
> and read binary data.
>
> I'd go with Parquet if it is working well for you.
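>
> Reading the Parquet + Thrift files back into a DataSet typically goes
> through Flink's Hadoop compatibility wrapper. A rough sketch, assuming
> a Thrift-generated class MyThriftPojo and a made-up path (check the
> input format's package against your parquet-mr version):
>
> import org.apache.flink.api.common.functions.MapFunction;
> import org.apache.flink.api.java.DataSet;
> import org.apache.flink.api.java.ExecutionEnvironment;
> import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
> import org.apache.flink.api.java.tuple.Tuple2;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapreduce.Job;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.parquet.hadoop.thrift.ParquetThriftInputFormat;
>
> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>
> // Point the Hadoop job at the Parquet files and wrap the Hadoop-side
> // input format so Flink can use it.
> Job job = Job.getInstance();
> FileInputFormat.addInputPath(job, new Path("hdfs:///tmp/pojos-parquet"));
> HadoopInputFormat<Void, MyThriftPojo> parquetInput =
>         new HadoopInputFormat<>(new ParquetThriftInputFormat<MyThriftPojo>(),
>                 Void.class, MyThriftPojo.class, job);
>
> // Parquet records arrive as (Void, value) pairs; keep the value side.
> DataSet<Tuple2<Void, MyThriftPojo>> records = env.createInput(parquetInput);
> DataSet<MyThriftPojo> pojos = records.map(
>         new MapFunction<Tuple2<Void, MyThriftPojo>, MyThriftPojo>() {
>             @Override
>             public MyThriftPojo map(Tuple2<Void, MyThriftPojo> t) {
>                 return t.f1;
>             }
>         });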
>
> 2015-11-27 16:15 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>
>> I ran a simple test comparing Parquet + Thrift against the TypeSerializer
>> IF/OF: the former outperformed the latter on a simple filter (not pushed
>> down) and a map+sum (something like 2 s vs 33 s, not counting disk space
>> usage, which is also much worse for the TypeSerializer). Is that normal,
>> or is the TypeSerializer supposed to perform better than this?
>>
>>
>> On Fri, Nov 27, 2015 at 3:39 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>>
>>> If you are just looking for an exchange format between two Flink jobs, I
>>> would go for the TypeSerializerInput/OutputFormat.
>>> Note that these are binary formats.
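>>>
>>> A minimal sketch of the round trip (MyPojo, the paths, and the sample
>>> data are placeholders, not taken from your code):
>>>
>>> import org.apache.flink.api.common.typeinfo.TypeInformation;
>>> import org.apache.flink.api.java.DataSet;
>>> import org.apache.flink.api.java.ExecutionEnvironment;
>>> import org.apache.flink.api.java.io.TypeSerializerInputFormat;
>>> import org.apache.flink.api.java.io.TypeSerializerOutputFormat;
>>> import org.apache.flink.api.java.typeutils.TypeExtractor;
>>>
>>> // Hypothetical POJO: Flink needs a public no-arg constructor and
>>> // public (or getter/setter) fields, for nested field types too.
>>> public static class MyPojo {
>>>     public String name;
>>>     public int value;
>>>     public MyPojo() {}
>>>     public MyPojo(String name, int value) {
>>>         this.name = name;
>>>         this.value = value;
>>>     }
>>> }
>>>
>>> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>>> TypeInformation<MyPojo> type = TypeExtractor.getForClass(MyPojo.class);
>>>
>>> // Job 1: write the elements in Flink's internal binary format.
>>> DataSet<MyPojo> data =
>>>         env.fromElements(new MyPojo("a", 1), new MyPojo("b", 2));
>>> data.write(new TypeSerializerOutputFormat<MyPojo>(), "hdfs:///tmp/pojos");
>>> env.execute();
>>>
>>> // Job 2 (typically a separate program): read the files back. The
>>> // input format needs the type info to create a matching serializer.
>>> TypeSerializerInputFormat<MyPojo> in = new TypeSerializerInputFormat<>(type);
>>> in.setFilePath("hdfs:///tmp/pojos");
>>> DataSet<MyPojo> restored = env.createInput(in);
>>> restored.print();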
>>>
>>> Best, Fabian
>>>
>>> 2015-11-27 15:28 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>
>>>> Hi to all,
>>>>
>>>> I have a complex POJO (with nested objects) that I'd like to write and
>>>> read with Flink (batch).
>>>> What is the simplest way to do that? I can't find any example of it :(
>>>>
>>>> Best,
>>>> Flavio
>>>>
>>>
>>>
>>
>
