flink-user mailing list archives

From milind parikh <milindspar...@gmail.com>
Subject Re: Serializers and Schemas
Date Wed, 07 Dec 2016 21:00:05 GMT
Why not use a self-describing format (JSON), stream it as a String, read it
through a JSON reader, and avoid top-level reflection?

Github.com/milindparikh/streamingsi

https://github.com/milindparikh/streamingsi/tree/master/epic-poc/sprint-2-simulated-data-no-cdc-advanced-eventing/2-dataprocessing

?

Apologies if I misunderstood the question, but I can quite see how to model
your Product class (or indeed any POJO) in a fairly generic way (assuming JSON).

The real issue arises when you have different versions of the same POJO
class: that requires storing enough information to dynamically instantiate
the right version of the class, which I believe is beyond the simple use case.
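
For the simple case, streaming the JSON as a String and mapping it to the
POJO might look roughly like this. Untested sketch only: the topic name and
broker address are placeholders, Jackson is assumed to be on the classpath,
and FlinkKafkaConsumer09 should be swapped for whatever connector version
you are actually on.

---
import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonProductJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "products");

        // Read the raw JSON strings with the stock SimpleStringSchema...
        DataStream<String> json = env.addSource(
                new FlinkKafkaConsumer09<String>("products", new SimpleStringSchema(), props));

        // ...and let Jackson turn each string into the POJO, no custom schema needed.
        DataStream<Product> products = json.map(new MapFunction<String, Product>() {
            private transient ObjectMapper mapper;

            @Override
            public Product map(String value) throws Exception {
                if (mapper == null) {
                    mapper = new ObjectMapper();   // created lazily on the task manager
                }
                return mapper.readValue(value, Product.class);
            }
        });

        products.print();
        env.execute("json-products");
    }
}
---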

Milind
On Dec 7, 2016 2:42 PM, "Matt" <dromitlabs@gmail.com> wrote:

> I've read your example, but I've found the same problem. You're
> serializing your POJO as a string, where all fields are separated by "\t".
> This may work for you, but not in general.
>
> https://github.com/luigiselmi/pilot-sc4-fcd-producer/blob/master/src/main/java/eu/bde/sc4pilot/fcd/FcdTaxiEvent.java#L60
>
> I would like to see a more "generic" approach for the class Product in my
> last message. I believe a more general-purpose de/serializer for POJOs
> should be achievable using reflection.
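
That is more or less what Jackson already gives you: it maps fields by
reflection, so one schema class can handle any POJO. The class below is only
an untested sketch; the imports match the 1.1.x connector packages, which
have moved around between releases.

---
import java.io.IOException;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;
import org.apache.flink.streaming.util.serialization.SerializationSchema;

import com.fasterxml.jackson.databind.ObjectMapper;

/**
 * Generic JSON schema for any POJO: Jackson inspects the class via
 * reflection, so nothing in here is specific to Product.
 */
public class JsonPojoSchema<T> implements DeserializationSchema<T>, SerializationSchema<T> {

    private final Class<T> pojoClass;
    private transient ObjectMapper mapper;

    public JsonPojoSchema(Class<T> pojoClass) {
        this.pojoClass = pojoClass;
    }

    private ObjectMapper mapper() {
        if (mapper == null) {
            mapper = new ObjectMapper();   // lazily created, the mapper itself is never serialized
        }
        return mapper;
    }

    @Override
    public T deserialize(byte[] message) throws IOException {
        return mapper().readValue(message, pojoClass);
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return false;                      // never end the stream based on an element
    }

    @Override
    public byte[] serialize(T element) {
        try {
            return mapper().writeValueAsBytes(element);
        } catch (IOException e) {
            throw new RuntimeException("Could not serialize " + element, e);
        }
    }

    @Override
    public TypeInformation<T> getProducedType() {
        return TypeExtractor.getForClass(pojoClass);
    }
}
---

With that, new FlinkKafkaConsumer09<Product>("products",
new JsonPojoSchema<Product>(Product.class), props) should give you a
DataStream<Product> directly, and the same class should also work on the
producer side since it implements SerializationSchema as well.
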
>
> On Wed, Dec 7, 2016 at 1:16 PM, Luigi Selmi <luigiselmi@gmail.com> wrote:
>
>> Hi Matt,
>>
>> I had the same problem: trying to read some records in event time using a
>> POJO, doing some transformation, and saving the result into Kafka for further
>> processing. I am not done yet, but maybe the code I wrote starting from the Flink
>> Forward 2016 training docs
>> <http://dataartisans.github.io/flink-training/exercises/popularPlaces.html>
>> could be useful.
>>
>> https://github.com/luigiselmi/pilot-sc4-fcd-producer
>>
>>
>> Best,
>>
>> Luigi
>>
>> On 7 December 2016 at 16:35, Matt <dromitlabs@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I don't quite understand how to integrate Kafka and Flink. After a lot of
>>> thought and hours of reading, I feel I'm still missing something
>>> important.
>>>
>>> So far I haven't found a non-trivial but simple example of a stream of a
>>> custom class (POJO). It would be good to have such an example in the Flink
>>> docs. I can think of many scenarios in which using SimpleStringSchema
>>> is not an option, yet all Kafka+Flink guides insist on using it.
>>>
>>> Maybe we can add a simple example to the documentation [1]; it would be
>>> really helpful for many of us. Also, explaining how to create a Flink
>>> De/SerializationSchema from a Kafka De/Serializer would be really useful
>>> and would save a lot of people a lot of time. Right now it's not clear why,
>>> or even whether, you need both of them.
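
Roughly speaking, Kafka's Deserializer runs inside the plain Kafka client,
while Flink's DeserializationSchema is what the Flink connector applies to
the raw byte[] it fetched, so on the Flink side only the latter is strictly
needed. If you already have a Kafka Deserializer you want to reuse, a thin
wrapper can adapt it. The class below is a made-up, untested sketch; Kafka
deserializers are usually not Serializable, hence the lazy instantiation.

---
import java.io.IOException;
import java.util.Collections;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;

import org.apache.kafka.common.serialization.Deserializer;

/**
 * Sketch: adapts an existing Kafka Deserializer<T> to Flink's
 * DeserializationSchema<T>. Only the class objects are shipped to the
 * cluster; the actual deserializer is created on first use.
 */
public class KafkaDeserializerSchema<T> implements DeserializationSchema<T> {

    private final Class<? extends Deserializer<T>> deserializerClass;
    private final Class<T> producedClass;
    private transient Deserializer<T> deserializer;

    public KafkaDeserializerSchema(Class<? extends Deserializer<T>> deserializerClass,
                                   Class<T> producedClass) {
        this.deserializerClass = deserializerClass;
        this.producedClass = producedClass;
    }

    @Override
    public T deserialize(byte[] message) throws IOException {
        if (deserializer == null) {
            try {
                deserializer = deserializerClass.newInstance();
                deserializer.configure(Collections.<String, Object>emptyMap(), false);
            } catch (Exception e) {
                throw new IOException("Could not create Kafka deserializer", e);
            }
        }
        // Flink does not pass the topic name here, so hand over a placeholder.
        return deserializer.deserialize("", message);
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return false;
    }

    @Override
    public TypeInformation<T> getProducedType() {
        return TypeExtractor.getForClass(producedClass);
    }
}
---
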
>>>
>>> As far as I know, Avro is a common choice for serialization, but I've
>>> read that Kryo's performance is much better (true?). I guess, though, that
>>> the fastest approach is writing your own de/serializer.
>>>
>>> 1. What do you think about adding some thoughts on this to the
>>> documentation?
>>> 2. Can anyone provide an example for the following class?
>>>
>>> ---
>>> public class Product {
>>>     public String code;
>>>     public double price;
>>>     public String description;
>>>     public long created;
>>> }
>>> ---
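
For question 2, and for the "write your own de/serializer" case, a
hand-rolled binary schema for exactly that class could look like the
following. Again just an untested sketch: it assumes code and description
are never null and uses plain DataOutput/DataInput framing rather than
Avro or JSON.

---
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;
import org.apache.flink.streaming.util.serialization.SerializationSchema;

/**
 * Hand-written schema for Product: the four fields are written and read
 * in the same fixed order, nothing else goes on the wire.
 */
public class ProductSchema implements DeserializationSchema<Product>, SerializationSchema<Product> {

    @Override
    public byte[] serialize(Product p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(p.code);          // assumes non-null strings
            out.writeDouble(p.price);
            out.writeUTF(p.description);
            out.writeLong(p.created);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException("Could not serialize product", e);
        }
    }

    @Override
    public Product deserialize(byte[] message) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
        Product p = new Product();
        p.code = in.readUTF();
        p.price = in.readDouble();
        p.description = in.readUTF();
        p.created = in.readLong();
        return p;
    }

    @Override
    public boolean isEndOfStream(Product nextElement) {
        return false;
    }

    @Override
    public TypeInformation<Product> getProducedType() {
        return TypeExtractor.getForClass(Product.class);
    }
}
---

Whether that actually beats Avro or Kryo is something you would have to
measure; the point is only that the schema is a small Serializable class
that writes and reads the fields in the same order.
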
>>>
>>> Regards,
>>> Matt
>>>
>>> [1] http://data-artisans.com/kafka-flink-a-practical-how-to/
>>>
>>
>>
>>
>> --
>> Luigi Selmi, M.Sc.
>> Fraunhofer IAIS, Schloss Birlinghoven
>> 53757 Sankt Augustin, Germany
>> Phone: +49 2241 14-2440
>>
>>
>
