flink-user mailing list archives

From Martin Neumann <mneum...@sics.se>
Subject Re: streaming using DeserializationSchema
Date Sat, 13 Feb 2016 23:26:21 GMT
I ended up not using the DeserializationSchema and instead going with an
AvroInputFormat when reading from a file. I would have preferred to keep
the code simpler, but the map solution was a lot more complicated: my raw
data is in Avro binary format, so I cannot just read it as text and map it
later.
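The AvroInputFormat approach might look roughly like this (a hypothetical sketch: `MyAvroRecord` and the HDFS path are placeholders, and the `AvroInputFormat` package location has moved between Flink versions, so check the one your version ships):

```java
// Hypothetical sketch: read an Avro container file into a DataStream.
// MyAvroRecord and the HDFS path are placeholders, not from the thread.
import org.apache.flink.api.java.io.AvroInputFormat; // package differs in newer Flink
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AvroFileJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // AvroInputFormat deserializes Avro records directly to the given class
        AvroInputFormat<MyAvroRecord> format =
            new AvroInputFormat<>(new Path("hdfs:///data/dump.avro"),
                                  MyAvroRecord.class);

        DataStream<MyAvroRecord> stream = env.createInput(format);

        // ... attach the same downstream flow as the Kafka version ...
        env.execute("avro-file-job");
    }
}
```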

cheers Martin

On Fri, Feb 12, 2016 at 10:47 PM, Nick Dimiduk <ndimiduk@apache.org> wrote:

> My input file contains newline-delimited JSON records, one per text line.
> The records on the Kafka topic are JSON blobs encoded to UTF8 and written
> as bytes.
> On Fri, Feb 12, 2016 at 1:41 PM, Martin Neumann <mneumann@sics.se> wrote:
>> I'm trying the same thing now.
>> I guess you need to read the file as byte arrays somehow to make it work.
>> What read function did you use? The mapper is not hard to write but the
>> byte array stuff gives me a headache.
>> cheers Martin
>> On Fri, Feb 12, 2016 at 9:12 PM, Nick Dimiduk <ndimiduk@apache.org>
>> wrote:
>>> Hi Martin,
>>> I have the same use case. I wanted to be able to load dumps of data
>>> in the same format as is on the Kafka queue. I created a new application
>>> main, call it the "job" instead of the "flow". I refactored my
>>> flow-building code a bit so all of it can be reused via a factory method.
>>> I then implemented a MapFunction that simply calls my existing
>>> deserializer. Create a new DataStream from the flat file and tack on the
>>> MapFunction step. The resulting DataStream is then type-compatible with
>>> the Kafka consumer that starts the "flow" application, so I pass it into
>>> the factory method. Tweak the ParameterTool options for the "job"
>>> application, et voilà!
>>> Sorry I don't have example code for you; this would be a good example to
>>> contribute back to the community's example library though.
>>> Good luck!
>>> -n
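A minimal sketch of the MapFunction Nick describes (names like `MyJsonSchema` and the file path are hypothetical, and the `DeserializationSchema` package differs between Flink versions):

```java
// Hypothetical sketch: reuse the Kafka DeserializationSchema on file input
// by wrapping it in a MapFunction. Package name is the 2016-era location.
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;

public class DeserializeMap<T> implements MapFunction<byte[], T> {
    private final DeserializationSchema<T> schema;

    public DeserializeMap(DeserializationSchema<T> schema) {
        this.schema = schema;
    }

    @Override
    public T map(byte[] raw) throws Exception {
        // Same deserialization path the Kafka consumer uses
        return schema.deserialize(raw);
    }
}
```

With newline-delimited JSON records (Nick's case), the "job" main could read text lines, turn each back into UTF-8 bytes, and run them through the same schema, e.g.:

```java
// MyJsonSchema and the path are placeholders for illustration only
env.readTextFile("hdfs:///dumps/events.jsonl")
   .map(line -> line.getBytes(java.nio.charset.StandardCharsets.UTF_8))
   .map(new DeserializeMap<>(new MyJsonSchema()));
```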
>>> On Fri, Feb 12, 2016 at 2:25 AM, Martin Neumann <mneumann@sics.se>
>>> wrote:
>>>> It's not only about testing; I will also need to run things against
>>>> different datasets. I want to reuse as much of the code as possible to
>>>> load the same data from a file instead of Kafka.
>>>> Is there a simple way of loading the data from a file using the same
>>>> conversion classes that I would use to transform them when I read them
>>>> from Kafka, or do I have to write a new Avro deserializer (InputFormat)?
>>>> On Fri, Feb 12, 2016 at 2:06 AM, Gyula Fóra <gyula.fora@gmail.com>
>>>> wrote:
>>>>> Hey,
>>>>> A very simple thing you could do is to set up a simple Kafka producer
>>>>> in a Java program that feeds the data into a topic. This also has the
>>>>> additional benefit that you are actually testing against Kafka.
>>>>> Cheers,
>>>>> Gyula
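Gyula's suggestion could look like this as a standalone program (a sketch only: the broker address, topic name, and dump path are placeholders, and it assumes the Java producer API from kafka-clients):

```java
// Hypothetical sketch: replay a newline-delimited dump file into a Kafka
// topic so the streaming job can be tested against real Kafka.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public class DumpReplayer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            // One record per line, written as raw UTF-8 bytes
            for (String line : Files.readAllLines(Paths.get("dump.jsonl"))) {
                producer.send(new ProducerRecord<>("my-topic",
                    line.getBytes(StandardCharsets.UTF_8)));
            }
        }
    }
}
```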
>>>>> Martin Neumann <mneumann@sics.se> wrote (on Friday, Feb 12, 2016,
>>>>> at 0:20):
>>>>>> Hej,
>>>>>> I have a streaming program reading data from Kafka where the data is
>>>>>> Avro. I have my own DeserializationSchema to deal with it.
>>>>>> For testing reasons I want to read a dump from HDFS instead. Is there
>>>>>> a way to use the same DeserializationSchema to read from an Avro file
>>>>>> stored on HDFS?
>>>>>> cheers Martin
