flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Martin Neumann <mneum...@sics.se>
Subject Re: streaming using DeserializationSchema
Date Sat, 13 Feb 2016 23:26:21 GMT
I ended up not using the DeserializationSchema and instead going for a
AvrioInputFormat in case of reading From file. I would have preferred to
keep the code simpler but the map solution was a lot more complicated since
the raw data I have is in Avro binary format so I cannot just read it and
map it later.

cheers Martin


On Fri, Feb 12, 2016 at 10:47 PM, Nick Dimiduk <ndimiduk@apache.org> wrote:

> My input file contains newline-delimited JSON records, one per text line.
> The records on the Kafka topic are JSON blobs encoded to UTF8 and written
> as bytes.
>
> On Fri, Feb 12, 2016 at 1:41 PM, Martin Neumann <mneumann@sics.se> wrote:
>
>> I'm trying the same thing now.
>>
>> I guess you need to read the file as byte arrays somehow to make it work.
>> What read function did you use? The mapper is not hard to write but the
>> byte array stuff gives me a headache.
>>
>> cheers Martin
>>
>>
>>
>>
>> On Fri, Feb 12, 2016 at 9:12 PM, Nick Dimiduk <ndimiduk@apache.org>
>> wrote:
>>
>>> Hi Martin,
>>>
>>> I have the same usecase. I wanted to be able to load from dumps of data
>>> in the same format as is on the kafak queue. I created a new application
>>> main, call it the "job" instead of the "flow". I refactored my code a bit
>>> for building the flow so all that can be reused via factory method. I then
>>> implemented a MapFunction that simply calls my existing deserializer.
>>> Create a new DataStream from flat file and tack on the MapFunction step.
>>> The resulting DataStream is then type-compatible with the Kakfa consumer
>>> that starts the "flow" application, so I pass it into the factory method.
>>> Tweak the ParameterTools options for the "job" application, et voilà!
>>>
>>> Sorry I don't have example code for you; this would be a good example to
>>> contribute back to the community's example library though.
>>>
>>> Good luck!
>>> -n
>>>
>>> On Fri, Feb 12, 2016 at 2:25 AM, Martin Neumann <mneumann@sics.se>
>>> wrote:
>>>
>>>> Its not only about testing, I will also need to run things against
>>>> different datasets. I want to reuse as much of the code as possible to load
>>>> the same data from a file instead of kafka.
>>>>
>>>> Is there a simple way of loading the data from a File using the same
>>>> conversion classes that I would use to transfrom them when I read them from
>>>> kafka or do I have to write a new avro deserializer (InputFormat).
>>>>
>>>> On Fri, Feb 12, 2016 at 2:06 AM, Gyula Fóra <gyula.fora@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> A very simple thing you could do is to set up a simple kafka producer
>>>>> in a java program that will feed the data into a topic. This also has
the
>>>>> additional benefit that you are actually testing against kafka.
>>>>>
>>>>> Cheers,
>>>>> Gyula
>>>>>
>>>>> Martin Neumann <mneumann@sics.se> ezt írta (időpont: 2016. febr.
12.,
>>>>> P, 0:20):
>>>>>
>>>>>> Hej,
>>>>>>
>>>>>> I have a stream program reading data from Kafka where the data is
in
>>>>>> avro. I have my own DeserializationSchema to deal with it.
>>>>>>
>>>>>> For testing reasons I want to read a dump from hdfs instead, is there
>>>>>> a way to use the same DeserializationSchema to read from an avro
file
>>>>>> stored on hdfs?
>>>>>>
>>>>>> cheers Martin
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message