flink-user mailing list archives

From John Smith <java.dev....@gmail.com>
Subject Re: Best way to link static data to event data?
Date Fri, 27 Sep 2019 20:38:28 GMT
It's a fairly small static file that may update once in a blue moon, but
I'm hoping to use existing functions. Why can't I just use the CSV table
source?

Why should I now have to either write my own CSV parser or find a
third-party one, then put the result in a Java Map and do lookups against
that map? I'm finding Flink to be a bit of death by a thousand paper cuts.

If I put the CSV in a table, I can then join the events against it, no?
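For reference, Flink 1.8 does ship a built-in CsvTableSource, so a sketch of this idea could look like the following. The file path, the field names, the 3-digit area-code rule, and the "Events" table (assumed to be registered from the Kafka source elsewhere) are all assumptions, not a definitive setup:

```java
// Sketch (Flink 1.8 APIs): register the CSV as a table and join it with the
// event table. "area_codes.csv", the field names, and the SUBSTRING prefix
// rule are hypothetical; "Events" would be registered from the Kafka source.
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.sources.CsvTableSource;

public class CsvTableJoinSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Flink 1.8 API; later versions use StreamTableEnvironment.create(env).
        StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        CsvTableSource areaCodes = CsvTableSource.builder()
                .path("area_codes.csv")
                .field("areaCode", Types.STRING)
                .field("city", Types.STRING)
                .field("lat", Types.DOUBLE)
                .field("lon", Types.DOUBLE)
                .build();
        tEnv.registerTableSource("AreaCodes", areaCodes);

        // Join events (assumed registered from Kafka elsewhere) on the
        // area-code prefix of the phone number.
        Table enriched = tEnv.sqlQuery(
                "SELECT e.phoneNumber, a.city, a.lat, a.lon " +
                "FROM Events e JOIN AreaCodes a " +
                "ON SUBSTRING(e.phoneNumber, 1, 3) = a.areaCode");
        // ... convert 'enriched' back to a DataStream, add the sink, env.execute().
    }
}
```

One caveat: in a streaming job the CSV table is read once at startup, so file updates are not picked up without a restart.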

On Fri, 27 Sep 2019 at 16:25, Sameer W <sameer@axiomine.com> wrote:

> Connected streams are one option, but they may be overkill in your
> scenario if the CSV does not refresh. If the CSV is small enough (in
> number of records), you could parse it, load it into a serializable
> object, and pass that to the constructor of the operator where you will be
> streaming the data.
>
> If the CSV can be made available via a shared network folder (or S3 in the
> case of AWS), you could also read it in the open() method (if you use the
> Rich versions of the operators).
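A minimal sketch of that open()-based approach, assuming a shared path visible to every TaskManager, a CSV layout of areaCode,city,lat,lon, and that the area code is the first three digits of the number (all assumptions):

```java
// Sketch: load the CSV once per task in open() of a RichMapFunction and do
// a plain map lookup per event. The "/shared/area_codes.csv" path, the CSV
// layout, and the 3-digit-prefix rule are assumptions.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class AreaCodeEnricher extends RichMapFunction<String, String> {
    private transient Map<String, String> cityByAreaCode;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Called once per parallel task before any map() call.
        cityByAreaCode = new HashMap<>();
        try (BufferedReader reader =
                 new BufferedReader(new FileReader("/shared/area_codes.csv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split(",");   // areaCode,city,lat,lon
                cityByAreaCode.put(cols[0], cols[1]);
            }
        }
    }

    @Override
    public String map(String phoneNumber) {
        // Assumes the area code is the first three digits.
        String areaCode = phoneNumber.substring(0, 3);
        return phoneNumber + "," + cityByAreaCode.getOrDefault(areaCode, "unknown");
    }
}
```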
>
> The real problem, I guess, is how frequently the CSV updates. If you want
> the updates to propagate in near real time (or on a schedule), the first
> option (parse in the driver and pass it via the constructor) does not
> work. And with the second option, you are responsible for refreshing the
> file read from the shared folder.
>
> In that case, use connected streams, where one stream periodically
> re-reads the file and sends it downstream while the other stream reads the
> events. The refresh interval is your tolerance for stale data in the CSV.
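That periodic re-read idea can be written with Flink's broadcast state (available since 1.5), which is also what Oytun pointed to. In this sketch the descriptor name, the plain-string record types, the 3-digit prefix rule, and the csvLines stream (assumed to be a custom source that re-reads the file on an interval) are all assumptions:

```java
// Sketch: the event stream is connected to a broadcast stream of CSV lines
// ("areaCode,city" records). Each broadcast element updates the shared
// lookup state; each event does a read-only lookup against it.
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastJoinSketch {

    static final MapStateDescriptor<String, String> CSV_STATE =
            new MapStateDescriptor<>("areaCodes",
                    BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

    // events: phone numbers; csvLines: periodically re-read CSV records.
    public static DataStream<String> enrich(DataStream<String> events,
                                            DataStream<String> csvLines) {
        BroadcastStream<String> broadcast = csvLines.broadcast(CSV_STATE);
        return events.connect(broadcast).process(
                new BroadcastProcessFunction<String, String, String>() {
                    @Override
                    public void processElement(String phone, ReadOnlyContext ctx,
                                               Collector<String> out) throws Exception {
                        String city = ctx.getBroadcastState(CSV_STATE)
                                         .get(phone.substring(0, 3));
                        out.collect(phone + "," + (city == null ? "unknown" : city));
                    }

                    @Override
                    public void processBroadcastElement(String line, Context ctx,
                                                        Collector<String> out) throws Exception {
                        String[] cols = line.split(",");
                        ctx.getBroadcastState(CSV_STATE).put(cols[0], cols[1]);
                    }
                });
    }
}
```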
>
> On Fri, Sep 27, 2019 at 3:49 PM John Smith <java.dev.mtl@gmail.com> wrote:
>
>> I don't think I need state for this...
>>
>> I need to load a CSV, I'm guessing as a table, and then filter my events,
>> parse the number, transform the event into geolocation data, and sink it
>> to the downstream data source.
>>
>> So I'm guessing I need a CSV source and my Kafka source, and somehow join
>> those to transform the event...
>>
>> On Fri, 27 Sep 2019 at 14:43, Oytun Tez <oytun@motaword.com> wrote:
>>
>>> Hi,
>>>
>>> You should look at the broadcast state pattern in the Flink docs.
>>>
>>> ---
>>> Oytun Tez
>>>
>>> *M O T A W O R D*
>>> The World's Fastest Human Translation Platform.
>>> oytun@motaword.com — www.motaword.com
>>>
>>>
>>> On Fri, Sep 27, 2019 at 2:42 PM John Smith <java.dev.mtl@gmail.com>
>>> wrote:
>>>
>>>> Using 1.8
>>>>
>>>> I have a list of phone area codes, cities, and their geo locations in a
>>>> CSV file, and my events from Kafka contain phone numbers.
>>>>
>>>> I want to parse each phone number, get its area code, associate the
>>>> number with a city and geo location, and also count how many numbers
>>>> are in that city/geo location.
>>>>
>>>
