flink-user mailing list archives

From Sameer W <sam...@axiomine.com>
Subject Re: Best way to link static data to event data?
Date Fri, 27 Sep 2019 21:21:31 GMT
Connected Streams is one option, but it may be overkill in your scenario if
your CSV does not refresh. If your CSV is small enough (in number of
records), you could parse it in the driver, load it into a serializable
object, and pass that object to the constructor of the operator that
processes the event stream.
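
As a rough sketch of the constructor approach, the lookup table could look
something like the class below. It uses only the JDK so it can be tried
without a cluster. The class name AreaCodeTable, the CSV column order
(areaCode,city,latitude,longitude) and the 10-digit North American number
format are all assumptions made for illustration; none of them come from
the thread.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.List;

// Hypothetical lookup table: area code -> city and coordinates.
// It is Serializable so an instance built in the driver can be handed to
// an operator's constructor and shipped with the job.
final class AreaCodeTable implements Serializable {
    private static final long serialVersionUID = 1L;

    static final class Location implements Serializable {
        private static final long serialVersionUID = 1L;
        final String city;
        final double lat;
        final double lon;

        Location(String city, double lat, double lon) {
            this.city = city;
            this.lat = lat;
            this.lon = lon;
        }
    }

    private final HashMap<String, Location> byAreaCode = new HashMap<>();

    // Assumed layout: areaCode,city,latitude,longitude (no header, no quoting).
    static AreaCodeTable parse(List<String> csvLines) {
        AreaCodeTable table = new AreaCodeTable();
        for (String line : csvLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] f = line.split(",");
            table.byAreaCode.put(
                f[0].trim(),
                new Location(f[1].trim(),
                             Double.parseDouble(f[2].trim()),
                             Double.parseDouble(f[3].trim())));
        }
        return table;
    }

    // Assumes a 10-digit number, optionally prefixed with country code 1.
    // Returns null when the number is malformed or the area code is unknown.
    Location lookup(String phoneNumber) {
        String digits = phoneNumber.replaceAll("\\D", "");
        if (digits.length() == 11 && digits.startsWith("1")) {
            digits = digits.substring(1);
        }
        if (digits.length() != 10) {
            return null;
        }
        return byAreaCode.get(digits.substring(0, 3));
    }
}
```

In a Flink job the operator would hold an AreaCodeTable field set in its
constructor and call lookup() for each event. This only suits a table that
is small and never changes while the job runs.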

If the CSV can be made available via a shared network folder (or S3 in the
case of AWS), you could also read it in the operator's open() method (this
requires the Rich version of the function, for example RichMapFunction).
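
To illustrate the open() approach without pulling in Flink, here is a
JDK-only stand-in. In a real job the class would extend a Rich function
and open() would be the lifecycle override; the class name
AreaCodeEnricher, the two-column CSV layout (areaCode,city) and the
methods shown are hypothetical.

```java
import java.io.IOException;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Stand-in for a Flink Rich function: only the path is serialized with the
// operator; the table itself is built on the worker when open() runs.
final class AreaCodeEnricher implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String csvPath;                          // assumed shared-folder path
    private transient Map<String, String> cityByAreaCode;  // not serialized

    AreaCodeEnricher(String csvPath) {
        this.csvPath = csvPath;
    }

    // In Flink this would be the open(...) override of the Rich function.
    void open() {
        Map<String, String> loaded = new HashMap<>();
        try {
            for (String line : Files.readAllLines(Paths.get(csvPath), StandardCharsets.UTF_8)) {
                String[] f = line.split(",");
                if (f.length >= 2) {
                    loaded.put(f[0].trim(), f[1].trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        cityByAreaCode = loaded;
    }

    // Per-event lookup; returns null when the area code is not in the file.
    String cityFor(String areaCode) {
        return cityByAreaCode.get(areaCode);
    }
}
```

Each parallel instance reads the file once at startup, so a change to the
file is not picked up unless the operator reloads it itself.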

The real question, I guess, is how frequently the CSV updates. If you want
the updates to propagate in near real time (or on a schedule), option 1
(parse in the driver and pass it via the constructor) does not work. With
the second option, you are responsible for refreshing the file read from
the shared folder yourself.

In that case, use Connected Streams: one stream reads the events, and the
other periodically re-reads the file and sends its contents downstream.
The refresh interval is your tolerance for stale data in the CSV.
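
The core of the connected-streams idea can be sketched with two callbacks,
one per input. The JDK-only class below mirrors the shape of Flink's
CoFlatMapFunction (flatMap1 for one input, flatMap2 for the other) without
depending on it. The class and method names, and the choice to return
"unknown" for a missing area code, are assumptions for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Stand-in for an operator on two connected streams: table snapshots
// arrive on one input, events on the other.
final class RefreshingEnricher {
    // Latest snapshot seen so far; empty until the first one arrives.
    private Map<String, String> cityByAreaCode = Collections.emptyMap();

    // File-stream side: called whenever the file reader emits a fresh
    // snapshot of the CSV. The copy guards against later caller mutation.
    void onTableSnapshot(Map<String, String> snapshot) {
        cityByAreaCode = new HashMap<>(snapshot);
    }

    // Event-stream side: enrich using whatever snapshot is current.
    String onEvent(String areaCode) {
        return cityByAreaCode.getOrDefault(areaCode, "unknown");
    }
}
```

Events that arrive before the first snapshot see an empty table, so a real
job would have to decide whether to buffer or drop them. With parallelism
above 1, the snapshot stream would also need to reach every parallel
instance, which is the case the broadcast state pattern is meant for.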

On Fri, Sep 27, 2019 at 3:49 PM John Smith <java.dev.mtl@gmail.com> wrote:

> I don't think I need state for this...
> I need to load a CSV, I'm guessing as a table, then filter my events,
> parse the number, transform the event into geolocation data and sink that
> to a downstream data source.
> So I'm guessing I need a CSV source and my Kafka source, and somehow join
> those to transform the event...
> On Fri, 27 Sep 2019 at 14:43, Oytun Tez <oytun@motaword.com> wrote:
>> Hi,
>> You should look at the broadcast state pattern in the Flink docs.
>> ---
>> Oytun Tez
>> *M O T A W O R D*
>> The World's Fastest Human Translation Platform.
>> oytun@motaword.com — www.motaword.com
>> On Fri, Sep 27, 2019 at 2:42 PM John Smith <java.dev.mtl@gmail.com>
>> wrote:
>>> Using 1.8
>>> I have a list of phone area codes, cities and their geo locations in a
>>> CSV file, and my events from Kafka contain phone numbers.
>>> I want to parse the phone number, get its area code, and then associate
>>> the phone number with a city and geo location, as well as count how many
>>> numbers are in that city/geo location.
