Connected streams are one option, but they may be overkill in your scenario if your CSV does not refresh. If your CSV is small enough (in terms of number of records), you could parse it, load it into a serializable object, and pass that object to the constructor of the operator that will process the streaming data.
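A minimal sketch of that first option, assuming a CSV with hypothetical `areaCode,city,lat,lon` columns (the column layout and class name are illustrative, not from the thread):

```java
import java.io.IOException;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class AreaCodeTable implements Serializable {
    // areaCode -> {city, lat, lon}. HashMap is Serializable, so Flink can
    // ship the whole table to the operator instances when the function
    // holding it is serialized with the job.
    private final Map<String, String[]> byAreaCode = new HashMap<>();

    // Parse the CSV once, in the driver, before building the job graph.
    public static AreaCodeTable fromCsv(Path csv) throws IOException {
        AreaCodeTable table = new AreaCodeTable();
        for (String line : Files.readAllLines(csv)) {
            String[] cols = line.split(",");
            if (cols.length == 4) { // areaCode, city, lat, lon
                table.byAreaCode.put(cols[0],
                        new String[] {cols[1], cols[2], cols[3]});
            }
        }
        return table;
    }

    public String[] lookup(String areaCode) {
        return byAreaCode.get(areaCode);
    }

    public int size() {
        return byAreaCode.size();
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("areacodes", ".csv");
        Files.write(csv, Arrays.asList(
                "212,New York,40.71,-74.01",
                "514,Montreal,45.50,-73.57"));
        AreaCodeTable table = fromCsv(csv);
        System.out.println(table.size());           // 2
        System.out.println(table.lookup("514")[0]); // Montreal
    }
}
```

The Flink side would then be along the lines of `stream.map(new EnrichmentFunction(table))`, where `EnrichmentFunction` is a hypothetical `MapFunction` that keeps the table as a field; Flink serializes the function, table included, when it submits the job.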

If the CSV can be made available via a shared network folder (or S3, in the case of AWS), you could also read it in the open() method (if you use the Rich versions of the operators).

The real problem, I guess, is how frequently the CSV updates. If you want the updates to propagate in near real time (or on a schedule), the first option (parse in the driver and pass via the constructor) does not work. With the second option, you are also responsible for refreshing the file read from the shared folder.

In that case, use connected streams: one stream periodically re-reads the file and sends it downstream, while the other stream reads the events. The refresh interval is your tolerance for stale data in the CSV.
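The re-reading side of such a job boils down to logic like the following, here modeled in plain Java without the Flink APIs (file format and interval are assumptions). In an actual job this loop would live in the file-reading source, with the resulting table connected to the event stream:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class RefreshingCsvLookup {
    private final Path csv;
    private final long refreshIntervalMs; // your staleness tolerance
    private boolean loaded = false;
    private long lastLoadedAt;
    private Map<String, String> byKey = Collections.emptyMap();

    public RefreshingCsvLookup(Path csv, long refreshIntervalMs) {
        this.csv = csv;
        this.refreshIntervalMs = refreshIntervalMs;
    }

    // Re-read the CSV if the refresh interval has elapsed, then look up
    // the key. Between refreshes, lookups serve the (possibly stale) copy.
    public String lookup(String key, long nowMs) throws IOException {
        if (!loaded || nowMs - lastLoadedAt >= refreshIntervalMs) {
            Map<String, String> fresh = new HashMap<>();
            for (String line : Files.readAllLines(csv)) {
                String[] cols = line.split(",", 2); // key, rest-of-row
                if (cols.length == 2) fresh.put(cols[0], cols[1]);
            }
            byKey = fresh;
            loaded = true;
            lastLoadedAt = nowMs;
        }
        return byKey.get(key);
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("lookup", ".csv");
        Files.write(csv, Arrays.asList("514,Montreal"));
        RefreshingCsvLookup lookup = new RefreshingCsvLookup(csv, 1000);
        System.out.println(lookup.lookup("514", 0));    // Montreal
        Files.write(csv, Arrays.asList("514,Laval"));
        System.out.println(lookup.lookup("514", 500));  // still Montreal
        System.out.println(lookup.lookup("514", 1500)); // Laval (reloaded)
    }
}
```

The demo in main() shows exactly the trade-off mentioned above: the update written at "t=500" is not visible until the interval has elapsed.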

On Fri, Sep 27, 2019 at 3:49 PM John Smith <> wrote:
I don't think I need state for this...

I need to load a CSV, I'm guessing as a table, and then filter my events, parse the number, transform the event into geolocation data, and sink that data downstream.

So I'm guessing I need a CSV source and my Kafka source, and somehow join those to transform the event...

On Fri, 27 Sep 2019 at 14:43, Oytun Tez <> wrote:

You should look at the broadcast state pattern in the Flink docs.

Oytun Tez

The World's Fastest Human Translation Platform.

On Fri, Sep 27, 2019 at 2:42 PM John Smith <> wrote:
Using 1.8

I have a list of phone area codes, cities, and their geolocations in a CSV file, and my events from Kafka contain phone numbers.

I want to parse the phone number to get its area code, associate the phone number with a city and geolocation, and also count how many numbers fall in that city/geolocation.
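The per-event work described here can be sketched in plain Java, assuming North American numbers where the area code is the three digits after an optional leading "1" (the lookup rows and helper names are hypothetical; in the job, the counting would be a keyed aggregation over the enriched events):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AreaCodeCounter {
    // Extract the area code: strip non-digits, drop an optional leading
    // country code "1", then take the first three digits. Returns null
    // for numbers that are too short to carry an area code.
    public static String areaCode(String phone) {
        String digits = phone.replaceAll("\\D", "");
        if (digits.length() == 11 && digits.startsWith("1")) {
            digits = digits.substring(1);
        }
        return digits.length() >= 10 ? digits.substring(0, 3) : null;
    }

    public static void main(String[] args) {
        // Hypothetical rows from the CSV lookup table.
        Map<String, String> cityByAreaCode = new HashMap<>();
        cityByAreaCode.put("212", "New York");
        cityByAreaCode.put("514", "Montreal");

        // Count numbers per city; a TreeMap just keeps the demo output
        // deterministic. In Flink this would be keyBy(city) + a counter.
        Map<String, Integer> countByCity = new TreeMap<>();
        for (String phone : List.of("+1 212-555-0100",
                                    "(514) 555-0199",
                                    "12125550123")) {
            String city = cityByAreaCode.get(areaCode(phone));
            if (city != null) countByCity.merge(city, 1, Integer::sum);
        }
        System.out.println(countByCity); // {Montreal=1, New York=2}
    }
}
```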