flink-user mailing list archives

From Al-Isawi Rami <Rami.Al-Is...@comptel.com>
Subject Re: Combining streams with static data and using REST API as a sink
Date Mon, 23 May 2016 15:17:35 GMT
Hi,

1. I have no experience with broadcast variables; I suggest you give them a try.

2. I misunderstood you. I thought you wanted Flink to serve the results and act as a REST
API provider that others can call. What you are saying now is that you want a sink that makes
HTTP calls to a REST API hosted somewhere else. If the number of results is not in the order
of thousands per second, then making HTTP calls is feasible. If you end up writing such a
sink, I would be happy to review and discuss it.

-Rami
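
One way to keep the number of HTTP calls well below the number of results is to merge increments per customer before sending them. The following is only a minimal sketch in plain Java, not Flink API: the class name `IncrementBuffer` and its methods are hypothetical, and when to call `drain()` (for example once per window or on a timer) is left open.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: sums per-customer increments so one HTTP call can cover many results. */
final class IncrementBuffer {
    private final Map<String, Long> pending = new HashMap<>();

    /** Adds a count for a customer, summing it with anything already buffered for that customer. */
    synchronized void add(String customerId, long count) {
        pending.merge(customerId, count, Long::sum);
    }

    /** Returns a copy of the buffered totals and clears the buffer. */
    synchronized Map<String, Long> drain() {
        Map<String, Long> snapshot = new HashMap<>(pending);
        pending.clear();
        return snapshot;
    }
}
```

A sink built on this would call `add` for every result and issue one request per entry returned by `drain()`. Note that increments are not idempotent, so a retried request could count the same results twice.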

On 23 May 2016, at 17:06, Josh <jofo90@gmail.com> wrote:


Hi Rami,

Thanks for the fast reply.

  1.  In your solution, would I need to create a new stream for 'item updates', and add it
as a source of my Flink job? Then I would need to ensure item updates get broadcast to all
nodes that are running my job and use them to update the in-memory items database? This sounds
like it might be a good solution, but I'm not sure how the broadcast would work - it sounds
like I'd need Flink broadcast variables, but it looks like there's no support for changing
datasets at the moment: https://issues.apache.org/jira/browse/FLINK-3514
  2.  I don't understand why an HTTP sink isn't possible. Say the output of my job is 'number
of items ordered per customer', then for each output I want to update a 'customer' in my database,
incrementing their 'item_order_count'. What's wrong with doing that update in the Flink job
via an HTTP REST call (updating the customer resource), rather than writing directly to a
database? The reason I'd like to do it this way is to decouple the underlying database from
Flink.

Josh

On Mon, May 23, 2016 at 2:35 PM, Al-Isawi Rami <Rami.Al-Isawi@comptel.com> wrote:
Hi Josh,

I am no expert in Flink yet, but here are my thoughts on this:

1. What about streaming an event to Flink every time the items DB has an update? Then, in
some background thread, you fetch the new data from the DB, possibly through REST (if it is
only a few updates a day), and load the results into memory, and there is your updated static data.

2. REST APIs are served over HTTP, so how could one be a sink? Serving HTTP requests does not
sound like a job for Flink at all. Simply sink the results to some DB and have some other
component read from the DB and serve them as a REST API.

-Rami
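
The reload-into-memory idea in point 1 could be sketched roughly as follows. This is plain Java under stated assumptions, not Flink API: `RefreshableLookup` is a hypothetical name, the `loader` stands in for whatever fetches the items (SQL or REST), and calling `refresh()` when an update event arrives is left to the surrounding job.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical in-memory lookup table that can be reloaded as a whole. */
final class RefreshableLookup<K, V> {
    private final Supplier<Map<K, V>> loader;
    private final AtomicReference<Map<K, V>> current = new AtomicReference<>();

    RefreshableLookup(Supplier<Map<K, V>> loader) {
        this.loader = loader;
        refresh(); // initial load
    }

    /** Reads from the most recently loaded snapshot; returns null for unknown keys. */
    V get(K key) {
        return current.get().get(key);
    }

    /** Loads a fresh copy and swaps it in atomically, so readers never see a half-updated map. */
    void refresh() {
        current.set(Collections.unmodifiableMap(new HashMap<>(loader.get())));
    }
}
```

Lookups keep using the old snapshot until `refresh()` completes, which seems acceptable when the items change only a few times a day.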

On 23 May 2016, at 16:22, Josh <jofo90@gmail.com> wrote:


Hi all,

I am new to Flink and have a couple of questions which I've had trouble finding answers to
online. Any advice would be much appreciated!

  1.  What's a typical way of handling the scenario where you want to join streaming data
with a (relatively) static data source? For example, if I have a stream 'orders' where each
order has an 'item_id', and I want to join this stream with my database of 'items'. The database
of items is mostly static (with perhaps a few new items added every day). The database can
be retrieved either directly from a standard SQL database (postgres) or via a REST call. I
guess one way to handle this would be to distribute the database of items with the Flink tasks,
and to redeploy the entire job if the items database changes. But I think there's probably
a better way to do it?
  2.  I'd like my Flink job to output state to a REST API. (i.e. using the REST API as a sink).
Updates would be incremental, e.g. the job would output tumbling window counts which need
to be added to some property on a REST resource, so I'd probably implement this as a PATCH.
I haven't found much evidence that anyone else has used a REST API as a Flink sink - is there
a reason why this might be a bad idea?

Thanks for any advice on these,

Josh
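
For illustration, the incremental PATCH from point 2 might be built like this with the JDK's `java.net.http` client (Java 11 or later). It is a sketch only: the `/customers/{id}` path, the `increment` JSON field, and the `CustomerPatch` class are assumptions, not part of any existing API.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds a PATCH request carrying one window count. */
final class CustomerPatch {
    static HttpRequest build(String baseUrl, String customerId, long increment) {
        // Assumed payload shape; a real API defines its own contract.
        String body = "{\"increment\": " + increment + "}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/customers/" + customerId))
                .header("Content-Type", "application/json")
                .method("PATCH", HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding())`, checking the status code before treating the window as delivered.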

Disclaimer: This message and any attachments thereto are intended solely for the addressed
recipient(s) and may contain confidential information. If you are not the intended recipient,
please notify the sender by reply e-mail and delete the e-mail (including any attachments
thereto) without producing, distributing or retaining any copies thereof. Any review, dissemination
or other use of, or taking of any action in reliance upon, this information by persons or
entities other than the intended recipient(s) is prohibited. Thank you.


