flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Flink redshift table lookup and updates
Date Fri, 19 Aug 2016 12:23:31 GMT
Hi Harshith,

Welcome to the Flink community ;)

I would recommend using approach 2. Keeping the state in Flink and just
sending updates to the dashboard store should give you better performance
and consistency.
I don't know whether it's better to download the full state snapshot from
Redshift at the beginning, or to lazily load the required data when you first
need it (and then use the state afterwards).
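A minimal sketch of approach 2 with lazy loading, written in plain Java so it runs standalone. Assumptions: settings are modeled as strings, and the HashMap stands in for Flink keyed state. In a real job you would keep a per-key ValueState inside a function applied after keyBy(userId). The loader passed to the constructor is a hypothetical Redshift lookup, and the class name SettingsChangeDetector is illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Plain-Java simulation of per-user change detection (approach 2).
// The HashMap is a stand-in for Flink keyed state; it is NOT the Flink API.
class SettingsChangeDetector {
    // Latest known settings per user.
    private final Map<String, String> latest = new HashMap<>();
    // Hypothetical lookup (e.g. a Redshift query) used only the first time a
    // user is seen; returns null if the user has no stored settings.
    private final Function<String, String> loader;

    public SettingsChangeDetector(Function<String, String> loader) {
        this.loader = loader;
    }

    // Returns a history record if the settings changed, otherwise empty.
    public Optional<String> process(String userId, String newSettings) {
        String previous = latest.containsKey(userId)
                ? latest.get(userId)
                : loader.apply(userId); // lazy load on first access only
        latest.put(userId, newSettings); // state now holds the newest settings
        if (newSettings.equals(previous)) {
            return Optional.empty(); // unchanged: no history record
        }
        return Optional.of(userId + ": " + previous + " -> " + newSettings);
    }
}
```

Loading the full ~1GB snapshot up front would instead mean populating the state before processing updates; the lazy version trades one remote lookup on each user's first event for a faster start.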


On Fri, Aug 19, 2016 at 5:20 AM, Harshith Chennamaneni <
hchennamaneni@hiya.com> wrote:

> Hi,
> I've only recently come across Flink, and I'm trying to use it to solve a
> problem that I have.
> I have a stream of user settings updates coming through a Kafka queue. I
> need to store the most recent settings, along with a history of settings,
> for each user in Redshift, which then feeds into analytics dashboards.
> I've been contemplating using Flink for this problem. I'd like some
> guidance from people experienced with Flink to help me decide whether Flink
> is suited to this problem and, if so, which approach might work best. I am
> considering the following approaches:
> 1. Create a secondary key-value database with each user's latest settings,
> and look up these settings after grouping the stream with keyBy(userId) to
> check whether a setting has changed and, if so, create a history record. I
> came across this Stack Overflow thread, which helps with this approach:
> http://stackoverflow.com/questions/38866078/how-to-look-up-and-update-the-state-of-a-record-from-a-database-in-apache-flink
> 2. At program start, pull the current snapshot of users from Redshift and
> keep it as state in the Flink program (the snapshot isn't huge, ~1GB).
> Subsequently, look up from this state and update it when processing events.
> In both cases, I plan to create a Redshift sink that batches updates to
> the history as well as to the latest state, and persists them to Redshift in
> batches (through S3 and the COPY command for history, and through an update
> on join for the snapshot).
> Is one of these designs more suited to working with Flink? Is there an
> alternative I should consider?
> Thanks!
> -H
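The batching sink described above can be sketched as a small buffer that hands records to a writer in fixed-size groups. This is plain Java with a hypothetical writer callback (for history, it might stage a file on S3 and issue a Redshift COPY); the class name BatchingWriter and the batch size are illustrative assumptions, not a Flink or Redshift API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal batching buffer: collects records and passes them to a
// (hypothetical) writer once batchSize records have accumulated.
class BatchingWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> writer;
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<T>> writer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.writer = writer;
    }

    // Buffers one record and flushes automatically when the batch is full.
    void add(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Writes any buffered records; also call this on shutdown or on a timer.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        writer.accept(new ArrayList<>(buffer)); // hand over a copy
        buffer.clear();
    }
}
```

In a real Flink sink, buffered records would also need to be flushed when a checkpoint is taken and on close; otherwise a failure could lose records that were buffered but not yet written.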
