flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Elias Levy <fearsome.lucid...@gmail.com>
Subject Re: SQL materialized upsert tables
Date Wed, 21 Feb 2018 00:06:31 GMT
[ Adding the list back in, as this clarifies my question ]

On Tue, Feb 20, 2018 at 3:42 PM, Darshan Singh <darshan.meel@gmail.com>

> I am no expert in Flink but I will try my best. Issue you mentioned will
> be with all streaming systems even with Kafka KTable I use them a lot for
> similar sort of requirements.
> In Kafka you have KTable on Telemetry with 3 records and join with say
> Scores which could be KTable or Kstrem  and you start your streaming query
> as mentioned above it will give just 1 row as expected. However, if there
> is a new value for the same key with timestamp greater than previous max
> will be added to the Telemetry it will output the new value as well and
> that is main idea about the streaming anyway you want to see the changed
> value. So once you started streaming you will get whatever is the outcome
> of your


Thanks for the reply.  I've already implemented this job using Kafka
Streams, so I am aware of how KTables behaves.  I would have helped if I
had included some sample data in my post, so here it is.  If you have this
data coming into Telemetry:

ts, item, score, source
0, item1, 1, source1
1, item1, 1, source1
2, item1, 1, source1

And this comes into Scores:

ts, item, score
3, item1, 3

Flink will output 3 records from the queries I mentioned:

(3, item1, 3, source1)
(3, item1, 3, source1)
(3, item1, 3, source1)

In contrast, if you run the query in Kafka Stream configuring Telemetry as
a KTable keyed by (item, source), the output will be a single record.  In
Telemetry record for key (item1, source1) at time 1 will overwrite the
record at time 0, and the record at time 2 will overwrite the one at time
1.  By the time the record at time 3 comes in via Scores, it will be joined
only with the record from time 2 in Telemetry.

Yes, it is possible for the Kafka Streams query to output multiple records
if the records from the different streams are not time aligned, as Kafka
Streams only guarantees a best effort aligning the streams. But in the
common case the output will be a single record.

I think in fllink you can do the same, from your telemeter stream/table you
> can create the LatestTelemetry table using similar sql(I am sure it should
> give you latest timestamp's data) as you did with the RDBMS and then join
> with scores table. You should get similar results to KTable or any other
> streaming system.

Not sure if you missed it, but I actually executed the query to define the
LatestTelemetry table in Flink using that query and joined against it.  The
output was the same three records.

View raw message