flink-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] fhueske commented on a change in pull request #6741: [FLINK-9712][table, docs] Document processing time Temporal Table Joins
Date Wed, 26 Sep 2018 16:36:18 GMT
fhueske commented on a change in pull request #6741: [FLINK-9712][table,docs] Document processing
time Temporal Table Joins
URL: https://github.com/apache/flink/pull/6741#discussion_r220635686
 
 

 ##########
 File path: docs/dev/table/streaming/joins.md
 ##########
 @@ -0,0 +1,93 @@
+---
+title: "Joins"
+nav-parent_id: streaming_tableapi
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+In batch processing, joins are relatively easy because we are working on bounded, complete data sets.
+In stream processing, things are a bit more complicated, especially when it comes to handling data that can change over time.
+Because of that, there are a couple of ways to perform a join using either the Table API or SQL.
+
+For more information regarding the syntax, please check the Joins sections in [Table API](../tableApi.html#joins) and [SQL](../sql.html#joins).
+
+* This will be replaced by the TOC
+{:toc}
+
+Regular Joins
+-------------
+
+This is the most basic case, in which any new records or changes to either side of the join input are visible and affect the whole join result.
+If there is a new record on the left side, it will be joined with all of the previous and future records on the other side.
+
+These semantics have an important limitation:
+they require keeping both sides of the join input in state indefinitely, so resource usage will grow indefinitely as well.
+
+Example:
+{% highlight sql %}
+SELECT * FROM Orders
+INNER JOIN Product
+ON Orders.productId = Product.id
+{% endhighlight %}
+
+Time-windowed Joins
+-------------------
+
+In this case we restrict the scope of the join to some time window.
+This allows Flink to remove old values from the state (using [watermarks](time_attributes.html)) without affecting the correctness of the result.
+
+Example:
+{% highlight sql %}
+SELECT *
+FROM
+  Orders o,
+  Shipments s
+WHERE o.id = s.orderId AND
+      o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
+{% endhighlight %}
+
+Temporal Table Joins
+--------------------
+
+Temporal Table Joins allow joining a stream (left/probe side) with a table (right/build side) that changes over time.
+Each record from the probe side is joined only with the latest version of the build side.
+That means (in contrast to [Regular Joins](#regular-joins)) that a new record on the build side will not affect previous results of the join.
+This again allows Flink to limit the number of elements that must be kept in state.
+In order to support updates (overwrites) of previous values on the build side table, this table must define a primary key.
+
+Compared to [Time-windowed Joins](#time-windowed-joins),
+Temporal Table Joins do not define a time window within which the records will be joined.
+Records from the probe side are joined with the most recent versions of the build side, and records on the build side might be arbitrarily old.
 
 Review comment:
   We need to rephrase "most recent version" once we support event time. Maybe it makes sense to explain this in more general terms. This would also help readers understand the syntax better (why do we need to pass a proc time attribute into the temporal table function?)
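
To make the "most recent version" wording concrete, here is a minimal sketch of the processing-time semantics described in the quoted section. It is a toy model in plain Python, not Flink code and not how Flink implements the join; the class name `ProcTimeTemporalJoin`, its methods, and the currency-rate data are all hypothetical. It only shows that each probe record sees whichever build-side version is current when it arrives, and that the primary key lets an update overwrite the previous version so that state stays bounded.

```python
from dataclasses import dataclass, field


@dataclass
class ProcTimeTemporalJoin:
    """Toy model of a processing-time temporal table join (hypothetical, not Flink's API)."""

    # Build-side state: primary key -> latest row. Older versions are not kept.
    build_side: dict = field(default_factory=dict)

    def on_build(self, key, row):
        # An update overwrites the previous version for the same primary key.
        self.build_side[key] = row

    def on_probe(self, key, row):
        # Join with the version that is current when the probe record arrives.
        current = self.build_side.get(key)
        return None if current is None else (row, current)


join = ProcTimeTemporalJoin()
join.on_build("EUR", 1.10)          # first version of the build-side row
first = join.on_probe("EUR", 100)   # joined with rate 1.10
join.on_build("EUR", 1.20)          # update overwrites the old version
second = join.on_probe("EUR", 100)  # joined with rate 1.20

# The earlier result is not revised by the later build-side update.
results = [first, second]
print(results)
```

With event time, "current" would presumably have to mean the version valid at the probe record's timestamp rather than at its arrival, which this sketch does not model.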

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
