flink-dev mailing list archives

From Timo Walther <twal...@apache.org>
Subject Re: [DISCUSS] Some thoughts about unify Stream SQL and Batch SQL grammer
Date Thu, 18 Aug 2016 15:19:21 GMT
Hi Jark,

Sorry that I didn't write back earlier. I wanted to talk to Fabian first
about this. In general, according to Calcite's plans, even SQL queries 
containing the "STREAM" keyword are regular standard SQL. In theory we 
could omit the "STREAM" keyword as long as it is guaranteed that the 
generated logical plans look the same. So I'm not against having the 
same grammar for both batch and streaming queries. However, I think we 
should contribute code to Calcite if the logical representation is not 
there already for the operators we need. We need to research how far
Calcite's development has progressed. We can implement windows via
user-defined functions, as is also done in the Calcite streaming design document.
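
To illustrate the idea of windows as user-defined functions in otherwise
regular SQL, here is a minimal sketch. It is not Flink or Calcite code: it
uses Python's built-in sqlite3 module as a stand-in SQL engine, and the
function name TUMBLE_START, the Orders columns, and the sample rows are
assumptions made up for this illustration.

```python
import sqlite3


def tumble_start(ts_seconds, size_seconds):
    """Return the start of the tumbling window that contains ts_seconds."""
    return ts_seconds - (ts_seconds % size_seconds)


conn = sqlite3.connect(":memory:")

# Register the window assigner as an ordinary scalar UDF (hypothetical name).
conn.create_function("TUMBLE_START", 2, tumble_start)

# Hypothetical Orders table; rowtime is in seconds for simplicity.
conn.execute(
    "CREATE TABLE Orders (rowtime INTEGER, productId INTEGER, units INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(10, 1, 5), (3599, 1, 2), (3600, 1, 7), (4000, 2, 1)],
)

# A plain GROUP BY on the UDF result yields one row per (window, product);
# no STREAM keyword or special window syntax is involved.
rows = conn.execute(
    """
    SELECT TUMBLE_START(rowtime, 3600) AS window_start,
           productId,
           SUM(units) AS units
    FROM Orders
    GROUP BY TUMBLE_START(rowtime, 3600), productId
    ORDER BY window_start, productId
    """
).fetchall()

for row in rows:
    print(row)
```

The query text itself does not say whether Orders is bounded or unbounded,
which is the property a unified grammar would rely on. How a streaming
runtime would emit and finalize such windows is outside this sketch.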

It would be very interesting for the upcoming design phase if you could 
show us how you implemented your Blink SQL. For instance, how do you 
define windows there?


On 18/08/16 at 16:34, Aljoscha Krettek wrote:
> Hi,
> I personally would like it a lot if the SQL queries for batch and 
> stream programs looked the same. With the decision to move the Table 
> API on top of Calcite and also use the Calcite SQL parser Flink is 
> somewhat tied to Calcite so I don't know whether we can add our own 
> window constructs and teach the parser to properly read them.
> Maybe Fabian and Timo have more insights here since they worked on the 
> move to Calcite.
> Cheers,
> Aljoscha
> +Timo looping him in directly
> On Tue, 16 Aug 2016 at 09:29 Jark Wu <wuchong.wc@alibaba-inc.com> wrote:
>     Hi,
>     Currently, Flink uses Calcite for SQL parsing, so we use the
>     StreamSQL grammar proposed by Calcite[1], which requires the
>     `STREAM` keyword in SQL. For example, `SELECT *
>     FROM Orders` is regular standard SQL and will be translated to a
>     batch job. If you want to declare a stream job, you have to add the
>     `STREAM` keyword: `SELECT STREAM *
>     FROM Orders`.
>     I'm wondering why we distinguish between the StreamSQL and
>     BatchSQL grammars. We already have separate high-level APIs for
>     batch (DataSet) and stream (DataStream), and we have a unified Table
>     API for batch and stream (that's great!). Why do we have to
>     separate them again in SQL?
>     I hope we can manipulate stream data like a table. Take `SELECT *
>     FROM Orders`: if Orders is a table (or the query runs in a batch
>     execution env), then it's a batch job. If Orders is a stream (or the
>     query runs in a stream execution env), then it's a stream job. The
>     grammar of StreamSQL and BatchSQL is exactly the same. That is what
>     we did in Blink SQL.
>     The benefits of unifying the grammar:
>     1. StreamSQL is easy to use for anyone who knows regular SQL. There
>     is no difference between StreamSQL and regular SQL.
>     2. We are not blocked by Calcite. Currently, Calcite StreamSQL is not
>     fully supported: no stream-to-stream JOIN, no window aggregate, no
>     aggregate without window, etc. We may need to wait for Calcite to
>     support them before we can start work. All of these except windows
>     are already supported by regular SQL, and we can implement windows
>     via user-defined functions. So if we use regular SQL instead of
>     StreamSQL, we can start working on it right now without waiting for
>     Calcite.
>     3. Blink SQL can be merged back into the community to accelerate the
>     evolution of Flink SQL. Blink SQL has already done most of this work:
>     we implemented UDF/UDTF/UDAF, aggregates with and without windows,
>     stream-to-stream JOIN, and so on.
>     4. Windows can also work in batch jobs.
>     Just my thoughts :)
>     What do you think about this?
>     [1] https://calcite.apache.org/docs/stream.html
>     - Jark Wu

Kind Regards

Timo Walther

Follow me: @twalthr
