flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jark Wu" <wuchong...@alibaba-inc.com>
Subject Re: [DISCUSS] Some thoughts about unify Stream SQL and Batch SQL grammer
Date Mon, 29 Aug 2016 12:31:24 GMT
Hi Timo,

Yes, you are right. The STREAM keyword is invalid now. If there is a STREAM keyword in the
query, the parser will throw "can’t convert table xxx to stream" exception. Because we register
the table as a regular table not streamable. 

- Jark Wu 

> 在 2016年8月29日,下午8:13,Timo Walther <twalthr@apache.org> 写道:
> 
> Hi Jark,
> 
> your code looks good and it also simplifies many parts. So the STREAM keyword is not
optional but invalid now, right? What happens if there is keyword in the query?
> 
> Timo
> 
> 
> Am 29/08/16 um 05:40 schrieb Jark Wu:
>> Hi Fabian, Timo,
>> 
>> I have created a prototype for removing STREAM keyword and using batch sql parser
for stream jobs.
>> 
>> This is the working brach:  https://github.com/wuchong/flink/tree/remove-stream <https://github.com/wuchong/flink/tree/remove-stream>
>> 
>> Looking forward to your feedback.
>> 
>> - Jark Wu
>> 
>>> 在 2016年8月24日,下午4:56,Fabian Hueske <fhueske@gmail.com> 写道:
>>> 
>>> Starting with a prototype would be great, Jark.
>>> We had some trouble with Calcite's StreamableTable interface anyways. A few
>>> things can be simplified if we do not declare our tables as streamable.
>>> I would try to implement DataStreamTable (and all related classes and
>>> methods) equivalent to DataSetTables if possible.
>>> 
>>> Best, Fabian
>>> 
>>> 2016-08-24 6:27 GMT+02:00 Jark Wu <wuchong.wc@alibaba-inc.com>:
>>> 
>>>> Hi Fabian,
>>>> 
>>>> You are right, the main thing we need to change for removing STREAM
>>>> keyword is the table registration. If you would like, I can do a prototype.
>>>> 
>>>> Hi Timo,
>>>> 
>>>> I’m glad to contribute our work back to Flink. I will look into it and
>>>> create JIRAs next days.
>>>> 
>>>> - Jark Wu
>>>> 
>>>>> 在 2016年8月24日,上午12:13,Fabian Hueske <fhueske@gmail.com>
写道:
>>>>> 
>>>>> Hi Jark,
>>>>> 
>>>>> We can think about removing the STREAM keyword or not. In principle,
>>>>> Calcite should allow the same windowing syntax on streaming and static
>>>>> tables (this is one of the main goals of Calcite). The Table API can
also
>>>>> distinguish stream and batch without the STREAM keyword by looking at
the
>>>>> ExecutionEnvironment.
>>>>> I think we would need to change the way that tables are registered in
>>>>> Calcite's catalog and also add more validation (check that time windows
>>>>> refer to a time column, etc).
>>>>> A prototype should help to see what the consequence of removing the
>>>> STREAM
>>>>> keyword (which is actually, changing the table registration, the parser
>>>> is
>>>>> the same) would be.
>>>>> 
>>>>> Regarding streaming aggregates without window definition: We can
>>>> certainly
>>>>> implement this feature in the Table API. There are a few points that
need
>>>>> to be considered like value expiration after a certain time of update
>>>>> inactivity (otherwise the state might grow infinitely). But these aspects
>>>>> should be rather easy to solve. I think for SQL, such running aggregates
>>>>> are a special case of the Sliding Windows as discussed in Calcite's
>>>>> StreamSQL document [1].
>>>>> 
>>>>> Thanks also for the document! I'll take that into account when sketching
>>>>> the FLIP for streaming aggregation support.
>>>>> 
>>>>> Cheers, Fabian
>>>>> 
>>>>> [1] http://calcite.apache.org/docs/stream.html#sliding-windows
>>>>> 
>>>>> 2016-08-23 13:09 GMT+02:00 Jark Wu <wuchong.wc@alibaba-inc.com>:
>>>>> 
>>>>>> Hi Fabian, Timo,
>>>>>> 
>>>>>> Sorry for the late response.
>>>>>> 
>>>>>> Regarding Calcite’s StreamSQL syntax, what I concern is only the
STREAM
>>>>>> keyword and no agg-without-window. Which makes different syntax for
>>>>>> streaming and static tables. I don’t think Flink should have a
custom
>>>> SQL
>>>>>> syntax, but it’s better to have a consistent syntax for batch and
>>>>>> streaming. Regarding window syntax , I think it’s good and reasonable
to
>>>>>> follow Calcite’s syntax. Actually, we implement Blink SQL Window
>>>> following
>>>>>> Calcite’s syntax[1].
>>>>>> 
>>>>>> In addition, I describe the Blink SQL design including UDF, UDTF,
UDAF,
>>>>>> Window in google doc[1]. Hope that can help for the upcoming Flink
SQL
>>>>>> design.
>>>>>> 
>>>>>> +1 for creating FLIP
>>>>>> 
>>>>>> [1] https://docs.google.com/document/d/15iVc1781dxYWm3loVQlESYvMAxEzb
>>>>>> buVFPZWBYuY1Ek
>>>>>> 
>>>>>> 
>>>>>> - Jark Wu
>>>>>> 
>>>>>>> 在 2016年8月23日,下午3:47,Fabian Hueske <fhueske@gmail.com>
写道:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I did a bit of prototyping yesterday to check to what extend
Calcite
>>>>>>> supports window operations on streams if we would implement them
for
>>>> the
>>>>>>> Table API.
>>>>>>> For the Table API we do not go through Calcite's SQL parser and
>>>>>> validator,
>>>>>>> but generate the logical plan (tree of RelNodes) ourselves mostly
using
>>>>>>> Calcite's Relbuilder.
>>>>>>> It turns out that Calcite does not restrict grouped aggregations
on
>>>>>> streams
>>>>>>> at this abstraction level, i.e., it does not perform any checks.
>>>>>>> 
>>>>>>> I think it should be possible to implement windowed aggregates
for the
>>>>>>> Table API. Once CALCITE-1345 [1] is implemented (and released),
>>>> windowed
>>>>>>> aggregates are also supported by the SQL parser, validator, and
>>>>>> optimizer.
>>>>>>> In order to make them work with our implementation we would need
to
>>>> adapt
>>>>>>> our solution to it (only internally), but I am sure we could
reuse a
>>>> lot
>>>>>> of
>>>>>>> our initial implementation (Table API, validation, execution).
>>>>>>> 
>>>>>>> I drafted an API proposal a few months ago [2] and could convert
this
>>>>>> into
>>>>>>> a FLIP to discuss the API and break it down into subtasks.
>>>>>>> 
>>>>>>> What do you think?
>>>>>>> 
>>>>>>> Cheers, Fabian
>>>>>>> 
>>>>>>> [1] https://issues.apache.org/jira/browse/CALCITE-1345
>>>>>>> [2]
>>>>>>> https://docs.google.com/document/d/19kSOAOINKCSWLBCKRq2WvNtmuaA9o
>>>>>> 3AyCh2ePqr3V5E
>>>>>>> 2016-08-19 11:04 GMT+02:00 Fabian Hueske <fhueske@gmail.com>:
>>>>>>> 
>>>>>>>> Hi Jark,
>>>>>>>> 
>>>>>>>> thanks for starting this discussion. Actually, I think we
are rather
>>>>>>>> "blocked" on the internal handling of streaming windows in
Calcite
>>>> than
>>>>>> the
>>>>>>>> SQL parser. IMO, it should be possible to exchange or modify
the
>>>> parser
>>>>>> if
>>>>>>>> we want that.
>>>>>>>> 
>>>>>>>> Regarding Calcite's StreamSQL syntax: Except for the STREAM
keyword,
>>>>>>>> Calcite closely follows the SQL standard (e.g.,no special
keywords
>>>> like
>>>>>>>> WINDOW. Instead stream specific aspects like tumbling windows
are done
>>>>>> as
>>>>>>>> functions such as TUMBLE [1]). One main motivation of the
Calcite
>>>>>> community
>>>>>>>> is to have the same syntax for streaming and static tables.
This
>>>>>> includes
>>>>>>>> support for tables which are static and streaming at the
same time
>>>> (the
>>>>>>>> example of [1] is a table about orders to which new order
records are
>>>>>>>> added). When querying such a table, the STREAM keyword is
required to
>>>>>>>> distinguish the cases of a batch query which returns a result
set and
>>>> a
>>>>>>>> standing query which returns a result stream. In the context
of Flink
>>>> we
>>>>>>>> can can do the distinction using the type of the TableEnvironment.
So
>>>> we
>>>>>>>> could use the batch parser, but would need to change a couple
things
>>>>>>>> internally and add checks for proper grouping on the timestamp
column
>>>>>> when
>>>>>>>> doing windows, etc. So far the discussion about the StreamSQL
syntax
>>>>>> rather
>>>>>>>> focused on the question whether 1) StreamSQL should follow
the SQL
>>>>>> standard
>>>>>>>> (as Calcite proposes) or 2) whether Flink should use a custom
syntax
>>>>>> with
>>>>>>>> stream specific features. For instance a tumbling window
is expressed
>>>> in
>>>>>>>> the GROUP BY clause [1] when following standard SQL but it
could be
>>>>>> defined
>>>>>>>> using a special WINDOW keyword in a custom StreamSQL dialect.
>>>>>>>> 
>>>>>>>> You are right that we have a dependency on Calcite. However,
I think
>>>>>> this
>>>>>>>> dependency is rather in the internals than the parser, i.e.,
how does
>>>>>> the
>>>>>>>> validator/optimizer support and handle monotone / quasi-monotone
>>>>>> attributes
>>>>>>>> and windows. I am not sure how much is already supported
but the
>>>> Calcite
>>>>>>>> community is working on this [2]. I think we need these features
in
>>>>>> Calcite
>>>>>>>> unless we want to completely remove our dependency on Calcite
for
>>>>>>>> StreamSQL. I would not be in favor of removing Calcite at
this point.
>>>> We
>>>>>>>> put a lot of effort into refactoring the Table API internals.
Instead
>>>> we
>>>>>>>> should start to talk to the Calcite community and see how
far they
>>>> are,
>>>>>>>> what is missing, and how we can help.
>>>>>>>> 
>>>>>>>> I will start a discussion on the Calcite dev mailing list
in the next
>>>>>> days
>>>>>>>> and ask about the status of StreamSQL.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Fabian
>>>>>>>> 
>>>>>>>> [1] http://calcite.apache.org/docs/stream.html#tumbling-
>>>>>> windows-improved
>>>>>>>> [2] https://issues.apache.org/jira/browse/CALCITE-1345
>>>>>>>> 
>>>>>> 
>>>> 
>> 
> 
> 
> -- 
> Freundliche Grüße / Kind Regards
> 
> Timo Walther
> 
> Follow me: @twalthr
> https://www.linkedin.com/in/twalthr


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message