flink-dev mailing list archives

From Haohui Mai <ricet...@gmail.com>
Subject Re: [DISCUSS] Table API / SQL features for Flink 1.4.0
Date Wed, 21 Jun 2017 14:32:59 GMT
Hi,

We are interested in building the simplest case of stream-table joins:
essentially calling stream.map(x => (x, table.get(x))). It covers the use
case of augmenting a stream with information from a database. The
lookups themselves can be batched for better performance.
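
To make the idea concrete, here is a minimal sketch in plain Scala. It is not Flink code: LookupTable, InMemoryTable, enrich and batchSize are hypothetical names made up for illustration, and an Iterator stands in for the stream. It only shows the shape of the operation, with one table round trip per batch instead of one per record.

```scala
// Hypothetical stand-in for an external table; NOT a Flink interface.
trait LookupTable[K, V] {
  // Fetches many keys in a single round trip; missing keys are absent from the result.
  def getAll(keys: Seq[K]): Map[K, V]
}

// In-memory table used only for illustration; it counts round trips.
final class InMemoryTable[K, V](rows: Map[K, V]) extends LookupTable[K, V] {
  var roundTrips = 0

  def getAll(keys: Seq[K]): Map[K, V] = {
    roundTrips += 1
    keys.flatMap(k => rows.get(k).map(v => k -> v)).toMap
  }
}

object StreamTableJoinSketch {
  // Batched variant of stream.map(x => (x, table.get(x))):
  // records are grouped into batches of at most batchSize, each batch
  // triggers one lookup, and the input order is preserved.
  def enrich[K, V](
      stream: Iterator[K],
      table: LookupTable[K, V],
      batchSize: Int): Iterator[(K, Option[V])] =
    stream.grouped(batchSize).flatMap { batch =>
      val found = table.getAll(batch.distinct) // de-duplicate keys within a batch
      batch.map(k => (k, found.get(k)))        // None when the table has no row for k
    }
}
```

A real operator would also need to bound how long a record waits for its batch to fill, which this sketch leaves out.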

We are happy to contribute to the scalar functions as well, since we
have similar requirements internally.

Fabian mentioned that the development of the Table / SQL API was bottlenecked
by committer resources, which shows how much development is happening
in this space. I think it is a good problem to have. :-)

I wonder whether this is a good time to nominate a new batch of committers
to keep up the development momentum?

Regards,
Haohui



On Fri, Jun 16, 2017 at 7:28 AM jincheng sun <sunjincheng121@gmail.com>
wrote:

> Hi Fabian,
> Thanks for bringing up this discussion.
> To enrich Flink's built-in scalar functions and make the user experience
> friendlier, I recommend adding as many scalar functions as possible in
> the 1.4 release. I have filed the JIRAs (
> https://issues.apache.org/jira/browse/FLINK-6810) and will try my best to work
> on them.
>
> Of course, anybody is welcome to add sub-tasks or take the JIRAs.
>
> Cheers,
> SunJincheng
>
> 2017-06-16 16:07 GMT+08:00 Fabian Hueske <fhueske@gmail.com>:
>
> > Thanks for your response Shaoxuan,
> >
> > My "Table-table join with retraction" is probably the same as your
> > "unbounded stream-stream join with retraction".
> > Basically, a join between two dynamic tables with unique keys (either
> > because of an upsert stream->table conversion or an unbounded
> > aggregation).
> >
> > Best, Fabian
> >
> > 2017-06-16 0:56 GMT+02:00 Shaoxuan Wang <wshaoxuan@gmail.com>:
> >
> > > Nice timing, Fabian!
> > >
> > > Your checklist aligns with our plans very well. Here are the things we are
> > > working on & planning to contribute to release 1.4:
> > > 1. DDL (with property waterMark config for source-table, and emit config
> > > on result-table)
> > > 2. unbounded stream-stream joins (with retraction supported)
> > > 3. backend state user interface for UDAGG
> > > 4. UDOP (as opposed to UDF (scalars to scalar) / UDTF (scalar to
> > > table) / UDAGG (table to scalar), this allows the user to define
> > > table-to-table conversion business logic)
> > >
> > > Some of them already have a PR/JIRA, while others do not. We will send out
> > > the design docs for the missing ones very soon. Looking forward to the 1.4
> > > release.
> > >
> > > Btw, what is the "Table-Table (with retraction)" item you mentioned in your
> > > plan?
> > >
> > > Regards,
> > > Shaoxuan
> > >
> > >
> > >
> > > On Thu, Jun 15, 2017 at 10:29 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> > >
> > > > Hi everybody,
> > > >
> > > > I would like to start a discussion about the targeted feature set of the
> > > > Table API / SQL for Flink 1.4.0.
> > > > Flink 1.3.0 was released about 2 weeks ago and we have 2.5 months (~11
> > > > weeks, until the beginning of September) left until the feature freeze for
> > > > Flink 1.4.0.
> > > >
> > > > I think it makes sense to start with a collection of desired features. Once
> > > > we have a list of requested features, we might want to prioritize and maybe
> > > > also assign responsibilities.
> > > >
> > > > When we prioritize, we should keep in mind that:
> > > > - we want to have a consistent API. Larger features should be developed in
> > > > a feature branch first.
> > > > - the coming months are a typical time for vacations
> > > > - we were bottlenecked by committer resources in the last release.
> > > >
> > > > I think the following features would be a nice addition to the current
> > > > state:
> > > >
> > > > - Conversion of a stream into an upsert table (with retraction, updating to
> > > > the last row per key)
> > > > - Joins for streaming tables
> > > >   - Stream-Stream (time-range predicate); there is already a PR for
> > > >     processing time joins
> > > >   - Table-Table (with retraction)
> > > > - Support for late arriving records in group window aggregations
> > > > - Exposing a keyed result table as queryable state
> > > >
> > > > Which features are others looking for?
> > > >
> > > > Cheers,
> > > > Fabian
> > > >
> > >
> >
>
