apex-dev mailing list archives

From Amol Kekre <a...@datatorrent.com>
Subject Re: SQL as specification for Apex DAGs
Date Tue, 10 May 2016 00:46:36 GMT
Yes, +1 on Calcite. Let's reuse it.

Thks
Amol


On Mon, May 9, 2016 at 2:40 PM, Thomas Weise <thomas@datatorrent.com> wrote:

> The point was that Calcite is already solving this. There is no need to
> define yet another language.
>
> --
> sent from mobile
> On May 9, 2016 2:37 PM, "David Yan" <david@datatorrent.com> wrote:
>
> > In the context of supporting a high-level API and supporting Apache Beam,
> > I think we should also think about a possible SQL-like language that can
> > specify the stream processing with the concepts of session windowing,
> > watermarking, triggering, etc.
> > Taking from the article at
> > https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102:
> >
> > PCollection<KV<String, Integer>> scores = input
> >   .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2)))
> >                .triggering(
> >                  AtWatermark()
> >                    .withEarlyFirings(AtPeriod(Duration.standardMinutes(1)))
> >                    .withLateFirings(AtCount(1)))
> >                .withAllowedLateness(Duration.standardMinutes(1)))
> >   .apply(Sum.integersPerKey());
> >
> > We might want to design a SQL-like language that users can write on the
> > fly without having to compile Java code. Something like:
> >
> > SELECT SUM(value) FROM input INTO output WHERE <filter condition>
> > GROUP BY key FIXED WINDOWS INTERVAL '2' MINUTE AT WATERMARK
> > WITH EARLY FIRING AT INTERVAL '1' MINUTE WITH LATE FIRINGS AT COUNT '1'
> > WITH ALLOWED LATENESS INTERVAL '1' MINUTE
> >
> > We can also allow users to implement and register custom "SQL functions"
> > this way to allow custom processing.
> >
> > David
> >
> >
> >
> > On Mon, May 9, 2016 at 11:59 AM, Thomas Weise <thomas@datatorrent.com>
> > wrote:
> >
> > > I just attended the talk by Julian Hyde at Apache Big Data. The
> > > presentation was similar to the one at the Hadoop Summit here:
> > >
> > > http://www.slideshare.net/julianhyde/querying-the-internet-of-things-streaming-sql-on-kafkasamza-and-stormtrident
> > >
> > > We have discussed SQL on various occasions; Calcite integration seems
> > > the way to go. My current idea is that a SQL query can be applied on
> > > top of a base application or used to define a SubDAG, and introduce
> > > additional operators. It also plays into the interactive query
> > > capability (app data).
> > >
> > > There is an existing JIRA and maybe it is time to get to the next level
> > > of discussion:
> > >
> > > https://issues.apache.org/jira/browse/APEXMALHAR-1818
> > >
> > > Julian has expressed interest in working with us on this.
> > >
> > > Thanks,
> > > Thomas
> > >
> >
>
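
For readers who want the grouping in the quoted query spelled out, here is a minimal sketch in plain Java of what "SUM(value) ... GROUP BY key FIXED WINDOWS INTERVAL '2' MINUTE" would compute over a finite list of events. It is not Apex, Beam, or Calcite code: the FixedWindowSum and Event names are hypothetical, timestamps are assumed to be event times in milliseconds, and watermarks, early/late firings, and allowed lateness are deliberately left out.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FixedWindowSum {
    /** Hypothetical input element: a keyed integer with an event timestamp. */
    static final class Event {
        final String key;
        final int value;
        final long timestampMillis;

        Event(String key, int value, long timestampMillis) {
            this.key = key;
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    /**
     * Sums values per key and per fixed window.
     * Result: key -> (window start in millis, inclusive -> sum of values).
     */
    static Map<String, Map<Long, Integer>> sumPerKeyAndWindow(
            List<Event> events, Duration windowSize) {
        long size = windowSize.toMillis();
        Map<String, Map<Long, Integer>> sums = new TreeMap<>();
        for (Event e : events) {
            // Align the timestamp down to the start of its fixed window.
            long windowStart = Math.floorDiv(e.timestampMillis, size) * size;
            sums.computeIfAbsent(e.key, k -> new TreeMap<>())
                .merge(windowStart, e.value, Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event("alice", 5, 10_000L),
            new Event("alice", 3, 90_000L),
            new Event("alice", 2, 130_000L),
            new Event("bob", 4, 60_000L));
        System.out.println(sumPerKeyAndWindow(events, Duration.ofMinutes(2)));
    }
}
```

A streaming engine would emit these per-window sums incrementally, with the trigger and lateness clauses controlling when each window's result is produced or refined; the sketch only shows the final grouping.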
