beam-commits mailing list archives

From "Tyler Akidau (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-301) Add a Beam SQL DSL
Date Thu, 06 Apr 2017 19:51:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959621#comment-15959621 ]

Tyler Akidau commented on BEAM-301:
-----------------------------------

I agree with what I think Mingmin is saying here in that it's important to clearly distinguish
between streams and tables. Though it's easy to interchange between them, they should not
be treated as 1:1 equivalents. In a situation where you clearly have a source of one type
or the other, that should dictate which type of primitive you start out with.

That said, I agree with Julian that STREAM, if kept around, is probably just an alternative
way of specifying an EMIT clause that emits upon every new INSERT/UPDATE/DELETE. I think EMIT
EARLY is probably too general, though. EMIT CHANGELOG, EMIT INSERTS, or even EMIT STREAM (though
only because of the history of the STREAM keyword) are all a little more specific. But we don't
need to bikeshed terms here.
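
To make the stream/table distinction concrete, here is a minimal Python sketch. It is an illustration only, not Beam or Calcite API: the `Change` record shape and the function names `apply_changes` and `changelog` are hypothetical. It assumes a table is a keyed snapshot and a stream is a sequence of INSERT/UPDATE/DELETE records, which is roughly what an "emit on every change" clause would produce.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical change record: (operation, key, value).
# The value is None for a DELETE.
Change = Tuple[str, str, Optional[int]]


def apply_changes(changes: Iterable[Change]) -> Dict[str, int]:
    """Fold a stream of changes into a table (a keyed snapshot)."""
    table: Dict[str, int] = {}
    for op, key, value in changes:
        if op in ("INSERT", "UPDATE"):
            table[key] = value
        elif op == "DELETE":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return table


def changelog(old: Dict[str, int], new: Dict[str, int]) -> List[Change]:
    """Turn the difference between two table snapshots back into a stream."""
    changes: List[Change] = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes.append(("INSERT", key, new[key]))
        elif key not in new:
            changes.append(("DELETE", key, None))
        elif old[key] != new[key]:
            changes.append(("UPDATE", key, new[key]))
    return changes


stream: List[Change] = [
    ("INSERT", "a", 1),
    ("INSERT", "b", 2),
    ("UPDATE", "a", 3),
    ("DELETE", "b", None),
]
table = apply_changes(stream)
```

The conversion works in both directions, but it is lossy: the table keeps only the latest value per key, so the stream recovered from a snapshot is generally shorter than the stream that built it. That is one way to see why the two should not be treated as 1:1 equivalents.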

I also agree that we should come to some agreement on a full specification for EMIT before
forging ahead with any implementations (and to be clear, that's independent from the core
SQL DSL stuff you're already putting in the feature branch, Mingmin; that work can proceed
unimpeded while we sort out unified model semantics). In that vein, I've dedicated two chapters
of my upcoming streaming systems book to the topic (one for the necessary background, and
one on SQL specifically) as I've tried to sort the question out for myself. The book won't
be out until later this summer at the earliest, though, so maybe in parallel I should try
to condense that all into a specification doc we could iterate on in public? I'm not saying
what I have there is necessarily the right answer, but it incorporates everything we've more
or less agreed upon so far and then extends it a little further, so I think it's probably
a good place to start from. It has triggering semantics via EMIT, the semantics necessary for
temporal joins, and also addresses things like CUBE in a clean fashion. If we can come to
agreement on a way forward in both the Beam and Calcite camps, then we're probably in a good
position to forge ahead with implementation details.

Sound reasonable?

> Add a Beam SQL DSL
> ------------------
>
>                 Key: BEAM-301
>                 URL: https://issues.apache.org/jira/browse/BEAM-301
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Xu Mingmin
>
> The SQL DSL helps developers build a Beam pipeline directly from a SQL statement given as a String.

> In Phase I, it starts to support INSERT/SELECT queries with FILTERs, one example SQL as below:
> {code}
> INSERT INTO `SUB_USEREVENT` (`SITEID`, `PAGEID`, `PAGENAME`, `EVENTTIMESTAMP`)
> (SELECT STREAM `USEREVENT`.`SITEID`, `USEREVENT`.`PAGEID`, `USEREVENT`.`PAGENAME`, `USEREVENT`.`EVENTTIMESTAMP`
> FROM `USEREVENT` AS `USEREVENT`
> WHERE `USEREVENT`.`SITEID` > 10)
> {code}
> A design doc is available at https://docs.google.com/document/d/1Uc5xYTpO9qsLXtT38OfuoqSLimH_0a1Bz5BsCROMzCU/edit?usp=sharing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
