beam-commits mailing list archives

From "Julian Hyde (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-301) Add a Beam SQL DSL
Date Fri, 14 Apr 2017 00:52:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968460#comment-15968460
] 

Julian Hyde commented on BEAM-301:
----------------------------------

[~takidau], +1 to that approach.

In the short term I think we will have multiple variants of streaming SQL. Calcite will support
STREAM and using the same identifier for streams/tables, and will provide a [switch|https://calcite.apache.org/apidocs/org/apache/calcite/sql/validate/SqlConformance.html]
so that Beam can disable them. Over the longer term I will try to make the case that these features
are useful (or I will fail, and these features will wither away). I can't really make the
case until we have features like self-join of a stream to its own history.

The crucial query that illustrates this is:

{code}select stream *
from Orders as o
where units > (
  select avg(units)
  from Orders as h
  where h.productId = o.productId
  and h.rowtime > o.rowtime - interval '1' year){code}

It combines the {{Orders}} stream with its own history. But after the query has been running
for a while, the records that passed through the stream will have entered the history. The
history relation {{h}} is neither bounded, nor unbounded (in Beam's terms), but time-varying.
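
To make the "time-varying history" idea concrete, here is a rough Python sketch of how the query above could be evaluated over an in-order stream. It is not Calcite's or Beam's implementation. The function name, the tuple layout, the 365-day approximation of the one-year interval, and the choice to count the arriving row as part of its own history are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: INTERVAL '1' YEAR is approximated as 365 days.
ONE_YEAR = timedelta(days=365)


def above_trailing_average(orders):
    """Yield orders whose units exceed the product's trailing one-year average.

    `orders` is an iterable of (rowtime, product_id, units) tuples that
    arrive in rowtime order (hypothetical layout for the Orders stream).
    """
    # product_id -> list of (rowtime, units) seen so far; this plays the
    # role of the history relation `h`, which grows as the stream flows.
    history = defaultdict(list)
    for rowtime, product_id, units in orders:
        # Assumption: the arriving row is part of the history it is
        # compared against, as it would be if Orders were a table.
        history[product_id].append((rowtime, units))
        window = [u for t, u in history[product_id] if t > rowtime - ONE_YEAR]
        if units > sum(window) / len(window):
            yield (rowtime, product_id, units)


orders = [
    (datetime(2016, 1, 1), "p1", 10),
    (datetime(2016, 6, 1), "p1", 20),
    (datetime(2017, 3, 1), "p1", 12),
    (datetime(2017, 3, 2), "p2", 5),
    (datetime(2017, 7, 1), "p1", 15),
]
for row in above_trailing_average(orders):
    print(row)
```

In this sketch the same product's average changes as earlier orders age out of the one-year window, which is why {{h}} cannot be treated as a fixed table.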

> Add a Beam SQL DSL
> ------------------
>
>                 Key: BEAM-301
>                 URL: https://issues.apache.org/jira/browse/BEAM-301
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Xu Mingmin
>
> The SQL DSL helps developers build a Beam pipeline directly from a SQL statement supplied as a String.
> In Phase I, it starts by supporting INSERT/SELECT queries with FILTERs; one example SQL statement is shown below:
> {code}
> INSERT INTO `SUB_USEREVENT` (`SITEID`, `PAGEID`, `PAGENAME`, `EVENTTIMESTAMP`)
> (SELECT STREAM `USEREVENT`.`SITEID`, `USEREVENT`.`PAGEID`, `USEREVENT`.`PAGENAME`, `USEREVENT`.`EVENTTIMESTAMP`
> FROM `USEREVENT` AS `USEREVENT`
> WHERE `USEREVENT`.`SITEID` > 10)
> {code}
> A design doc is available at https://docs.google.com/document/d/1Uc5xYTpO9qsLXtT38OfuoqSLimH_0a1Bz5BsCROMzCU/edit?usp=sharing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)