flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6442) Extend TableAPI Support Sink Table Registration and ‘insert into’ Clause in SQL
Date Fri, 14 Jul 2017 13:56:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087343#comment-16087343 ]

ASF GitHub Bot commented on FLINK-6442:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3829#discussion_r127456022
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/api/TableEnvironment.scala ---
    @@ -488,19 +499,50 @@ abstract class TableEnvironment(val config: TableConfig) {
         *   tEnv.sql(s"SELECT * FROM $table")
         * }}}
         *
    -    * @param query The SQL query to evaluate.
    -    * @return The result of the query as Table.
    +    * @param sql The SQL string to evaluate.
    +    * @return The result of the query as a Table, or null for a DML insert operation.
         */
    -  def sql(query: String): Table = {
    +  def sql(sql: String): Table = {
         val planner = new FlinkPlannerImpl(getFrameworkConfig, getPlanner, getTypeFactory)
         // parse the sql query
    -    val parsed = planner.parse(query)
    +    val parsed = planner.parse(sql)
         // validate the sql query
         val validated = planner.validate(parsed)
    -    // transform to a relational tree
    -    val relational = planner.rel(validated)
     
    -    new Table(this, LogicalRelNode(relational.rel))
    +    // Translate the source separately for a DML insert, because a sink operation has no
    +    // output, while every RelNode translation must return a DataSet/DataStream and Calcite's
    +    // TableModify defines the output of DML (INSERT, DELETE, UPDATE) as a RowCount column of
    +    // BigInt type. This is a workaround until we agree on an output type definition for sink
    +    // operations (compared to a traditional SQL insert).
    +    if (parsed.isInstanceOf[SqlInsert]) {
    +      val insert = parsed.asInstanceOf[SqlInsert]
    +      // validate sink table
    +      val targetName = insert.getTargetTable.asInstanceOf[SqlIdentifier].names.get(0)
    +      val targetTable = getTable(targetName)
    +      if (null == targetTable || !targetTable.isInstanceOf[TableSinkTable[_]]) {
    +        throw new TableException("SQL DML INSERT operation need a registered TableSink
Table!")
    +      }
    +      // validate unsupported partial insertion to sink table
    +      val sinkTable = targetTable.asInstanceOf[TableSinkTable[_]]
    +      if (null != insert.getTargetColumnList &&
    +        insert.getTargetColumnList.size() != sinkTable.fieldTypes.length) {
    +        throw new TableException(
    +          "SQL DML INSERT do not support insert partial columns of the target table due
to table " +
    +            "columns haven’t nullable property definition for now!")
    +      }
    +
    +      writeToSink(
    +        new Table(this, LogicalRelNode(planner.rel(insert.getSource).rel)),
    +        sinkTable.tableSink,
    +        QueryConfig.getQueryConfigFromTableEnv(this)
    --- End diff --
    
    This will return the default configuration. If we move `INSERT INTO` queries into a special method (like `sqlInsert()`), the method could have an additional `QueryConfig` parameter.
    
    Otherwise, users won't be able to tweak the execution of a streaming query.
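
    For illustration, a minimal sketch of how a caller could then tune a streaming INSERT. The `sqlInsert(sql, queryConfig)` name and signature are hypothetical, following the suggestion above, not an existing API:

    {code}
    import org.apache.flink.api.common.time.Time

    // Hypothetical usage: sqlInsert(sql, queryConfig) is an assumed signature based
    // on the review suggestion. queryConfig on a StreamTableEnvironment returns a
    // StreamQueryConfig whose idle-state retention can be adjusted per query.
    val qConfig = tEnv.queryConfig
      .withIdleStateRetentionTime(Time.hours(1), Time.hours(24))
    tEnv.sqlInsert("INSERT INTO targetTable SELECT a, b, c FROM sourceTable", qConfig)
    {code}

    Without such a parameter, the INSERT path above always runs with the defaults from `QueryConfig.getQueryConfigFromTableEnv(this)`.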


> Extend TableAPI Support Sink Table Registration and ‘insert into’ Clause in SQL
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-6442
>                 URL: https://issues.apache.org/jira/browse/FLINK-6442
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: lincoln.lee
>            Assignee: lincoln.lee
>            Priority: Minor
>
> Currently the TableAPI only provides registration methods for source tables, so when we write a streaming job with SQL, we have to add an additional part for the sink, as the TableAPI does:
> {code}
> val sqlQuery = "SELECT * FROM MyTable WHERE _1 = 3"
> val t = StreamTestData.getSmall3TupleDataStream(env)
> tEnv.registerDataStream("MyTable", t)
> // one way: invoke tableAPI’s writeToSink method directly
> val result = tEnv.sql(sqlQuery)
> result.writeToSink(new YourStreamSink)
> // another way: convert to datastream first and then invoke addSink 
> val result = tEnv.sql(sqlQuery).toDataStream[Row]
> result.addSink(new StreamITCase.StringSink)
> {code}
> From the API we can see that the sink table is always a derived table, because its 'schema' is inferred from the result type of the upstream query.
> Compared to a traditional RDBMS, which supports DML syntax, a query with a target output could be written like this:
> {code}
> insert into table target_table_name
> [(column_name [ ,...n ])]
> query
> {code}
> The equivalent form of the example above is as follows:
> {code}
>     tEnv.registerTableSink("targetTable", new YourSink)
>     val sql = "INSERT INTO targetTable SELECT a, b, c FROM sourceTable"
>     val result = tEnv.sql(sql)
> {code}
> It is supported by Calcite’s grammar: 
> {code}
>  insert:( INSERT | UPSERT ) INTO tablePrimary
>  [ '(' column [, column ]* ')' ]
>  query
> {code}
> I'd like to extend the Flink TableAPI to support such a feature. See the design doc: https://goo.gl/n3phK5



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
