From dev-return-9534-archive-asf-public=cust-asf.ponee.io@beam.apache.org Fri May 4 20:23:21 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id CD7E2180634 for ; Fri, 4 May 2018 20:23:20 +0200 (CEST) Received: (qmail 40808 invoked by uid 500); 4 May 2018 18:23:14 -0000 Mailing-List: contact dev-help@beam.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@beam.apache.org Delivered-To: mailing list dev@beam.apache.org Received: (qmail 40794 invoked by uid 99); 4 May 2018 18:23:14 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 04 May 2018 18:23:14 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id CFCE4C94C9 for ; Fri, 4 May 2018 18:23:13 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -13.631 X-Spam-Level: X-Spam-Status: No, score=-13.631 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, ENV_AND_HDR_SPF_MATCH=-0.5, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, T_DKIMWL_WL_MED=-0.01, USER_IN_DEF_DKIM_WL=-7.5, USER_IN_DEF_SPF_WL=-7.5] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=google.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id hZm3aLjNqZU3 for ; Fri, 4 May 2018 18:23:12 +0000 (UTC) Received: from mail-ot0-f196.google.com (mail-ot0-f196.google.com [74.125.82.196]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 532305F1F0 for ; Fri, 4 May 2018 18:23:12 +0000 (UTC) Received: by mail-ot0-f196.google.com with SMTP id t1-v6so25503286oth.8 for ; Fri, 04 May 2018 11:23:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=AQInj5mghrlua9PBYXJJ73xDhMxbohNmpgRipnZBJtE=; b=P+Wfx1kOfRzSXQ7MKUpxnjqrvJ3GqDFjNCGgwe4yMi3e+5/Z/d7COuHB3hvXY9jUCO JBLTvXV1Ysb6Rw60LVWQh9ng8riRdQDosgtC0vv+iY+ILnIUt/po29ft4O6/O/kkkReU nBxuvDMq5p2/h8EhUUAVuLp+ab3GzOx7yRUYNZ5YMaP4hJXSW6fg0H8zE/T9Ry/u2eRL Pkqc8d63D6ks7Miz2TCnspRU8weQTM0ivnDdZm2cKhUzR0IpTu9EfaD9w2FLDKv9WeSr jdErHnCXbQsOmRmknRcGnCcuFoNVJN7mogHjatMvAjnOiPRJr1cU6VejJmO+pAN0v9jV gdsA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=AQInj5mghrlua9PBYXJJ73xDhMxbohNmpgRipnZBJtE=; b=JcaXLf39a/e1hF53uW98ZDsOXmbaU6pcVkJ2WGbHzwytMj5gDpAslrj0VhOFLjLK+5 KmFWTFjBpQyR5rgAho5ZEwyw6IYTbAPSbP8EhILBsrftgG9prmz6d7vJzWN5baXKHWYb rZuUyF501x9DIKNTzyiE/WT1HysTOVttNP1a/92I0ripMv4u4BAANoM9FyB6JGTO7R/M dez4sXlzI25LFjNcA184WxI4vKF/Kr6hKMZFlbB8iRyKKlMl0adKytR4vuQQhggufbJ6 82YLQoGx44eGDRRIecEhpDphVKiwewjavYKRi1RTk1m1bwOZiuoJMxK0yFAxxUfVQ3/p dmxg== X-Gm-Message-State: ALQs6tACYEJXdch7i51b0bpGILiWy1ISTXouRFTJPCLIZLb+g2ETUF+Z dwOuIFhjgaVxQ66k0PCol8Nsckk84r0B/u9GXPg91XBFM7s= X-Google-Smtp-Source: AB8JxZoyrn9BRC7R0Em9ziM3GBqsGdfSZL/SVtk1uGUNxaDXhGLNyqvxZyDW+SuFBuGkRKuyI0Afy+RmnZL/k0a//mo= X-Received: by 2002:a9d:352c:: with SMTP id o41-v6mr1701543otc.368.1525458185562; Fri, 04 May 2018 11:23:05 -0700 (PDT) MIME-Version: 1.0 From: Anton Kedin Date: Fri, 04 May 2018 18:22:55 +0000 Message-ID: Subject: Complex Types Support for Beam SQL DDL To: dev@beam.apache.org Content-Type: multipart/alternative; boundary="00000000000045d622056b6569bf" --00000000000045d622056b6569bf Content-Type: text/plain; charset="UTF-8" Hi, I am working on adding support for non-primitive types in Beam SQL DDL. *Goal* Allow users to define tables with Rows, Arrays, Maps as field types in DDL. This enables defining schemas for complex sources, e.g. describing JSON sources or other sources which support complex field types (BQ, etc). *Solution* Extend the parser we have in Beam SQLto accept the following DDL statement: "CREATE TABLE tableName (field_name )" where "" can be any the following: - "primitiveType ARRAY", for example, "field_int_arr" INTEGER ARRAY". Thoughts: - this is how SQL standard defines ARRAY field declaration; - existing parser supports similar syntax for collections; - hard to read for nested collections; - similar syntax is supported in Postgres ; - "ARRAY", for example "field_matrix ARRAY>". Thoughts: - easy to read and support arbitrary nesting; - similar syntax is implemented in: - BigQuery ; - Spanner ; - KSQL ; - Spark/Hive ; - "MAP", for example "MAP>". Thoughts: - there doesn't seem to be a SQL standard support for maps; - looks similar to the "ARRAY" definition; - similar syntax is implemented in: - KSQL ; - Spark/Hive ; - "ROW(fieldList)", for example "row_field ROW(f_int INTEGER, f_str VARCHAR)". Thoughts: - SQL standard defines the syntax this way; - don't know where similar syntax is implemented; - "ROW", for example "row_field ROW". Thoughts: - ROW is not supported in a lot of dialects, but STRUCT is similar and supported in few dialects; - similar syntax for STRUCT is implemented in: - BigQuery ; - Spark/Hive ; Questions/comments? Pull Request Thank you, Anton --00000000000045d622056b6569bf Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,

I am working on adding support for = non-primitive types in Beam SQL DDL.

Goal
Allow users to define ta= bles with Rows, Arrays, Maps as field types in DDL. This enables defining s= chemas for complex sources, e.g. describing JSON sources or other sources w= hich support complex field types (BQ, etc).

Solution
Extend the pa= rser we have in Beam SQLto accept the following DDL statement:
"CREATE TABLE tabl= eName (field_name <COMPLEX_FIELD_TYPE>)" where "<COMPLEX_FIELD_TYPE>= ;" can be any the following:
  • "primitiveType ARRAY"= , for example, "field_int_arr" INTEGER ARRAY". Thoughts:
  • this is how SQL standard defines ARRAY field declaration;
  • existing parser s= upports similar syntax for collections;
  • hard to read for nested col= lections;
  • similar syntax is supported in Postgres;
  • "ARRAY<type>", for example "field_matrix ARRAY<ARRAY<INTEGER>>". Thought= s:
    • easy to read and support arbitrary nesting;
    • simi= lar syntax is implemented in:=C2=A0
    • "MAP<p= rimitiveType, type>", for example "MAP<VARCHAR, MAP<INTEGER, VARCHAR&g= t;>". Thoughts:
      • there doesn't seem to be= a SQL standard support for maps;
      • looks similar to the "ARRAY<type>" definition;
      • similar syntax is implemented in:
    • "ROW(fieldList)", for example "row_field ROW(f_int INTEGER, f_str VARCH= AR)". Thoughts:
      • SQL standard defines the syntax= this way;
      • don't know where similar syntax is implemented;
      • =
    • "ROW<= ;fieldList>", for example "row_field ROW<f_int INTEGER, f_str VARCHAR>= ". Thoughts:
      • ROW is not supported in a lot of dialects, but <= font face=3D"monospace, monospace" color=3D"#38761d">STRUCT is simil= ar and supported in few dialects;
      • similar syntax for STRUCT is implemented in:
    Questions/comments?
    Pull Request

    Thank you,
    Anton

    <= br>
    --00000000000045d622056b6569bf--