asterixdb-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From buyin...@apache.org
Subject asterixdb git commit: Address Don's comments for the query section doc.
Date Wed, 16 Nov 2016 17:47:21 GMT
Repository: asterixdb
Updated Branches:
  refs/heads/master d7b3ddf8e -> 42467cb11


Address Don's comments for the query section doc.

Change-Id: I30ab472d87221d6b1c709f24c4dba1bdfaa2c0b4
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1329
Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Integration-Tests: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Reviewed-by: Till Westmann <tillw@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/asterixdb/repo
Commit: http://git-wip-us.apache.org/repos/asf/asterixdb/commit/42467cb1
Tree: http://git-wip-us.apache.org/repos/asf/asterixdb/tree/42467cb1
Diff: http://git-wip-us.apache.org/repos/asf/asterixdb/diff/42467cb1

Branch: refs/heads/master
Commit: 42467cb11bd2ed1d2cd2126cf4ede47fbd55132e
Parents: d7b3ddf
Author: Yingyi Bu <yingyi@couchbase.com>
Authored: Tue Nov 1 10:19:57 2016 -0700
Committer: Yingyi Bu <buyingyi@gmail.com>
Committed: Wed Nov 16 09:46:12 2016 -0800

----------------------------------------------------------------------
 .../src/main/markdown/sqlpp/3_query.md          | 190 ++++++++++++++-----
 1 file changed, 142 insertions(+), 48 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/asterixdb/blob/42467cb1/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
index 85787e8..dc760ba 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
@@ -33,13 +33,13 @@ The following shows the (rich) grammar for the `SELECT` statement in SQL++.
     Subquery           ::= "(" SelectStatement ")"
 
     SelectBlock        ::= SelectClause
-                           ( FromClause ( WithClause )?)?
+                           ( FromClause ( LetClause )?)?
                            ( WhereClause )?
                            ( GroupbyClause ( LetClause )? ( HavingClause )? )?
                            |
-                           FromClause ( WithClause )?
+                           FromClause ( LetClause )?
                            ( WhereClause )?
-                           ( GroupbyClause ( WithClause )? ( HavingClause )? )?
+                           ( GroupbyClause ( LetClause )? ( HavingClause )? )?
                            SelectClause
 
     SelectClause       ::= <SELECT> ( <ALL> | <DISTINCT> )? ( SelectRegular
| SelectValue )
@@ -48,10 +48,10 @@ The following shows the (rich) grammar for the `SELECT` statement in SQL++.
     Projection         ::= ( Expression ( <AS> )? Identifier | "*" )
 
     FromClause         ::= <FROM> FromTerm ( "," FromTerm )*
-    FromTerm           ::= Expression (( <AS> )? Variable)? ( <AT> Variable )?
+    FromTerm           ::= Expression (( <AS> )? Variable)?
                            ( ( JoinType )? ( JoinClause | UnnestClause ) )*
 
-    JoinClause         ::= <JOIN> Expression (( <AS> )? Variable)? (<AT>
Variable)? <ON> Expression
+    JoinClause         ::= <JOIN> Expression (( <AS> )? Variable)? <ON>
Expression
     UnnestClause       ::= ( <UNNEST> | <CORRELATE> | <FLATTEN> ) Expression
                            ( <AS> )? Variable ( <AT> Variable )?
     JoinType           ::= ( <INNER> | <LEFT> ( <OUTER> )? )
@@ -74,13 +74,13 @@ The following shows the (rich) grammar for the `SELECT` statement in SQL++.
 
 In this section, we will make use of two stored collections of objects (datasets), `GleambookUsers`
and `GleambookMessages`, in a series of running examples to explain `SELECT` queries. The
contents of the example collections are as follows:
 
-`GleambookUsers` collection:
+`GleambookUsers` collection (or, dataset):
 
     {"id":1,"alias":"Margarita","name":"MargaritaStoddard","nickname":"Mags","userSince":"2012-08-20T10:10:00","friendIds":[2,3,6,10],"employment":[{"organizationName":"Codetechno","start-date":"2006-08-06"},{"organizationName":"geomedia","start-date":"2010-06-17","end-date":"2010-01-26"}],"gender":"F"}
     {"id":2,"alias":"Isbel","name":"IsbelDull","nickname":"Izzy","userSince":"2011-01-22T10:10:00","friendIds":[1,4],"employment":[{"organizationName":"Hexviafind","startDate":"2010-04-27"}]}
     {"id":3,"alias":"Emory","name":"EmoryUnk","userSince":"2012-07-10T10:10:00","friendIds":[1,5,8,9],"employment":[{"organizationName":"geomedia","startDate":"2010-06-17","endDate":"2010-01-26"}]}
 
-`GleambookMessages` collection:
+`GleambookMessages` collection (or, dataset):
 
     {"messageId":2,"authorId":1,"inResponseTo":4,"senderLocation":[41.66,80.87],"message":"
dislike iphone its touch-screen is horrible"}
     {"messageId":3,"authorId":2,"inResponseTo":4,"senderLocation":[48.09,81.01],"message":"
like samsung the plan is amazing"}
@@ -136,6 +136,7 @@ This query returns:
 In SQL++, the traditional SQL-style `SELECT` syntax is also supported.
 This syntax can also be reformulated in a `SELECT VALUE` based manner in SQL++.
 (E.g., `SELECT expA AS fldA, expB AS fldB` is syntactic sugar for `SELECT VALUE { 'fldA':
expA, 'fldB': expB }`.)
+Unlike in SQL, the result of an SQL++ query does not preserve the order of expressions in
the `SELECT` clause.
 
 ##### Example
     SELECT user.alias user_alias, user.name user_name
@@ -150,7 +151,9 @@ Returns:
     } ]
 
 ### <a id="Select_star">SELECT *</a>
-In SQL++, `SELECT *` returns a object with a nested field for each input tuple. Each field
has as its field name the name of a binding variable generated by either the `FROM` clause
or `GROUP BY` clause in the current enclosing `SELECT` statement, and its field is the value
of that binding variable.
+In SQL++, `SELECT *` returns a object with a nested field for each input tuple.
+Each field has as its field name the name of a binding variable generated by either the `FROM`
clause or `GROUP BY`
+clause in the current enclosing `SELECT` statement, and its field value is the value of that
binding variable.
 
 ##### Example
 
@@ -225,6 +228,75 @@ Since `user` is the only binding variable generated in the `FROM` clause,
this q
         }
     } ]
 
+
+##### Example
+
+    SELECT *
+    FROM GleambookUsers u, GleambookMessages m
+    WHERE m.authorId = u.id and u.id = 2;
+
+This query does an inner join that we will discuss in [multiple from terms](#Multiple_from_terms).
+Since both `u` and `m` are binding variable generated in the `FROM` clause, this query returns:
+
+    [ {
+        "u": {
+            "userSince": "2011-01-22T10:10:00",
+            "friendIds": [
+                1,
+                4
+            ],
+            "name": "IsbelDull",
+            "nickname": "Izzy",
+            "alias": "Isbel",
+            "id": 2,
+            "employment": [
+                {
+                    "organizationName": "Hexviafind",
+                    "startDate": "2010-04-27"
+                }
+            ]
+        },
+        "m": {
+            "senderLocation": [
+                31.5,
+                75.56
+            ],
+            "inResponseTo": 1,
+            "messageId": 6,
+            "authorId": 2,
+            "message": " like t-mobile its platform is mind-blowing"
+        }
+    }, {
+        "u": {
+            "userSince": "2011-01-22T10:10:00",
+            "friendIds": [
+                1,
+                4
+            ],
+            "name": "IsbelDull",
+            "nickname": "Izzy",
+            "alias": "Isbel",
+            "id": 2,
+            "employment": [
+                {
+                    "organizationName": "Hexviafind",
+                    "startDate": "2010-04-27"
+                }
+            ]
+        },
+        "m": {
+            "senderLocation": [
+                48.09,
+                81.01
+            ],
+            "inResponseTo": 4,
+            "messageId": 3,
+            "authorId": 2,
+            "message": " like samsung the plan is amazing"
+        }
+    } ]
+
+
 ### <a id="Select_distinct">SELECT DISTINCT</a>
 SQL++'s `DISTINCT` keyword is used to eliminate duplicate items in results. The following
example shows how it works.
 
@@ -388,6 +460,7 @@ A `FROM` clause is used for enumerating (i.e., conceptually iterating
over) the
 
 ### <a id="Binding_expressions">Binding expressions</a>
 In SQL++, in addition to stored collections, a `FROM` clause can iterate over any intermediate
collection returned by a valid SQL++ expression.
+In the tuple stream generated by a `FROM` clause, the ordering of the input tuples are not
guaranteed to be preserved.
 
 ##### Example
 
@@ -437,7 +510,8 @@ Note that the first alternative is one of the SQL-92 approaches to expressing
a
 Similar to standard SQL, SQL++ supports implicit `FROM` binding variables (i.e., aliases),
for which a binding variable is generated. SQL++ variable generation falls into three cases:
 
   * If the binding expression is a variable reference expression, the generated variable's
name will be the name of the referenced variable itself.
-  * If the binding expression is a field access expression, the generated variable's name
will be the last identifier in the expression.
+  * If the binding expression is a field access expression (or a fully qualified name for
a dataset), the generated
+    variable's name will be the last identifier (or the dataset name) in the expression.
   * For all other cases, a compilation error will be raised.
 
 The next two examples show queries that do not provide binding variables in their `FROM`
clauses.
@@ -551,7 +625,7 @@ The SQL++ `GROUP BY` clause generalizes standard SQL's grouping and aggregation
 
 ### <a id="Group_variables">Group variables</a>
 In a `GROUP BY` clause, in addition to the binding variable(s) defined for the grouping key(s),
SQL++ allows a user to define a *group variable* by using the clause's `GROUP AS` extension
to denote the resulting group.
-After grouping, then, the query's in-scope variables include the grouping key's binding variables
as well as this group variable which will be bound to one collection value for each group.
This per-group collection value will be a set of nested objects in which each field of the
object is the result of a renamed variable defined in parentheses following the group variable's
name. The `GROUP AS` syntax is as follows:
+After grouping, then, the query's in-scope variables include the grouping key's binding variables
as well as this group variable which will be bound to one collection value for each group.
This per-group collection (i.e., multiset) value will be a set of nested objects in which
each field of the object is the result of a renamed variable defined in parentheses following
the group variable's name. The `GROUP AS` syntax is as follows:
 
     <GROUP> <AS> Variable ("(" Variable <AS> VariableReference ("," Variable
<AS> VariableReference )* ")")?
 
@@ -839,13 +913,13 @@ The next example illustrates a query that doesn't provide binding variables
for
 ##### Example
 
     SELECT authorId,
-           (SELECT VALUE m.msg
-            FROM msgs m
-            WHERE m.msg.message LIKE '% like%'
-            ORDER BY m.msg.messageId
-            LIMIT 2) AS msgs
-    FROM GleambookMessages message
-    GROUP BY message.authorId GROUP AS msgs(message AS msg);
+       (SELECT VALUE g.msg
+        FROM g
+        WHERE g.msg.message LIKE '% like%'
+        ORDER BY g.msg.messageId
+        LIMIT 2) AS msgs
+    FROM GleambookMessages gbm
+    GROUP BY gbm.authorId GROUP AS g(gbm as msg);
 
 This query returns:
 
@@ -936,13 +1010,13 @@ However, the query above is a syntactically-sugared simplification
of the follow
 legal, executable, and returns the same result:
 
     SELECT uid,
-           (SELECT m.message
-            FROM (SELECT VALUE grp.message FROM `$1` AS grp) AS m
-            WHERE m.message LIKE '% like%'
-            ORDER BY m.messageId
-            LIMIT 2) AS msgs
-    FROM GleambookMessages message
-    GROUP BY message.authorId AS uid GROUP AS `$1` (message AS message);
+       (SELECT g.msg.message
+        FROM g
+        WHERE g.msg.message LIKE '% like%'
+        ORDER BY g.msg.messageId
+        LIMIT 2) AS msgs
+    FROM GleambookMessages gbm
+    GROUP BY gbm.authorId AS uid GROUP AS g(gbm as msg);
 
 ### <a id="Aggregation_functions">Aggregation functions</a>
 In traditional SQL, which doesn't support nested data, grouping always also involves the
use of aggregation
@@ -977,7 +1051,7 @@ handles them in the ad hoc "just ignore the unknown values" fashion that
the SQL
 
     ARRAY_AVG(
         (
-          SELECT VALUE len(friendIds) FROM GleambookUsers
+          SELECT VALUE ARRAY_COUNT(friendIds) FROM GleambookUsers
         )
     );
 
@@ -1015,7 +1089,7 @@ to compute a result that is identical to that of the more explicit SQL++
example
 
 ##### Example
 
-    SELECT uid, COUNT(msg) AS msgCnt
+    SELECT uid, COUNT(*) AS msgCnt
     FROM GleambookMessages msg
     GROUP BY msg.authorId AS uid;
 
@@ -1023,7 +1097,7 @@ It is important to realize that `COUNT` is actually **not** a SQL++
built-in agg
 Rather, the `COUNT` query above is using a special "sugared" function symbol that the SQL++
compiler
 will rewrite as follows:
 
-    SELECT uid AS uid, ARRAY_COUNT( (SELECT g.msg FROM `$1` as g) ) AS msgCnt
+    SELECT uid AS uid, ARRAY_COUNT( (SELECT VALUE 1 FROM `$1` as g) ) AS msgCnt
     FROM GleambookMessages msg
     GROUP BY msg.authorId AS uid GROUP AS `$1`(msg AS msg);
 
@@ -1090,13 +1164,13 @@ Note that if the condition expression evaluates to `NULL` or `MISSING`
the input
 The `ORDER BY` clause is used to globally sort data in either ascending order (i.e., `ASC`)
or descending order (i.e., `DESC`).
 During ordering, `MISSING` and `NULL` are treated as being smaller than any other value if
they are encountered
 in the ordering key(s). `MISSING` is treated as smaller than `NULL` if both occur in the
data being sorted.
-The following example returns all `GleambookUsers` ordered by their friend numbers.
+The following example returns all `GleambookUsers` in descending order by their number of
friends.
 
 ##### Example
 
       SELECT VALUE user
       FROM GleambookUsers AS user
-      ORDER BY len(user.friendIds) DESC;
+      ORDER BY ARRAY_COUNT(user.friendIds) DESC;
 
 This query returns:
 
@@ -1206,12 +1280,12 @@ The next query shows an example.
 ##### Example
 
     WITH avgFriendCount AS (
-      SELECT VALUE AVG(LEN(user.friendIds))
+      SELECT VALUE AVG(ARRAY_COUNT(user.friendIds))
       FROM GleambookUsers AS user
     )[0]
     SELECT VALUE user
     FROM GleambookUsers user
-    WHERE LEN(user.friendIds) > avgFriendCount;
+    WHERE ARRAY_COUNT(user.friendIds) > avgFriendCount;
 
 This query returns:
 
@@ -1263,8 +1337,8 @@ The query is equivalent to the following, more complex, inlined form
of the quer
 
     SELECT *
     FROM GleambookUsers user
-    WHERE LEN(user.friendIds) >
-        ( SELECT VALUE AVG(LEN(user.friendIds))
+    WHERE ARRAY_COUNT(user.friendIds) >
+        ( SELECT VALUE AVG(ARRAY_COUNT(user.friendIds))
           FROM GleambookUsers AS user
         ) [0];
 
@@ -1389,7 +1463,11 @@ This query is equivalent to the following query that does not use the
`LET` clau
                  );
 
 ## <a id="Union_all">UNION ALL</a>
-UNION ALL can be used to combine two input streams into one. As in SQL, there is no ordering
guarantee on the contents of the output stream. However, unlike SQL, SQL++ does not constrain
what the data looks like on the input streams; in particular, it allows heterogenity on the
input and output streams. The following odd but legal query is an example:
+UNION ALL can be used to combine two input streams into one. As in SQL, there is no ordering
guarantee on the contents of the output stream.
+However, unlike SQL, SQL++ does not constrain what the data looks like on the input streams;
in particular,
+it allows heterogenity on the input and output streams.
+A type error will be raised if one of the inputs is not a collection.
+The following odd but legal query is an example:
 
 ##### Example
 
@@ -1456,20 +1534,7 @@ Note that a subquery, like a top-level `SELECT` statment, always returns
a colle
 within a query the subquery occurs -- and again, its result is never automatically cast into
a scalar.
 
 ## <a id="Vs_SQL-92">SQL++ vs. SQL-92</a>
-The following matrix is a quick "SQL-92 compatibility cheat sheet" for SQL++.
-
-| Feature |  SQL++ | SQL-92 |
-|----------|--------|--------|
-| SELECT * | Returns nested objects | Returns flattened concatenated objects |
-| Subquery | Returns a collection  | The returned collection is cast into a scalar value
if the subquery appears in a SELECT list or on one side of a comparison or as input to a function
|
-| LEFT OUTER JOIN |  Fills in `MISSING`(s) for non-matches  |   Fills in `NULL`(s) for non-matches
   |
-| UNION ALL       | Allows heterogeneous inputs and output | Input streams must be UNION-compatible
and output field names are drawn from the first input stream
-| IN constant_expr | The constant expression has to be an array or multiset, i.e., [..,..,...]
| The constant collection can be represented as comma-separated items in a paren pair |
-| String literal | Double quotes or single quotes | Single quotes only |
-| Delimited identifiers | Backticks | Double quotes |
-
-For things beyond this cheat sheet, SQL++ is SQL-92 compliant.
-Morever, SQL++ offers the following additional features beyond SQL-92 (hence the "++" in
its name):
+SQL++ offers the following additional features beyond SQL-92 (hence the "++" in its name):
 
   * Fully composable and functional: A subquery can iterate over any intermediate collection
and can appear anywhere in a query.
   * Schema-free: The query language does not assume the existence of a static schema for
any data that it processes.
@@ -1477,3 +1542,32 @@ Morever, SQL++ offers the following additional features beyond SQL-92
(hence the
   * Powerful GROUP BY: In addition to a set of aggregate functions as in standard SQL, the
groups created by the `GROUP BY` clause are directly usable in nested queries and/or to obtain
nested results.
   * Generalized SELECT clause: A SELECT clause can return any type of collection, while in
SQL-92, a `SELECT` clause has to return a (homogeneous) collection of objects.
 
+
+The following matrix is a quick "SQL-92 compatibility cheat sheet" for SQL++.
+
+| Feature |  SQL++ | SQL-92 |  Why different?  |
+|----------|--------|-------|------------------|
+| SELECT * | Returns nested objects | Returns flattened concatenated objects | Nested collections
are 1st class citizens |
+| Subquery | Returns a collection  | The returned collection is cast into a scalar value
if the subquery appears in a SELECT list or on one side of a comparison or as input to a function
| Nested collections are 1st class citizens |
+| LEFT OUTER JOIN |  Fills in `MISSING`(s) for non-matches  |   Fills in `NULL`(s) for non-matches
   | "Absence" is more appropriate than "unknown" here.  |
+| UNION ALL       | Allows heterogeneous inputs and output | Input streams must be UNION-compatible
and output field names are drawn from the first input stream | Heterogenity and nested collections
are common |
+| IN constant_expr | The constant expression has to be an array or multiset, i.e., [..,..,...]
| The constant collection can be represented as comma-separated items in a paren pair | Nested
collections are 1st class citizens |
+| String literal | Double quotes or single quotes | Single quotes only | Double quoted strings
are pervasive |
+| Delimited identifiers | Backticks | Double quotes | Double quoted strings are pervasive
|
+
+The following SQL-92 features are not implemented yet. However, SQL++ does not conflict those
features:
+
+  * CROSS JOIN, NATURAL JOIN, UNION JOIN
+  * RIGHT and FULL OUTER JOIN
+  * INTERSECT, EXCEPT, UNION with set semantics
+  * CAST expression
+  * NULLIF expression
+  * COALESCE expression
+  * ALL and SOME predicates for linking to subqueries
+  * UNIQUE predicate (tests a collection for duplicates)
+  * MATCH predicate (tests for referential integrity)
+  * Row and Table constructors
+  * DISTINCT aggregates
+  * Preserved order for expressions in a SELECT list
+
+


Mime
View raw message