asterixdb-commits mailing list archives

From buyin...@apache.org
Subject asterixdb git commit: Making the SQL++ reference manual a bit more generic in how it reads.
Date Mon, 03 Oct 2016 19:00:51 GMT
Repository: asterixdb
Updated Branches:
  refs/heads/master f7f3a7f2b -> c7a8a1505


Making the SQL++ reference manual a bit more generic in how it reads.

Change-Id: I184ede1398de3190b60bec2947d826bdc5278594
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1237
Sonar-Qube: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Reviewed-by: Yingyi Bu <buyingyi@gmail.com>


Project: http://git-wip-us.apache.org/repos/asf/asterixdb/repo
Commit: http://git-wip-us.apache.org/repos/asf/asterixdb/commit/c7a8a150
Tree: http://git-wip-us.apache.org/repos/asf/asterixdb/tree/c7a8a150
Diff: http://git-wip-us.apache.org/repos/asf/asterixdb/diff/c7a8a150

Branch: refs/heads/master
Commit: c7a8a15056212367680dff2b133d4a025a5d7a3b
Parents: f7f3a7f
Author: Mike Carey <dtabass@gmail.com>
Authored: Sun Oct 2 22:34:05 2016 -0700
Committer: Yingyi Bu <buyingyi@gmail.com>
Committed: Mon Oct 3 12:00:24 2016 -0700

----------------------------------------------------------------------
 .../src/main/markdown/sqlpp/1_intro.md          | 19 ++++++++--
 .../src/main/markdown/sqlpp/2_expr.md           | 18 +++++-----
 .../src/main/markdown/sqlpp/3_query.md          |  2 +-
 .../src/main/markdown/sqlpp/4_ddl.md            | 38 ++++++++++----------
 4 files changed, 46 insertions(+), 31 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c7a8a150/asterixdb/asterix-doc/src/main/markdown/sqlpp/1_intro.md
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/1_intro.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/1_intro.md
index 808d713..fdc04cb 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/1_intro.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/1_intro.md
@@ -19,7 +19,22 @@
 
 # <a id="Introduction">1. Introduction</a><font size="3"/>
 
-This document is intended as a reference guide to the full syntax and semantics of the SQL++ Query Language, a SQL-inspired language for working with semistructured data. SQL++ has much in common with SQL, but there are also differences due to the data model that the language is designed to serve. (SQL was designed in the 1970's for interacting with the flat, schema-ified world of relational databases, while SQL++ is designed for the nested, schema-less/schema-optional world of modern NoSQL systems.) In particular, SQL++ in the context of Apache AsterixDB is intended for working with the Asterix Data Model (ADM), which is a data model aimed at a superset of JSON with an enriched and flexible type system.
+This document is intended as a reference guide to the full syntax and semantics of
+the SQL++ Query Language, a SQL-inspired language for working with semistructured data.
+SQL++ has much in common with SQL, but some differences do exist due to the different
+data models that the two languages were designed to serve.
+SQL was designed in the 1970's for interacting with the flat, schema-ified world of
+relational databases, while SQL++ is much newer and targets the nested, schema-optional
+(or even schema-less) world of modern NoSQL systems.
 
-New AsterixDB users are encouraged to read and work through the (friendlier) guide "AsterixDB 101: An ADM and SQL++ Primer" before attempting to make use of this document. In addition, readers are advised to read and understand the Asterix Data Model (ADM) reference guide since a basic understanding of ADM concepts is a prerequisite to understanding SQL++. In what follows, we detail the features of the SQL++ language in a grammar-guided manner: we list and briefly explain each of the productions in the SQL++ grammar, offering examples (and results) for clarity.
+In the context of Apache AsterixDB, SQL++ is intended for working with the Asterix Data Model (ADM),
+a data model based on a superset of JSON with an enriched and flexible type system.
+New AsterixDB users are encouraged to read and work through the (much friendlier) guide
+"AsterixDB 101: An ADM and SQL++ Primer" before attempting to make use of this document.
+In addition, readers are advised to read through the Asterix Data Model (ADM) reference guide
+first as well, as an understanding of the data model is a prerequisite to understanding SQL++.
+
+In what follows, we detail the features of the SQL++ language in a grammar-guided manner.
+We list and briefly explain each of the productions in the SQL++ grammar, offering examples
+(and results) for clarity.
 

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c7a8a150/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md
index c2bab77..732daa4 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/2_expr.md
@@ -21,7 +21,7 @@
 
     Expression ::= OperatorExpression | CaseExpression | QuantifiedExpression
 
-SQL++ is a highly composable expression language. Each SQL++ expression returns zero or more Asterix Data Model (ADM) instances. There are three major kinds of expressions in SQL++. At the topmost level, a SQL++ expression can be an OperatorExpression (similar to a mathematical expression), an ConditionalExpression (to choose between alternative values), or a QuantifiedExpression (which yields a boolean value). Each will be detailed as we explore the full SQL++ grammar.
+SQL++ is a highly composable expression language. Each SQL++ expression returns zero or more data model instances. There are three major kinds of expressions in SQL++. At the topmost level, a SQL++ expression can be an OperatorExpression (similar to a mathematical expression), an ConditionalExpression (to choose between alternative values), or a QuantifiedExpression (which yields a boolean value). Each will be detailed as we explore the full SQL++ grammar.
 
 ## <a id="Primary_expressions">Primary Expressions</a>
 
@@ -29,9 +29,9 @@ SQL++ is a highly composable expression language. Each SQL++ expression returns
                   | VariableReference
                   | ParenthesizedExpression
                   | FunctionCallExpression
-                  | Constructor
+		  | Constructor
 
-The most basic building block for any SQL++ expression is PrimaryExpression. This can be a simple literal (constant) value, a reference to a query variable that is in scope, a parenthesized expression, a function call, or a newly constructed instance of the Asterix Data Model (such as a newly constructed ADM record or list of ADM instances).
+The most basic building block for any SQL++ expression is PrimaryExpression. This can be a simple literal (constant) value, a reference to a query variable that is in scope, a parenthesized expression, a function call, or a newly constructed instance of the data model (such as a newly constructed record or list of data model instances).
 
 ### <a id="Literals">Literals</a>
 
@@ -75,7 +75,7 @@ Different from standard SQL, double quotes play the same role as single quotes a
     <LETTER>    ::= ["A" - "Z", "a" - "z"]
     DelimitedIdentifier   ::= "\`" (<ESCAPE_APOS> | ~["\'"])* "\`"
 
-A variable in SQL++ can be bound to any legal ADM value. A variable reference refers to the value to which an in-scope variable is bound. (E.g., a variable binding may originate from one of the `FROM`, `WITH` or `LET` clauses of a `SELECT` statement or from an input parameter in the context of a function body.) Backticks, e.g., \`id\`, are used for delimited identifiers. Delimiting is needed when a variable's desired name clashes with a SQL++ keyword or includes characters not allowed in regular identifiers.
+A variable in SQL++ can be bound to any legal data model value. A variable reference refers to the value to which an in-scope variable is bound. (E.g., a variable binding may originate from one of the `FROM`, `WITH` or `LET` clauses of a `SELECT` statement or from an input parameter in the context of a function body.) Backticks, e.g., \`id\`, are used for delimited identifiers. Delimiting is needed when a variable's desired name clashes with a SQL++ keyword or includes characters not allowed in regular identifiers.
 
 ##### Examples
 
@@ -100,7 +100,7 @@ The following expression evaluates to the value 2.
 
     FunctionCallExpression ::= FunctionName "(" ( Expression ( "," Expression )* )? ")"
 
-Functions are included in SQL++, like most languages, as a way to package useful functionality or to componentize complicated or reusable SQL++ computations. A function call is a legal SQL++ query expression that represents the ADM value resulting from the evaluation of its body expression with the given parameter bindings; the parameter value bindings can themselves be any SQL++ expressions.
+Functions are included in SQL++, like most languages, as a way to package useful functionality or to componentize complicated or reusable SQL++ computations. A function call is a legal SQL++ query expression that represents the value resulting from the evaluation of its body expression with the given parameter bindings; the parameter value bindings can themselves be any SQL++ expressions.
 
 The following example is a (built-in) function call expression whose value is 8.
 
@@ -116,7 +116,7 @@ The following example is a (built-in) function call expression whose value is 8.
     RecordConstructor        ::= "{" ( FieldBinding ( "," FieldBinding )* )? "}"
     FieldBinding             ::= Expression ":" Expression
 
-A major feature of SQL++ is its ability to construct new ADM data instances. This is accomplished using its constructors for each of the major ADM complex object structures, namely lists (ordered or unordered) and records. Ordered lists are like JSON arrays, while unordered lists have multiset (bag) semantics. Records are built from attributes that are field-name/field-value pairs, again like JSON. (See the AsterixDB Data Model document for more details on each.)
+A major feature of SQL++ is its ability to construct new data model instances. This is accomplished using its constructors for each of the model's complex object structures, namely lists (ordered or unordered) and records. Ordered lists are like JSON arrays, while unordered lists have multiset (bag) semantics. Records are built from attributes that are field-name/field-value pairs, again like JSON. (See the data model document for more details on each.)
 
 The following examples illustrate how to construct a new ordered list with 3 items, a new record with 2 fields, and a new unordered list with 4 items, respectively. List elements can be homogeneous (as in the first example), which is the common case, or they may be heterogeneous (as in the third example). The data values and field name values used to construct lists and records in constructors are all simply SQL++ expressions. Thus, the list elements, field names, and field values used in constructors can be simple literals or they can come from query variable references or even arbitrarily complex SQL++ expressions (subqueries).
 
@@ -125,8 +125,8 @@ The following examples illustrate how to construct a new ordered list with 3 ite
     [ 'a', 'b', 'c' ]
 
     {
-      'project name': 'AsterixDB',
-      'project members': [ 'vinayakb', 'dtabass', 'chenli', 'tsotras' ]
+      'project name': 'Hyracks',
+      'project members': [ 'vinayakb', 'dtabass', 'chenli', 'tsotras', 'tillw' ]
     }
 
     {{ 42, "forty-two!", { "rank": "Captain", "name": "America" }, 3.14159 }}
@@ -137,7 +137,7 @@ The following examples illustrate how to construct a new ordered list with 3 ite
     Field           ::= "." Identifier
     Index           ::= "[" ( Expression | "?" ) "]"
 
-Components of complex types in ADM are accessed via path expressions. Path access can be applied to the result of a SQL++ expression that yields an instance of  a complex type, e.g., a record or list instance. For records, path access is based on field names. For ordered lists, path access is based on (zero-based) array-style indexing. SQL++ also supports an "I'm feeling lucky" style index accessor, [?], for selecting an arbitrary element from an ordered list. Attempts to access non-existent fields or out-of-bound list elements produce the special value `MISSING`.
+Components of complex types in the data model are accessed via path expressions. Path access can be applied to the result of a SQL++ expression that yields an instance of  a complex type, e.g., a record or list instance. For records, path access is based on field names. For ordered lists, path access is based on (zero-based) array-style indexing. SQL++ also supports an "I'm feeling lucky" style index accessor, [?], for selecting an arbitrary element from an ordered list. Attempts to access non-existent fields or out-of-bound list elements produce the special value `MISSING`.
 
 The following examples illustrate field access for a record, index-based element access for an ordered list, and also a composition thereof.
 

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c7a8a150/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
index c6dcf61..bfe4f0e 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/3_query.md
@@ -72,7 +72,7 @@ The following shows the (rich) grammar for the `SELECT` statement in SQL++.
     OrderbyClause      ::= <ORDER> <BY> Expression ( <ASC> | <DESC> )? ( "," Expression ( <ASC> | <DESC> )? )*
     LimitClause        ::= <LIMIT> Expression ( <OFFSET> Expression )?
 
-In this section, we will make use of two stored collections of records (datasets in ADM parlance), `GleambookUsers` and `GleambookMessages`, in a series of running examples to explain `SELECT` queries. The contents of the example collections are as follows:
+In this section, we will make use of two stored collections of records (datasets), `GleambookUsers` and `GleambookMessages`, in a series of running examples to explain `SELECT` queries. The contents of the example collections are as follows:
 
 `GleambookUsers` collection:
 

http://git-wip-us.apache.org/repos/asf/asterixdb/blob/c7a8a150/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md
----------------------------------------------------------------------
diff --git a/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md b/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md
index a2eebbd..217a670 100644
--- a/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md
+++ b/asterixdb/asterix-doc/src/main/markdown/sqlpp/4_ddl.md
@@ -30,15 +30,16 @@
                       | DeleteStatement
                       | Query ";"
 
-In addition to queries, the AsterixDB implementation of SQL++ supports statements for data definition and
-manipulation purposes as well as controlling the context to be used in evaluating SQL++ expressions.
-This section details the DDL and DML statements supported in the SQL++ language as realized in Apache AsterixDB.
+In addition to queries, an implementation of SQL++ needs to support statements for data definition
+and manipulation purposes as well as controlling the context to be used in evaluating SQL++ expressions.
+This section details the DDL and DML statements supported in the SQL++ language as realized today in
+Apache AsterixDB.
 
 ## <a id="Declarations">Declarations</a>
 
     DatabaseDeclaration ::= "USE" Identifier
 
-The world of data in an AsterixDB instance is organized into data namespaces called **dataverses**.
+At the uppermost level, the world of data is organized into data namespaces called **dataverses**.
 To set the default dataverse for a series of statements, the USE statement is provided in SQL++.
 
 As an example, the following statement sets the default dataverse to be "TinySocial".
@@ -116,15 +117,15 @@ The following example creates a new dataverse named TinySocial if one does not a
     OrderedListTypeDef   ::= "[" ( TypeExpr ) "]"
     UnorderedListTypeDef ::= "{{" ( TypeExpr ) "}}"
 
-The CREATE TYPE statement is used to create a new named ADM datatype.
-This type can then be used to create stored collections or utilized when defining one or more other ADM datatypes.
-Much more information about the Asterix Data Model (ADM) is available in the [data model reference guide](datamodel.html) to ADM.
+The CREATE TYPE statement is used to create a new named datatype.
+This type can then be used to create stored collections or utilized when defining one or more other datatypes.
+Much more information about the data model is available in the [data model reference guide](datamodel.html).
 A new type can be a record type, a renaming of another type, an ordered list type, or an unordered list type.
 A record type can be defined as being either open or closed.
 Instances of a closed record type are not permitted to contain fields other than those specified in the create type statement.
 Instances of an open record type may carry additional fields, and open is the default for new types if neither option is specified.
 
-The following example creates a new ADM record type called GleambookUser type.
+The following example creates a new record type called GleambookUser type.
 Since it is defined as (defaulting to) being an open type,
 instances will be permitted to contain more than what is specified in the type definition.
 The first four fields are essentially traditional typed name/value pairs (much like SQL fields).
@@ -142,7 +143,7 @@ The employment field is an ordered list of instances of another named record typ
       employment: [ EmploymentType ]
     };
 
-The next example creates a new ADM record type, closed this time, called MyUserTupleType.
+The next example creates a new record type, closed this time, called MyUserTupleType.
 Instances of this closed type will not be permitted to have extra fields,
 although the alias field is marked as optional and may thus be NULL or MISSING in legal instances of the type.
 Note that the type of the id field in the example is UUID.
@@ -177,7 +178,7 @@ This field type can be used if you want to have this field be an autogenerated-P
     CompactionPolicy     ::= Identifier
 
 The CREATE DATASET statement is used to create a new dataset.
-Datasets are named, unordered collections of ADM record type instances;
+Datasets are named, unordered collections of record type instances;
 they are where data lives persistently and are the usual targets for SQL++ queries.
 Datasets are typed, and the system ensures that their contents conform to their type definitions.
 An Internal dataset (the default kind) is a dataset whose content lives within and is managed by the system.
@@ -190,8 +191,8 @@ In this case, unlike other non-optional fields, a value for the auto-generated P
 
 Another advanced option, when creating an Internal dataset, is to specify the merge policy to control which of the
 underlying LSM storage components to be merged.
-(AsterixDB supports Log-Structured Merge tree based physical storage for Internal datasets.)
-Apache AsterixDB currently supports four different component merging policies that can be chosen per dataset:
+(The system supports Log-Structured Merge tree based physical storage for Internal datasets.)
+Currently the system supports four different component merging policies that can be chosen per dataset:
 no-merge, constant, prefix, and correlated-prefix.
 The no-merge policy simply never merges disk components.
 The constant policy merges disk components when the number of components reaches a constant number k that can be configured by the user.
@@ -200,14 +201,14 @@ It works by first trying to identify the smallest ordered (oldest to newest) seq
 If such a sequence exists, the components in the sequence are merged together to form a single component.
 Finally, the correlated-prefix policy is similar to the prefix policy, but it delegates the decision of merging the disk components of all the indexes in a dataset to the primary index.
 When the correlated-prefix policy decides that the primary index needs to be merged (using the same decision criteria as for the prefix policy), then it will issue successive merge requests on behalf of all other indexes associated with the same dataset.
-The default policy for AsterixDB is the prefix policy except when there is a filter on a dataset, where the preferred policy for filters is the correlated-prefix.
+The system's default policy is the prefix policy except when there is a filter on a dataset, where the preferred policy for filters is the correlated-prefix.
 
 Another advanced option shown in the syntax above, related to performance and mentioned above, is that a **filter** can optionally be created on a field to further optimize range queries with predicates on the filter's field.
 Filters allow some range queries to avoid searching all LSM components when the query conditions match the filter.
 (Refer to [Filter-Based LSM Index Acceleration](filters.html) for more information about filters.)
 
 An External dataset, in contrast to an Internal dataset, has data stored outside of the system's control.
-Files living in HDFS or in the local filesystem(s) of a cluster's nodes are currently supported in AsterixDB.
+Files living in HDFS or in the local filesystem(s) of a cluster's nodes are currently supported.
 External dataset support allows SQL++ queries to treat foreign data as though it were stored in the system,
 making it possible to query "legacy" file data (e.g., Hive data) without having to physically import it.
 When defining an External dataset, an appropriate adapter type must be selected for the desired external data.
@@ -369,7 +370,7 @@ The LOAD statement accepts the same adapters and the same parameters as discusse
 (See the [guide to external data](externaldata.html) for more information on the available adapters.)
 If a dataset has an auto-generated primary key field, the file to be imported should not include that field in it.
 
-The following example shows how to bulk load the GleambookUsers dataset from an external file containing data that has been prepared in ADM format.
+The following example shows how to bulk load the GleambookUsers dataset from an external file containing data that has been prepared in ADM (Asterix Data Model) format.
 
 ##### Example
 
@@ -390,7 +391,7 @@ value for that field in it.
 (The system will automatically extend the provided record with this additional field and a corresponding value.)
 Insertion will fail if the dataset already has data with the primary key value(s) being inserted.
 
-In AsterixDB, inserts are processed transactionally.
+Inserts are processed transactionally by the system.
 The transactional scope of each insert transaction is the insertion of a single object plus its affiliated secondary index entries (if any).
 If the query part of an insert returns a single object, then the INSERT statement will be a single, atomic transaction.
 If the query part returns multiple objects, each object being inserted will be treated as a separate tranaction.
@@ -414,8 +415,7 @@ The following example illustrates a query-based upsert operation.
 
     UPSERT INTO UsersCopy (SELECT VALUE user FROM GleambookUsers user)
 
-*Editor's note: Upserts currently work in AQL but are apparently disabled at the moment in SQL++.
-(@Yingyi, is that indeed the case?)*
+*Editor's note: Upserts currently work in AQL but are not yet enabled (at the moment) in SQL++.
 
 ### <a id="Deletes">DELETEs</a>
 
@@ -424,7 +424,7 @@ The following example illustrates a query-based upsert operation.
 The SQL++ DELETE statement is used to delete data from a target dataset.
 The data to be deleted is identified by a boolean expression involving the variable bound to the target dataset in the DELETE statement.
 
-Deletes in AsterixDB are processed transactionally.
+Deletes are processed transactionally by the system.
 The transactional scope of each delete transaction is the deletion of a single object plus its affiliated secondary index entries (if any).
 If the boolean expression for a delete identifies a single object, then the DELETE statement itself will be a single, atomic transaction.
 If the expression identifies multiple objects, then each object deleted will be handled as a separate transaction.

