Return-Path: X-Original-To: apmail-calcite-commits-archive@www.apache.org Delivered-To: apmail-calcite-commits-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id BCF7418D9C for ; Mon, 1 Jun 2015 17:56:39 +0000 (UTC) Received: (qmail 15664 invoked by uid 500); 1 Jun 2015 17:56:39 -0000 Delivered-To: apmail-calcite-commits-archive@calcite.apache.org Received: (qmail 15636 invoked by uid 500); 1 Jun 2015 17:56:39 -0000 Mailing-List: contact commits-help@calcite.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@calcite.incubator.apache.org Delivered-To: mailing list commits@calcite.incubator.apache.org Received: (qmail 15627 invoked by uid 99); 1 Jun 2015 17:56:39 -0000 Received: from Unknown (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Jun 2015 17:56:39 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 195371A41B1 for ; Mon, 1 Jun 2015 17:56:39 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.792 X-Spam-Level: X-Spam-Status: No, score=0.792 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, T_RP_MATCHES_RCVD=-0.01, URIBL_BLOCKED=0.001, WEIRD_QUOTING=0.001] autolearn=disabled Received: from mx1-us-east.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id xiVK5psxPRC1 for ; Mon, 1 Jun 2015 17:56:23 +0000 (UTC) Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx1-us-east.apache.org (ASF Mail Server at mx1-us-east.apache.org) with SMTP id 456ED47C13 for ; Mon, 1 Jun 2015 17:56:17 +0000 (UTC) Received: (qmail 13897 invoked by uid 99); 1 Jun 2015 17:56:16 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Jun 2015 17:56:16 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 6E9C9E051D; Mon, 1 Jun 2015 17:56:16 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: jhyde@apache.org To: commits@calcite.incubator.apache.org Date: Mon, 01 Jun 2015 17:56:29 -0000 Message-Id: <59e5535dc3a94218b9b9753b2e5260e9@git.apache.org> In-Reply-To: <4ea2844b55a44df9a00eed44c88a7074@git.apache.org> References: <4ea2844b55a44df9a00eed44c88a7074@git.apache.org> X-Mailer: ASF-Git Admin Mailer Subject: [15/19] incubator-calcite git commit: [CALCITE-355] Web site http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/doc/howto.md ---------------------------------------------------------------------- diff --git a/doc/howto.md b/doc/howto.md deleted file mode 100644 index 96c9d71..0000000 --- a/doc/howto.md +++ /dev/null @@ -1,803 +0,0 @@ - -# Calcite HOWTO - -Here's some miscellaneous documentation about using Calcite and its various -adapters. - -## Building from a source distribution - -Prerequisites are maven (3.2.1 or later) -and Java (JDK 1.7 or later, 1.8 preferred) on your path. - -Unpack the source distribution `.tar.gz` or `.zip` file, -`cd` to the root directory of the unpacked source, -then build using maven: - -```bash -$ tar xvfz calcite-1.3.0-incubating-source.tar.gz -$ cd calcite-1.3.0-incubating -$ mvn install -``` - -[Running tests](howto.md#running-tests) describes how to run more or fewer -tests. - -## Building from git - -Prerequisites are git, maven (3.2.1 or later) -and Java (JDK 1.7 or later, 1.8 preferred) on your path. - -Create a local copy of the github repository, -`cd` to its root directory, -then build using maven: - -```bash -$ git clone git://github.com/apache/incubator-calcite.git -$ cd incubator-calcite -$ mvn install -``` - -[Running tests](howto.md#running-tests) describes how to run more or fewer -tests. - -## Running tests - -The test suite will run by default when you build, unless you specify -`-DskipTests`: - -```bash -$ mvn clean # Note: mvn clean install does not work, use mvn clean && mvn install -$ mvn -DskipTests install -``` - -There are other options that control which tests are run, and in what -environment, as follows. - -* `-Dcalcite.test.db=DB` (where db is `h2`, `hsqldb`, `mysql`, or `postgresql`) allows you - to change the JDBC data source for the test suite. Calcite's test - suite requires a JDBC data source populated with the foodmart data - set. - * `hsqldb`, the default, uses an in-memory hsqldb database. - * all others access test virtual machine (see [integration tests](howto.md#running-integration-tests) below) - `mysql` and `postgresql` might be somewhat faster than hsqldb, but you need to populate it (i.e. provision a VM). -* `-Dcalcite.debug` prints extra debugging information to stdout. -* `-Dcalcite.test.slow` enables tests that take longer to execute. For - example, there are tests that create virtual TPC-H and TPC-DS schemas - in-memory and run tests from those benchmarks. -* `-Dcalcite.test.splunk=true` enables tests that run against Splunk. - Splunk must be installed and running. - -## Running integration tests - -For testing Calcite's external adapters, a test virtual machine should be used. -The VM includes H2, HSQLDB, MySQL, MongoDB, and PostgreSQL. - -Test VM requires 5GiB of disk space and it takes 30 minutes to build. - -Note: you can use [calcite-test-dataset](https://github.com/vlsi/calcite-test-dataset) - to populate your own database, however it is recommended to use test VM so the test environment can be reproduced. - -### VM preparation - -0) Install dependencies: [Vagrant](https://www.vagrantup.com/) and [VirtualBox](https://www.virtualbox.org/) - -1) Clone https://github.com/vlsi/calcite-test-dataset.git at the same level as calcite repository. -For instance: -```bash -code - +-- calcite - +-- calcite-test-dataset -``` - -Note: integration tests search for ../calcite-test-dataset or ../../calcite-test-dataset. - You can specify full path via calcite.test.dataset system property. - -2) Build and start the VM: -```bash -cd calcite-test-dataset && mvn install -``` - -### VM management - -Test VM is provisioned by Vagrant, so regular Vagrant `vagrant up` and `vagrant halt` should be used to start and stop the VM. -The connection strings for different databases are listed in [calcite-test-dataset](https://github.com/vlsi/calcite-test-dataset) readme. - -### Suggested test flow - -Note: test VM should be started before you launch integration tests. Calcite itself does not start/stop the VM. - -Command line: -* Executing regular unit tests (does not require external data): no change. `mvn test` or `mvn install`. -* Executing all tests, for all the DBs: `mvn verify -Pit`. `it` stands for "integration-test". `mvn install -Pit` works as well. -* Executing just tests for external DBs, excluding unit tests: `mvn -Dtest=foo -DfailIfNoTests=false -Pit verify` -* Executing just MongoDB tests: `cd mongo; mvn verify -Pit` - -From within IDE: -* Executing regular unit tests: no change. -* Executing MongoDB tests: run `MongoAdapterIT.java` as usual (no additional properties are required) -* Executing MySQL tests: run `JdbcTest` and `JdbcAdapterTest` with setting `-Dcalcite.test.db=mysql` -* Executing PostgreSQL tests: run `JdbcTest` and `JdbcAdapterTest` with setting `-Dcalcite.test.db=postgresql` - -### Integration tests technical details - -Tests with external data are executed at maven's integration-test phase. -We do not currently use pre-integration-test/post-integration-test, however we could use that in future. -The verification of build pass/failure is performed at verify phase. -Integration tests should be named `...IT.java`, so they are not picked up on unit test execution. - -## Contributing - -We welcome contributions. - -If you are planning to make a large contribution, talk to us first! It -helps to agree on the general approach. Log a -[JIRA case](https://issues.apache.org/jira/browse/CALCITE) for your -proposed feature or start a discussion on the dev list. - -Fork the github repository, and create a branch for your feature. - -Develop your feature and test cases, and make sure that `mvn -install` succeeds. (Run extra tests if your change warrants it.) - -Commit your change to your branch, and use a comment that starts with -the JIRA case number, like this: - -``` -[CALCITE-345] AssertionError in RexToLixTranslator comparing to date literal -``` - -If your change had multiple commits, use `git rebase -i master` to -combine them into a single commit, and to bring your code up to date -with the latest on the main line. - -Then push your commit(s) to github, and create a pull request from -your branch to the incubator-calcite master branch. Update the JIRA case -to reference your pull request, and a committer will review your -changes. - -## Getting started - -Calcite is a community, so the first step to joining the project is to introduce yourself. -Join the [developers list](http://mail-archives.apache.org/mod_mbox/incubator-calcite-dev/) -and send an email. - -If you have the chance to attend a [meetup](http://www.meetup.com/Apache-Calcite/), -or meet [members of the community](http://calcite.incubator.apache.org/team-list.html) -at a conference, that's also great. - -Choose an initial task to work on. It should be something really simple, -such as a bug fix or a [Jira task that we have labeled -"newbie"](https://issues.apache.org/jira/issues/?jql=labels%20%3D%20newbie%20%26%20project%20%3D%20Calcite%20%26%20status%20%3D%20Open). -Follow the [contributing guidelines](#contributing) to get your change committed. - -After you have made several useful contributions we may -[invite you to become a committer](https://community.apache.org/contributors/). -We value all contributions that help to build a vibrant community, not just code. -You can contribute by testing the code, helping verify a release, -writing documentation or the web site, -or just by answering questions on the list. - -## Tracing - -To enable tracing, add the following flags to the java command line: - -``` --Dcalcite.debug=true -Djava.util.logging.config.file=core/src/test/resources/logging.properties -``` - -The first flag causes Calcite to print the Java code it generates -(to execute queries) to stdout. It is especially useful if you are debugging -mysterious problems like this: - -``` -Exception in thread "main" java.lang.ClassCastException: Integer cannot be cast to Long - at Baz$1$1.current(Unknown Source) -``` - -The second flag specifies a config file for -the java.util.logging -framework. Put the following into core/src/test/resources/logging.properties: - -```properties -handlers= java.util.logging.ConsoleHandler -.level= INFO -org.apache.calcite.plan.RelOptPlanner.level=FINER -java.util.logging.ConsoleHandler.level=ALL -``` - -The line `org.apache.calcite.plan.RelOptPlanner.level=FINER` tells the planner to produce -fairly verbose output. You can modify the file to enable other loggers, or to change levels. -For instance, if you change `FINER` to `FINEST` the planner will give you an account of the -planning process so detailed that it might fill up your hard drive. - -## CSV adapter - -See the tutorial. - -## MongoDB adapter - -First, download and install Calcite, -and install MongoDB. - -Note: you can use MongoDB from integration test virtual machine above. - -Import MongoDB's zipcode data set into MongoDB: - -```bash -$ curl -o /tmp/zips.json http://media.mongodb.org/zips.json -$ mongoimport --db test --collection zips --file /tmp/zips.json -Tue Jun 4 16:24:14.190 check 9 29470 -Tue Jun 4 16:24:14.469 imported 29470 objects -``` - -Log into MongoDB to check it's there: - -```bash -$ mongo -MongoDB shell version: 2.4.3 -connecting to: test -> db.zips.find().limit(3) -{ "city" : "ACMAR", "loc" : [ -86.51557, 33.584132 ], "pop" : 6055, "state" : "AL", "_id" : "35004" } -{ "city" : "ADAMSVILLE", "loc" : [ -86.959727, 33.588437 ], "pop" : 10616, "state" : "AL", "_id" : "35005" } -{ "city" : "ADGER", "loc" : [ -87.167455, 33.434277 ], "pop" : 3205, "state" : "AL", "_id" : "35006" } -> exit -bye -``` - -Connect using the -mongo-zips-model.json -Calcite model: -```bash -$ ./sqlline -sqlline> !connect jdbc:calcite:model=mongodb/target/test-classes/mongo-zips-model.json admin admin -Connecting to jdbc:calcite:model=mongodb/target/test-classes/mongo-zips-model.json -Connected to: Calcite (version 1.x.x) -Driver: Calcite JDBC Driver (version 1.x.x) -Autocommit status: true -Transaction isolation: TRANSACTION_REPEATABLE_READ -sqlline> !tables -+------------+--------------+-----------------+---------------+ -| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | -+------------+--------------+-----------------+---------------+ -| null | mongo_raw | zips | TABLE | -| null | mongo_raw | system.indexes | TABLE | -| null | mongo | ZIPS | VIEW | -| null | metadata | COLUMNS | SYSTEM_TABLE | -| null | metadata | TABLES | SYSTEM_TABLE | -+------------+--------------+-----------------+---------------+ -sqlline> select count(*) from zips; -+---------+ -| EXPR$0 | -+---------+ -| 29467 | -+---------+ -1 row selected (0.746 seconds) -sqlline> !quit -Closing: org.apache.calcite.jdbc.FactoryJdbc41$CalciteConnectionJdbc41 -$ -``` - -## Splunk adapter - -To run the test suite and sample queries against Splunk, -load Splunk's `tutorialdata.zip` data set as described in -the Splunk tutorial. - -(This step is optional, but it provides some interesting data for the sample -queries. It is also necessary if you intend to run the test suite, using -`-Dcalcite.test.splunk=true`.) - -## Implementing an adapter - -New adapters can be created by implementing `CalcitePrepare.Context`: - -```java -import org.apache.calcite.adapter.java.JavaTypeFactory; -import org.apache.calcite.jdbc.CalcitePrepare; -import org.apache.calcite.jdbc.CalciteRootSchema; - -public class AdapterContext implements CalcitePrepare.Context { - @Override - public JavaTypeFactory getTypeFactory() { - // adapter implementation - return typeFactory; - } - - @Override - public CalciteRootSchema getRootSchema() { - // adapter implementation - return rootSchema; - } -} -``` - -### Testing adapter in Java - -The example below shows how SQL query can be submitted to -`CalcitePrepare` with a custom context (`AdapterContext` in this -case). Calcite prepares and implements the query execution, using the -resources provided by the `Context`. `CalcitePrepare.PrepareResult` -provides access to the underlying enumerable and methods for -enumeration. The enumerable itself can naturally be some adapter -specific implementation. - -```java -import org.apache.calcite.jdbc.CalcitePrepare; -import org.apache.calcite.prepare.CalcitePrepareImpl; -import org.junit.Test; - -public class AdapterContextTest { - @Test - public void testSelectAllFromTable() { - AdapterContext ctx = new AdapterContext(); - String sql = "SELECT * FROM TABLENAME"; - Class elementType = Object[].class; - CalcitePrepare.PrepareResult prepared = - new CalcitePrepareImpl().prepareSql(ctx, sql, null, elementType, -1); - Object enumerable = prepared.getExecutable(); - // etc. - } -} -``` - -### JavaTypeFactory - -When Calcite compares types (instances of `RelDataType`), it requires them to be the same -object. If there are two distinct type instances that refer to the -same Java type, Calcite may fail to recognize that they match. It is -recommended to: -* Use a single instance of `JavaTypeFactory` within the calcite context; -* Store the types so that the same object is always returned for the same type. - -## Set up PGP signing keys (for Calcite committers) - -Follow instructions at http://www.apache.org/dev/release-signing to -create a key pair. (On Mac OS X, I did `brew install gpg` and `gpg ---gen-key`.) - -Add your public key to the `KEYS` file by following instructions in -the `KEYS` file. - -## Making a snapshot (for Calcite committers) - -Before you start: -* Set up signing keys as described above. -* Make sure you are using JDK 1.7 (not 1.8). -* Make sure build and tests succeed with `-Dcalcite.test.db=hsqldb` (the default) - -```bash -# Set passphrase variable without putting it into shell history -read -s GPG_PASSPHRASE - -# Make sure that there are no junk files in the sandbox -git clean -xn -mvn clean - -mvn -Papache-release -Dgpg.passphrase=${GPG_PASSPHRASE} install -``` - -When the dry-run has succeeded, change `install` to `deploy`. - -## Making a release (for Calcite committers) - -Before you start: -* Set up signing keys as described above. -* Make sure you are using JDK 1.7 (not 1.8). -* Check that `README`, `README.md` and `doc/howto.md` have the correct version number. -* Set `version.major` and `version.minor` in `pom.xml`. -* Make sure build and tests succeed, including with - -Dcalcite.test.db={mysql,hsqldb}, -Dcalcite.test.slow=true, - -Dcalcite.test.mongodb=true, -Dcalcite.test.splunk=true. -* Trigger a - Coverity scan - by merging the latest code into the `julianhyde/coverity_scan` branch, - and when it completes, make sure that there are no important issues. -* Make sure that - - every "resolved" JIRA case (including duplicates) has - a fix version assigned (most likely the version we are - just about to release) - -Create a release branch named after the release, e.g. `branch-1.1`, and push it to Apache. - -```bash -$ git checkout -b branch-X.Y -$ git push -u origin branch-X.Y -``` - -We will use the branch for the entire the release process. Meanwhile, -we do not allow commits to the master branch. After the release is -final, we can use `git merge --ff-only` to append the changes on the -release branch onto the master branch. (Apache does not allow reverts -to the master branch, which makes it difficult to clean up the kind of -messy commits that inevitably happen while you are trying to finalize -a release.) - -Now, set up your environment and do a dry run. The dry run will not -commit any changes back to git and gives you the opportunity to verify -that the release process will complete as expected. - -If any of the steps fail, clean up (see below), fix the problem, and -start again from the top. - -```bash -# Set passphrase variable without putting it into shell history -read -s GPG_PASSPHRASE - -# Make sure that there are no junk files in the sandbox -git clean -xn -mvn clean - -# Do a dry run of the release:prepare step, which sets version numbers. -mvn -DdryRun=true -DskipTests -DreleaseVersion=X.Y.Z-incubating -DdevelopmentVersion=X.Y.Z+1-incubating-SNAPSHOT -Papache-release -Darguments="-Dgpg.passphrase=${GPG_PASSPHRASE}" release:prepare 2>&1 | tee /tmp/prepare-dry.log -``` - -Check the artifacts: -* In the `target` directory should be these 8 files, among others: - * apache-calcite-X.Y.Z-incubating-src.tar.gz - * apache-calcite-X.Y.Z-incubating-src.tar.gz.asc - * apache-calcite-X.Y.Z-incubating-src.tar.gz.md5 - * apache-calcite-X.Y.Z-incubating-src.tar.gz.sha1 - * apache-calcite-X.Y.Z-incubating-src.zip - * apache-calcite-X.Y.Z-incubating-src.zip.asc - * apache-calcite-X.Y.Z-incubating-src.zip.md5 - * apache-calcite-X.Y.Z-incubating-src.zip.sha1 -* Note that the file names start `apache-calcite-` and include - `incubating` in the version. -* In the two source distros `.tar.gz` and `.zip` (currently there is - no binary distro), check that all files belong to a directory called - `apache-calcite-X.Y.Z-incubating-src`. -* That directory must contain files `DISCLAIMER`, `NOTICE`, `LICENSE`, - `README`, `README.md` - * Check that the version in `README` is correct -* In each .jar (for example - `core/target/calcite-core-X.Y.Z-incubating.jar` and - `mongodb/target/calcite-mongodb-X.Y.Z-incubating-sources.jar`), check - that the `META-INF` directory contains `DEPENDENCIES`, `LICENSE`, - `NOTICE` and `git.properties` -* In each .jar, check that `org-apache-calcite-jdbc.properties` is - present and does not contain un-substituted `${...}` variables -* Check PGP, per https://httpd.apache.org/dev/verification.html - -Now, remove the `-DdryRun` flag and run the release for real. - -```bash -# Prepare sets the version numbers, creates a tag, and pushes it to git. -mvn -DdryRun=false -DskipTests -DreleaseVersion=X.Y.Z-incubating -DdevelopmentVersion=X.Y.Z+1-incubating-SNAPSHOT -Papache-release -Darguments="-Dgpg.passphrase=${GPG_PASSPHRASE}" release:prepare 2>&1 | tee /tmp/prepare.log - -# Perform checks out the tagged version, builds, and deploys to the staging repository -mvn -DskipTests -Papache-release -Darguments="-Dgpg.passphrase=${GPG_PASSPHRASE}" release:perform 2>&1 | tee /tmp/perform.log -``` - -Verify the staged artifacts in the Nexus repository: -* Go to https://repository.apache.org/ -* Under `Build Promotion`, click `Staging Repositories` -* In the `Staging Repositories` tab there should be a line with profile `org.apache.calcite` -* Navigate through the artifact tree and make sure the .jar, .pom, .asc files are present -* Check the box on in the first column of the row, - and press the 'Close' button to publish the repository at - https://repository.apache.org/content/repositories/orgapachecalcite-1000 - (or a similar URL) - -Upload the artifacts via subversion to a staging area, -https://dist.apache.org/repos/dist/dev/incubator/calcite/apache-calcite-X.Y.Z-incubating-rcN: - -```bash -# Create a subversion workspace, if you haven't already -mkdir -p ~/dist/dev -pushd ~/dist/dev -svn co https://dist.apache.org/repos/dist/dev/incubator/calcite -popd - -# Move the files into a directory -cd target -mkdir ~/dist/dev/calcite/apache-calcite-X.Y.Z-incubating-rcN -mv apache-calcite-* ~/dist/dev/calcite/apache-calcite-X.Y.Z-incubating-rcN - -# Check in -cd ~/dist/dev/calcite -svn add apache-calcite-X.Y.Z-incubating-rcN -svn ci -``` - -## Cleaning up after a failed release attempt (for Calcite committers) - -``` -# Make sure that the tag you are about to generate does not already -# exist (due to a failed release attempt) -git tag - -# If the tag exists, delete it locally and remotely -git tag -d apache-calcite-X.Y.Z-incubating -git push origin :refs/tags/apache-calcite-X.Y.Z-incubating - -# Remove modified files -mvn release:clean - -# Check whether there are modified files and if so, go back to the -# original git commit -git status -git reset --hard HEAD -``` - -## Validate a release - -```bash -# Check that the signing key (e.g. 2AD3FAE3) is pushed -gpg --recv-keys key - -# Check keys -curl -O https://dist.apache.org/repos/dist/release/incubator/calcite/KEYS - -# Sign/check md5 and sha1 hashes -# (Assumes your O/S has 'md5' and 'sha1' commands.) -function checkHash() { - cd "$1" - for i in *.{zip,pom,gz}; do - if [ ! -f $i ]; then - continue - fi - if [ -f $i.md5 ]; then - if [ "$(cat $i.md5)" = "$(md5 -q $i)" ]; then - echo $i.md5 present and correct - else - echo $i.md5 does not match - fi - else - md5 -q $i > $i.md5 - echo $i.md5 created - fi - if [ -f $i.sha1 ]; then - if [ "$(cat $i.sha1)" = "$(sha1 -q $i)" ]; then - echo $i.sha1 present and correct - else - echo $i.sha1 does not match - fi - else - sha1 -q $i > $i.sha1 - echo $i.sha1 created - fi - done -} -checkHash apache-calcite-X.Y.Z-incubating-rcN -``` - -## Get approval for a release via Apache voting process (for Calcite committers) - -Release vote on dev list - -``` -To: dev@calcite.incubator.apache.org -Subject: [VOTE] Release apache-calcite-X.Y.Z-incubating (release candidate N) - -Hi all, - -I have created a build for Apache Calcite X.Y.Z-incubating, release candidate N. - -Thanks to everyone who has contributed to this release. - You can read the release notes here: -https://github.com/apache/incubator-calcite/blob/XXXX/doc/history.md - -The commit to be voted upon: -http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/NNNNNN - -Its hash is XXXX. - -The artifacts to be voted on are located here: -https://dist.apache.org/repos/dist/dev/incubator/calcite/apache-calcite-X.Y.Z-incubating-rcN/ - -The hashes of the artifacts are as follows: -src.tar.gz.md5 XXXX -src.tar.gz.sha1 XXXX -src.zip.md5 XXXX -src.zip.sha1 XXXX - -A staged Maven repository is available for review at: -https://repository.apache.org/content/repositories/orgapachecalcite-NNNN - -Release artifacts are signed with the following key: -https://people.apache.org/keys/committer/jhyde.asc - -Please vote on releasing this package as Apache Calcite X.Y.Z-incubating. - -The vote is open for the next 72 hours and passes if a majority of -at least three +1 PPMC votes are cast. - -[ ] +1 Release this package as Apache Calcite X.Y.Z-incubating -[ ] 0 I don't feel strongly about it, but I'm okay with the release -[ ] -1 Do not release this package because... - - -Here is my vote: - -+1 (binding) - -Julian -``` - -After vote finishes, send out the result: - -``` -Subject: [RESULT] [VOTE] Release apache-calcite-X.Y.Z-incubating (release candidate N) -To: dev@calcite.incubator.apache.org - -Thanks to everyone who has tested the release candidate and given -their comments and votes. - -The tally is as follows. - -N binding +1s: - - -N non-binding +1s: - - -No 0s or -1s. - -Therefore I am delighted to announce that the proposal to release -Apache Calcite X.Y.Z-incubating has passed. - -I'll now start a vote on the general list. Those of you in the IPMC, -please recast your vote on the new thread. - -Julian -``` - -Use the [Apache URL shortener](http://s.apache.org) to generate -shortened URLs for the vote proposal and result emails. Examples: -[s.apache.org/calcite-1.2-vote](http://s.apache.org/calcite-1.2-vote) and -[s.apache.org/calcite-1.2-result](http://s.apache.org/calcite-1.2-result). - -Propose a vote on the incubator list. - -``` -To: general@incubator.apache.org -Subject: [VOTE] Release Apache Calcite X.Y.Z (incubating) - -Hi all, - -The Calcite community has voted on and approved a proposal to release -Apache Calcite X.Y.Z (incubating). - -Proposal: -http://s.apache.org/calcite-X.Y.Z-vote - -Vote result: -N binding +1 votes -N non-binding +1 votes -No -1 votes -http://s.apache.org/calcite-X.Y.Z-result - -The commit to be voted upon: -http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/NNNNNN - -Its hash is XXXX. - -The artifacts to be voted on are located here: -https://dist.apache.org/repos/dist/dev/incubator/calcite/apache-calcite-X.Y.Z-incubating-rcN/ - -The hashes of the artifacts are as follows: -src.tar.gz.md5 XXXX -src.tar.gz.sha1 XXXX -src.zip.md5 XXXX -src.zip.sha1 XXXX - -A staged Maven repository is available for review at: -https://repository.apache.org/content/repositories/orgapachecalcite-NNNN - -Release artifacts are signed with the following key: -https://people.apache.org/keys/committer/jhyde.asc - -Pursuant to the Releases section of the Incubation Policy and with -the endorsement of NNN of our mentors we would now like to request -the permission of the Incubator PMC to publish the release. The vote -is open for 72 hours, or until the necessary number of votes (3 +1) -is reached. - -[ ] +1 Release this package as Apache Calcite X.Y.Z incubating -[ ] -1 Do not release this package because... - -Julian Hyde, on behalf of Apache Calcite PPMC -``` - -After vote finishes, send out the result: - -``` -To: general@incubator.apache.org -Subject: [RESULT] [VOTE] Release Apache Calcite X.Y.Z (incubating) - -This vote passes with N +1s and no 0 or -1 votes: -+1 (mentor) - -There was some feedback during voting. I shall open a separate -thread to discuss. - -Thanks everyone. We’ll now roll the release out to the mirrors. - -Julian -``` - -## Publishing a release (for Calcite committers) - -After a successful release vote, we need to push the release -out to mirrors, and other tasks. - -In JIRA, search for all issues resolved in this release, -and do a bulk update changing their status to "Closed", -with a change comment -"Resolved in release X.Y.Z-incubating (YYYY-MM-DD)" -(fill in release number and date appropriately). - -Promote the staged nexus artifacts. -* Go to https://repository.apache.org/ -* Under "Build Promotion" click "Staging Repositories" -* In the line with "orgapachecalcite-xxxx", check the box -* Press "Release" button - -Check the artifacts into svn. - -```bash -# Get the release candidate. -mkdir -p ~/dist/dev -cd ~/dist/dev -svn co https://dist.apache.org/repos/dist/dev/incubator/calcite - -# Copy the artifacts. Note that the copy does not have '-rcN' suffix. -mkdir -p ~/dist/release -cd ~/dist/release -svn co https://dist.apache.org/repos/dist/release/incubator/calcite -cd calcite -cp -rp ../../dev/calcite/apache-calcite-X.Y.Z-incubating-rcN apache-calcite-X.Y.Z-incubating -svn add apache-calcite-X.Y.Z-incubating - -# Check in. -svn ci -``` - -Svnpubsub will publish to -https://dist.apache.org/repos/dist/release/incubator/calcite and propagate to -http://www.apache.org/dyn/closer.cgi/incubator/calcite within 24 hours. - -## Publishing the web site (for Calcite committers) - -Get the code: - -```bash -$ svn co https://svn.apache.org/repos/asf/incubator/calcite/site calcite-site -``` - -(Note: `https:`, not `http:`.) - -Build the site: - -```bash -$ cd calcite-site -$ ./build.sh -``` - -It will prompt you to install jekyll, redcarpet and pygments, if you -do not have them installed. It will also check out the git source code -repo, so that it can generate javadoc. - -Check in: - -```bash -svn ci -m"Commit message" file... -``` - -The site will automatically be deployed as http://calcite.incubator.apache.org. http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/doc/lattice.md ---------------------------------------------------------------------- diff --git a/doc/lattice.md b/doc/lattice.md deleted file mode 100644 index d7d4642..0000000 --- a/doc/lattice.md +++ /dev/null @@ -1,125 +0,0 @@ - -# Lattice - -A lattice is a framework for creating and populating materialized views, -and for recognizing that a materialized view can be used to solve a -particular query. - -A lattice represents a star (or snowflake) schema, not a general -schema. In particular, all relationships must be many-to-one, heading -from a fact table at the center of the star. - -The name derives from the mathematics: a -lattice -is a -partially -ordered set where any two elements have a unique greatest lower -bound and least upper bound. - -[HRU96] observed that the set of possible -materializations of a data cube forms a lattice, and presented an -algorithm to choose a good set of materializations. Calcite's -recommendation algorithm is derived from this. - -The lattice definition uses a SQL statement to represent the star. SQL -is a useful short-hand to represent several tables joined together, -and assigning aliases to the column names (it more convenient than -inventing a new language to represent relationships, join conditions -and cardinalities). - -Unlike regular SQL, order is important. If you put A before B in the -FROM clause, and make a join between A and B, you are saying that -there is a many-to-one foreign key relationship from A to B. (E.g. in -the example lattice, the Sales fact table occurs before the Time -dimension table, and before the Product dimension table. The Product -dimension table occurs before the ProductClass outer dimension table, -further down an arm of a snowflake.) - -A lattice implies constraints. In the A to B relationship, there is a -foreign key on A (i.e. every value of A's foreign key has a -corresponding value in B's key), and a unique key on B (i.e. no key -value occurs more than once). These constraints are really important, -because it allows the planner to remove joins to tables whose columns -are not being used, and know that the query results will not change. - -Calcite does not check these constraints. If they are violated, -Calcite will return wrong results. - -A lattice is a big, virtual join view. It is not materialized (it -would be several times larger than the star schema, because of -denormalization) and you probably wouldn't want to query it (far too -many columns). So what is it useful for? As we said above, (a) the -lattice declares some very useful primary and foreign key constraints, -(b) it helps the query planner map user queries onto -filter-join-aggregate materialized views (the most useful kind of -materialized view for DW queries), (c) gives Calcite a framework -within which to gather stats about data volumes and user queries, (d) -allows Calcite to automatically design and populate materialized -views. - -Most star schema models force you to choose whether a column is a -dimension or a measure. In a lattice, every column is a dimension -column. (That is, it can become one of the columns in the GROUP BY clause -to query the star schema at a particular dimensionality). Any column -can also be used in a measure; you define measures by giving the -column and an aggregate function. - -If "unit_sales" tends to be used much more often as a measure rather -than a dimension, that's fine. Calcite's algorithm should notice that -it is rarely aggregated, and not be inclined to create tiles that -aggregate on it. (By "should" I mean "could and one day will". The -algorithm does not currently take query history into account when -designing tiles.) - -But someone might want to know whether orders with fewer than 5 items -were more or less profitable than orders with more than 100. All of a -sudden, "unit_sales" has become a dimension. If there's virtually zero -cost to declaring a column a dimension column, I figured let's make -them all dimension columns. - -The model allows for a particular table to be used more than once, -with a different table alias. You could use this to model say -OrderDate and ShipDate, with two uses to the Time dimension table. - -Most SQL systems require that the column names in a view are unique. -This is hard to achieve in a lattice, because you often include -primary and foreign key columns in a join. So Calcite lets you refer -to columns in two ways. If the column is unique, you can use its name, -["unit_sales"]. Whether or not it is unique in the lattice, it will be -unique in its table, so you can use it qualified by its table alias. -Examples: -* ["sales", "unit_sales"] -* ["ship_date", "time_id"] -* ["order_date", "time_id"] - -A "tile" is a materialized table in a lattice, with a particular -dimensionality. (What Kylin calls a "cuboid".) The "tiles" attribute -of the lattice JSON element -defines an initial set of tiles to materialize. - -If you run the algorithm, you can omit the tiles attribute. Calcite -will choose an initial set. If you include the tiles attribute, the -algorithm will start with that list and then start finding other tiles -that are complementary (i.e. "fill in the gaps" left by the initial -tiles). - -### References - -* [HRU96] V. Harinarayan, A. Rajaraman and J. Ullman. - Implementing - data cubes efficiently. In _Proc. ACM SIGMOD Conf._, Montreal, 1996. http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/doc/model.md ---------------------------------------------------------------------- diff --git a/doc/model.md b/doc/model.md deleted file mode 100644 index 3f49159..0000000 --- a/doc/model.md +++ /dev/null @@ -1,435 +0,0 @@ - -# Calcite JSON model reference - -## Elements - -### Root - -```json -{ - version: '1.0', - defaultSchema: 'mongo', - schemas: [ Schema... ] -} -``` - -`version` (required string) must have value `1.0`. - -`defaultSchema` (optional string). If specified, it is -the name (case-sensitive) of a schema defined in this model, and will -become the default schema for connections to Calcite that use this model. - -`schemas` (optional list of Schema elements). - -### Schema - -Occurs within `root.schemas`. - -```json -{ - name: 'foodmart', - path: ['lib'], - cache: true, - materializations: [ Materialization... ] -} -``` - -`name` (required string) is the name of the schema. - -`type` (optional string, default `map`) indicates sub-type. Values are: -* `map` for Map Schema -* `custom` for Custom Schema -* `jdbc` for JDBC Schema - -`path` (optional list) is the SQL path that is used to -resolve functions used in this schema. If specified it must be a list, -and each element of the list must be either a string or a list of -strings. For example, - -```json - path: [ ['usr', 'lib'], 'lib' ] -``` - -declares a path with two elements: the schema '/usr/lib' and the -schema '/lib'. Most schemas are at the top level, so you can use a -string. - -`materializations` (optional list of -Materialization) defines the tables -in this schema that are materializations of queries. - -`cache` (optional boolean, default true) tells Calcite whether to -cache metadata (tables, functions and sub-schemas) generated -by this schema. - -* If `false`, Calcite will go back to the schema each time it needs - metadata, for example, each time it needs a list of tables in order to - validate a query against the schema. - -* If `true`, Calcite will cache the metadata the first time it reads - it. This can lead to better performance, especially if name-matching is - case-insensitive. - -However, it also leads to the problem of cache staleness. -A particular schema implementation can override the -`Schema.contentsHaveChangedSince` method to tell Calcite -when it should consider its cache to be out of date. - -Tables, functions and sub-schemas explicitly created in a schema are -not affected by this caching mechanism. They always appear in the schema -immediately, and are never flushed. - -### Map Schema - -Like base class Schema, occurs within `root.schemas`. - -```json -{ - name: 'foodmart', - type: 'map', - tables: [ Table... ], - functions: [ Function... ] -} -``` - -`name`, `type`, `path`, `cache`, `materializations` inherited from -Schema. - -`tables` (optional list of Table elements) -defines the tables in this schema. - -`functions` (optional list of Function elements) -defines the functions in this schema. - -### Custom Schema - -Like base class Schema, occurs within `root.schemas`. - -```json -{ - name: 'mongo', - type: 'custom', - factory: 'org.apache.calcite.adapter.mongodb.MongoSchemaFactory', - operand: { - host: 'localhost', - database: 'test' - } -} -``` - -`name`, `type`, `path`, `cache`, `materializations` inherited from -Schema. - -`factory` (required string) is the name of the factory class for this -schema. Must implement interface `org.apache.calcite.schema.SchemaFactory` -and have a public default constructor. - -`operand` (optional map) contains attributes to be passed to the -factory. - -### JDBC Schema - -Like base class Schema, occurs within `root.schemas`. - -```json -{ - name: 'foodmart', - type: 'jdbc', - jdbcDriver: TODO, - jdbcUrl: TODO, - jdbcUser: TODO, - jdbcPassword: TODO, - jdbcCatalog: TODO, - jdbcSchema: TODO -} -``` - -`name`, `type`, `path`, `cache`, `materializations` inherited from -Schema. - -`jdbcDriver` (optional string) is the name of the JDBC driver class. It not -specified, uses whichever class the JDBC DriverManager chooses. - -`jdbcUrl` (optional string) is the JDBC connect string, for example -"jdbc:mysql://localhost/foodmart". - -`jdbcUser` (optional string) is the JDBC user name. - -`jdbcPassword` (optional string) is the JDBC password. - -`jdbcCatalog` (optional string) is the name of the initial catalog in the JDBC -data source. - -`jdbcSchema` (optional string) is the name of the initial schema in the JDBC -data source. - -### Materialization - -Occurs within `root.schemas.materializations`. - -```json -{ - view: 'V', - table: 'T', - sql: 'select deptno, count(*) as c, sum(sal) as s from emp group by deptno' -} -``` - -`view` (optional string) TODO - -`table` (optional string) TODO - -`sql` (optional string, or list of strings that will be concatenated as a - multi-line string) is the SQL definition of the materialization. - -### Table - -Occurs within `root.schemas.tables`. - -```json -{ - name: 'sales_fact', - columns: [ Column... ] -} -``` - -`name` (required string) is the name of this table. Must be unique within the schema. - -`type` (optional string, default `custom`) indicates sub-type. Values are: -* `custom` for Custom Table -* `view` for View - -`columns` (optional list of Column elements) - -### View - -Like base class Table, occurs within `root.schemas.tables`. - -```json -{ - name: 'female_emps', - type: 'view', - sql: "select * from emps where gender = 'F'", - modifiable: true -} -``` - -`name`, `type`, `columns` inherited from Table. - -`sql` (required string, or list of strings that will be concatenated as a - multi-line string) is the SQL definition of the view. - -`path` (optional list) is the SQL path to resolve the query. If not -specified, defaults to the current schema. - -`modifiable` (optional boolean) is whether the view is modifiable. -If null or not specified, Calcite deduces whether the view is modifiable. - -A view is modifiable if contains only SELECT, FROM, WHERE (no JOIN, aggregation -or sub-queries) and every column: -* is specified once in the SELECT clause; or -* occurs in the WHERE clause with a `column = literal` predicate; or -* is nullable. - -The second clause allows Calcite to automatically provide the correct value for -hidden columns. It is useful in multi-tenant environments, where the `tenantId` -column is hidden, mandatory (NOT NULL), and has a constant value for a -particular view. - -Errors regarding modifiable views: -* If a view is marked `modifiable: true` and is not modifiable, Calcite throws - an error while reading the schema. -* If you submit an INSERT, UPDATE or UPSERT command to a non-modifiable view, - Calcite throws an error when validating the statement. -* If a DML statement creates a row that would not appear in the view - (for example, a row in `female_emps`, above, with `gender = 'M'`), - Calcite throws an error when executing the statement. - -### Custom Table - -Like base class Table, occurs within `root.schemas.tables`. - -```json -{ - name: 'female_emps', - type: 'custom', - factory: 'TODO', - operand: { - todo: 'TODO' - } -} -``` - -`name`, `type`, `columns` inherited from Table. - -`factory` (required string) is the name of the factory class for this -table. Must implement interface `org.apache.calcite.schema.TableFactory` -and have a public default constructor. - -`operand` (optional map) contains attributes to be passed to the -factory. - -### Column - -Occurs within `root.schemas.tables.columns`. - -```json -{ - name: 'empno' -} -``` - -`name` (required string) is the name of this column. - -### Function - -Occurs within `root.schemas.functions`. - -```json -{ - name: 'MY_PLUS', - className: 'com.example.functions.MyPlusFunction', - methodName: 'apply', - path: [] -} -``` - -`name` (required string) is the name of this function. - -`className` (required string) is the name of the class that implements this -function. - -`methodName` (optional string) is the name of the method that implements this -function. - -`path` (optional list of string) is the path for resolving this function. - -### Lattice - -Occurs within `root.schemas.lattices`. - -```json -{ - name: 'star', - sql: [ - 'select 1 from "foodmart"."sales_fact_1997" as "s"', - 'join "foodmart"."product" as "p" using ("product_id")', - 'join "foodmart"."time_by_day" as "t" using ("time_id")', - 'join "foodmart"."product_class" as "pc" on "p"."product_class_id" = "pc"."product_class_id"' - ], - auto: false, - algorithm: true, - algorithmMaxMillis: 10000, - rowCountEstimate: 86837, - defaultMeasures: [ { - agg: 'count' - } ], - tiles: [ { - dimensions: [ 'the_year', ['t', 'quarter'] ], - measures: [ { - agg: 'sum', - args: 'unit_sales' - }, { - agg: 'sum', - args: 'store_sales' - }, { - agg: 'count' - } ] - } ] -} -``` - -`name` (required string) is the name of this lattice. - -`sql` (required string, or list of strings that will be concatenated as a -multi-line string) is the SQL statement that defines the fact table, dimension -tables, and join paths for this lattice. - -`auto` (optional boolean, default true) is whether to materialize tiles on need -as queries are executed. - -`algorithm` (optional boolean, default false) is whether to use an optimization -algorithm to suggest and populate an initial set of tiles. - -`algorithmMaxMillis` (optional long, default -1, meaning no limit) is the -maximum number of milliseconds for which to run the algorithm. After this point, -takes the best result the algorithm has come up with so far. - -`rowCountEstimate` (optional double, default 1000.0) estimated number of rows in -the star - -`tiles` (optional list of Tile elements) is a list of -materialized aggregates to create up front. - -`defaultMeasures` (optional list of Measure elements) -is a list of measures that a tile should have by default. -Any tile defined in `tiles` can still define its own measures, including -measures not on this list. If not specified, the default list of measures is -just 'count(*)': - -```json -[ { name: 'count' } ] -``` - -See also: Lattices. - -### Tile - -Occurs within `root.schemas.lattices.tiles`. - -```json -{ - dimensions: [ 'the_year', ['t', 'quarter'] ], - measures: [ { - agg: 'sum', - args: 'unit_sales' - }, { - agg: 'sum', - args: 'store_sales' - }, { - agg: 'count' - } ] -} -``` - -`dimensions` is a list of dimensions (columns from the star), like a `GROUP BY` -clause. Each element is either a string (the unique label of the column within -the star) or a string list (a column name qualified by a table name). - -`measures` (optional list of Measure elements) is a list -of aggregate functions applied to arguments. If not specified, uses the -lattice's default measure list. - -### Measure - -Occurs within `root.schemas.lattices.defaultMeasures` -and `root.schemas.lattices.tiles.measures`. - -```json -{ - agg: 'sum', - args: [ 'unit_sales' ] -} -``` - -`agg` is the name of an aggregate function (usually 'count', 'sum', 'min', -'max'). - -`args` (optional) is a column label (string), or list of zero or more columns. -If a list, each element is either a string (the unique label of the column -within the star) or a string list (a column name qualified by a table name). http://git-wip-us.apache.org/repos/asf/incubator-calcite/blob/5c049bc8/doc/reference.md ---------------------------------------------------------------------- diff --git a/doc/reference.md b/doc/reference.md deleted file mode 100644 index 03f66e3..0000000 --- a/doc/reference.md +++ /dev/null @@ -1,569 +0,0 @@ - -# Calcite SQL language reference - -## SQL constructs - -```SQL -statement: - setStatement - | explain - | insert - | update - | merge - | delete - | query - -setStatement: - ALTER ( SYSTEM | SESSION ) SET identifier = expression - -explain: - EXPLAIN PLAN - [ WITH TYPE | WITH IMPLEMENTATION | WITHOUT IMPLEMENTATION ] - [ EXCLUDING ATTRIBUTES | INCLUDING [ ALL ] ATTRIBUTES ] - FOR ( insert | update | merge | delete | query ) - -insert: - ( INSERT | UPSERT ) INTO tablePrimary - [ '(' column [, column ]* ')' ] - query - -update: - UPDATE tablePrimary - SET assign [, assign ]* - [ WHERE booleanExpression ] - -assign: - identifier '=' expression - -merge: - MERGE INTO tablePrimary [ [ AS ] alias ] - USING tablePrimary - ON booleanExpression - [ WHEN MATCHED THEN UPDATE SET assign [, assign ]* ] - [ WHEN NOT MATCHED THEN INSERT VALUES '(' value [ , value ]* ')' ] - -delete: - DELETE FROM tablePrimary [ [ AS ] alias ] - [ WHERE booleanExpression ] - -query: - [ WITH withItem [ , withItem ]* query ] - | { - select - | query UNION [ ALL ] query - | query EXCEPT query - | query INTERSECT query - } - [ ORDER BY orderItem [, orderItem ]* ] - [ LIMIT { count | ALL } ] - [ OFFSET start { ROW | ROWS } ] - [ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ] - -withItem: - name - [ '(' column [, column ]* ')' ] - AS '(' query ')' - -orderItem: - expression [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ] - -select: - SELECT [ STREAM ] [ ALL | DISTINCT ] - { * | projectItem [, projectItem ]* } - FROM tableExpression - [ WHERE booleanExpression ] - [ GROUP BY { groupItem [, groupItem ]* } ] - [ HAVING booleanExpression ] - [ WINDOW windowName AS windowSpec [, windowName AS windowSpec ]* ] - -projectItem: - expression [ [ AS ] columnAlias ] - | tableAlias . * - -tableExpression: - tableReference [, tableReference ]* - | tableExpression [ NATURAL ] [ LEFT | RIGHT | FULL ] JOIN tableExpression [ joinCondition ] - -joinCondition: - ON booleanExpression - | USING '(' column [, column ]* ')' - -tableReference: - [ LATERAL ] - tablePrimary - [ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ] - -tablePrimary: - [ TABLE ] [ [ catalogName . ] schemaName . ] tableName - | '(' query ')' - | values - | UNNEST '(' expression ')' - | '(' TABLE expression ')' - -values: - VALUES expression [, expression ]* - -groupItem: - expression - | '(' ')' - | '(' expression [, expression ]* ')' - | CUBE '(' expression [, expression ]* ')' - | ROLLUP '(' expression [, expression ]* ')' - | GROUPING SETS '(' groupItem [, groupItem ]* ')' - -windowRef: - windowName - | windowSpec - -windowSpec: - [ windowName ] - '(' - [ ORDER BY orderItem [, orderItem ]* ] - [ PARTITION BY expression [, expression ]* ] - [ - RANGE numericOrIntervalExpression { PRECEDING | FOLLOWING } - | ROWS numericExpression { PRECEDING | FOLLOWING } - ] - ')' -``` - -In *merge*, at least one of the WHEN MATCHED and WHEN NOT MATCHED clauses must -be present. - -In *orderItem*, if *expression* is a positive integer *n*, it denotes -the nth item in the SELECT clause. - -An aggregate query is a query that contains a GROUP BY or a HAVING -clause, or aggregate functions in the SELECT clause. In the SELECT, -HAVING and ORDER BY clauses of an aggregate query, all expressions -must be constant within the current group (that is, grouping constants -as defined by the GROUP BY clause, or constants), or aggregate -functions, or a combination of constants and aggregate -functions. Aggregate and grouping functions may only appear in an -aggregate query, and only in a SELECT, HAVING or ORDER BY clause. - -A scalar sub-query is a sub-query used as an expression. It can occur -in most places where an expression can occur (such as the SELECT -clause, WHERE clause, or as an argument to an aggregate -function). If the sub-query returns no rows, the value is NULL; if it -returns more than one row, it is an error. - -A sub-query can occur in the FROM clause of a query and also in IN -and EXISTS expressions. A sub-query that occurs in IN and -EXISTS expressions may be correlated; that is, refer to tables in -the FROM clause of an enclosing query. - -## Identifiers - -Identifiers are the names of tables, columns and other metadata -elements used in a SQL query. - -Unquoted identifiers, such as emp, must start with a letter and can -only contain letters, digits, and underscores. They are implicitly -converted to upper case. - -Quoted identifiers, such as "Employee Name", start and end with -double quotes. They may contain virtually any character, including -spaces and other punctuation. If you wish to include a double quote -in an identifier, use another double quote to escape it, like this: -"An employee called ""Fred""." - -In Calcite, matching identifiers to the name of the referenced object is -case-sensitive. But remember that unquoted identifiers are implicitly -converted to upper case before matching, and if the object it refers -to was created using an unquoted identifier for its name, then its -name will have been converted to upper case also. - -## Data types - -### Scalar types - -| Data type | Description | Range and examples | -| ----------- | ------------------------- | ---------------------| -| BOOLEAN | Logical values | Values: TRUE, FALSE, UNKNOWN -| TINYINT | 1 byte signed integer | Range is -255 to 256 -| SMALLINT | 2 byte signed integer | Range is -32768 to 32767 -| INTEGER, INT | 4 byte signed integer | Range is -2147483648 to 2147483647 -| BIGINT | 8 byte signed integer | Range is -9223372036854775808 to 9223372036854775807 -| DECIMAL(p, s) | Fixed point | Example: 123.45 is a DECIMAL(5, 2) value. -| NUMERIC | Fixed point | -| REAL, FLOAT | 4 byte floating point | 6 decimal digits precision -| DOUBLE | 8 byte floating point | 15 decimal digits precision -| CHAR(n), CHARACTER(n) | Fixed-width character string | 'Hello', '' (empty string), _latin1'Hello', n'Hello', _UTF16'Hello', 'Hello' 'there' (literal split into multiple parts) -| VARCHAR(n), CHARACTER VARYING(n) | Variable-length character string | As CHAR(n) -| BINARY(n) | Fixed-width binary string | x'45F0AB', x'' (empty binary string), x'AB' 'CD' (multi-part binary string literal) -| VARBINARY(n), BINARY VARYING(n) | Variable-length binary string | As BINARY(n) -| DATE | Date | Example: DATE '1969-07-20' -| TIME | Time of day | Example: TIME '20:17:40' -| TIMESTAMP [ WITHOUT TIME ZONE ] | Date and time | Example: TIMESTAMP '1969-07-20 20:17:40' -| TIMESTAMP WITH TIME ZONE | Date and time with time zone | Example: TIMESTAMP '1969-07-20 20:17:40 America/Los Angeles' -| INTERVAL timeUnit [ TO timeUnit ] | Date time interval | Examples: INTERVAL '1:5' YEAR TO MONTH, INTERVAL '45' DAY -| Anchored interval | Date time interval | Example: (DATE '1969-07-20', DATE '1972-08-29') - -Where: -```SQL -timeUnit: - YEAR | MONTH | DAY | HOUR | MINUTE | SECOND -``` - -Note: -* DATE, TIME and TIMESTAMP have no time zone. There is not even an implicit - time zone, such as UTC (as in Java) or the local time zone. It is left to - the user or application to supply a time zone. - -### Non-scalar types - -| Type | Description -| -------- | ----------------------------------------------------------- -| ANY | A value of an unknown type -| ROW | Row with 1 or more columns -| MAP | Collection of keys mapped to values -| MULTISET | Unordered collection that may contain duplicates -| ARRAY | Ordered, contiguous collection that may contain duplicates -| CURSOR | Cursor over the result of executing a query - -## Operators and functions - -### Comparison operators - -| Operator syntax | Description -| ------------------------------------------------- | ----------- -| value1 = value2 | Equals -| value1 <> value2 | Not equal -| value1 > value2 | Greater than -| value1 >= value2 | Greater than or equal -| value1 < value2 | Less than -| value1 <= value2 | Less than or equal -| value IS NULL | Whether *value* is null -| value IS NOT NULL | Whether *value* is not null -| value1 IS DISTINCT FROM value2 | Whether two values are not equal, treating null values as the same -| value1 IS NOT DISTINCT FROM value2 | Whether two values are equal, treating null values as the same -| value1 BETWEEN value2 AND value3 | Whether *value1* is greater than or equal to *value2* and less than or equal to *value3* -| value1 NOT BETWEEN value2 AND value3 | Whether *value1* is less than *value2* or greater than *value3* -| string1 LIKE string2 [ ESCAPE string3 ] | Whether *string1* matches pattern *string2* -| string1 NOT LIKE string2 [ ESCAPE string3 ] | Whether *string1* does not match pattern *string2* -| string1 SIMILAR TO string2 [ ESCAPE string3 ] | Whether *string1* matches regular expression *string2* -| string1 NOT SIMILAR TO string2 [ ESCAPE string3 ] | Whether *string1* does not match regular expression *string2* -| value IN (value [, value]* ) | Whether *value* is equal to a value in a list -| value NOT IN (value [, value]* ) | Whether *value* is not equal to every value in a list -| value IN (sub-query) | Whether *value* is equal to a row returned by *sub-query* -| value NOT IN (sub-query) | Whether *value* is not equal to every row returned by *sub-query* -| EXISTS (sub-query) | Whether *sub-query* returns at least one row - -### Logical operators - -| Operator syntax | Description -| ---------------------- | ----------- -| boolean1 OR boolean2 | Whether *boolean1* is TRUE or *boolean2* is TRUE -| boolean1 AND boolean2 | Whether *boolean1* and *boolean2* are both TRUE -| NOT boolean | Whether *boolean* is not TRUE; returns UNKNOWN if *boolean* is UNKNOWN -| boolean IS FALSE | Whether *boolean* is FALSE; returns FALSE if *boolean* is UNKNOWN -| boolean IS NOT FALSE | Whether *boolean* is not FALSE; returns TRUE if *boolean* is UNKNOWN -| boolean IS TRUE | Whether *boolean* is TRUE; returns FALSE if *boolean* is UNKNOWN -| boolean IS NOT TRUE | Whether *boolean* is not TRUE; returns TRUE if *boolean* is UNKNOWN -| boolean IS UNKNOWN | Whether *boolean* is UNKNOWN -| boolean IS NOT UNKNOWN | Whether *boolean* is not UNKNOWN - -### Arithmetic operators and functions - -| Operator syntax | Description -| ------------------------- | ----------- -| + numeric | Returns *numeric* -| - numeric | Returns negative *numeric* -| numeric1 + numeric2 | Returns *numeric1* plus *numeric2* -| numeric1 - numeric2 | Returns *numeric1* minus *numeric2* -| numeric1 * numeric2 | Returns *numeric1* multiplied by *numeric2* -| numeric1 / numeric2 | Returns *numeric1* divided by *numeric2* -| POWER(numeric1, numeric2) | Returns *numeric1* raised to the power of *numeric2* -| ABS(numeric) | Returns the absolute value of *numeric* -| MOD(numeric, numeric) | Returns the remainder (modulus) of *numeric1* divided by *numeric2*. The result is negative only if *numeric1* is negative -| SQRT(numeric) | Returns the square root of *numeric* -| LN(numeric) | Returns the natural logarithm (base *e*) of *numeric* -| LOG10(numeric) | Returns the base 10 logarithm of *numeric* -| EXP(numeric) | Returns *e* raised to the power of *numeric* -| CEIL(numeric) | Rounds *numeric* up, and returns the smallest number that is greater than or equal to *numeric* -| FLOOR(numeric) | Rounds *numeric* down, and returns the largest number that is less than or equal to *numeric* - -### Character string operators and functions - -| Operator syntax | Description -| -------------------------- | ----------- -| string || string | Concatenates two character strings. -| CHAR_LENGTH(string) | Returns the number of characters in a character string -| CHARACTER_LENGTH(string) | As CHAR_LENGTH(*string*) -| UPPER(string) | Returns a character string converted to upper case -| LOWER(string) | Returns a character string converted to lower case -| POSITION(string1 IN string2) | Returns the position of the first occurrence of *string1* in *string2* -| TRIM( { BOTH ;| LEADING ;| TRAILING } string1 FROM string2) | Removes the longest string containing only the characters in *string1* from the start/end/both ends of *string1* -| OVERLAY(string1 PLACING string2 FROM integer [ FOR integer2 ]) | Replaces a substring of *string1* with *string2* -| SUBSTRING(string FROM integer) | Returns a substring of a character string starting at a given point. -| SUBSTRING(string FROM integer FOR integer) | Returns a substring of a character string starting at a given point with a given length. -| INITCAP(string) | Returns *string* with the first letter of each word converter to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters. - -Not implemented: -* SUBSTRING(string FROM regexp FOR regexp) - -### Binary string operators and functions - -| Operator syntax | Description -| --------------- | ----------- -| binary || binary | Concatenates two binary strings. -| POSITION(binary1 IN binary2) | Returns the position of the first occurrence of *binary1* in *binary2* -| OVERLAY(binary1 PLACING binary2 FROM integer [ FOR integer2 ]) | Replaces a substring of *binary1* with *binary2* -| SUBSTRING(binary FROM integer) | Returns a substring of *binary* starting at a given point -| SUBSTRING(binary FROM integer FOR integer) | Returns a substring of *binary* starting at a given point with a given length - -### Date/time functions - -| Operator syntax | Description -| ------------------------- | ----------- -| LOCALTIME | Returns the current date and time in the session time zone in a value of datatype TIME -| LOCALTIME(precision) | Returns the current date and time in the session time zone in a value of datatype TIME, with *precision* digits of precision -| LOCALTIMESTAMP | Returns the current date and time in the session time zone in a value of datatype TIMESTAMP -| LOCALTIMESTAMP(precision) | Returns the current date and time in the session time zone in a value of datatype TIMESTAMP, with *precision* digits of precision -| CURRENT_TIME | Returns the current time in the session time zone, in a value of datatype TIMESTAMP WITH TIME ZONE -| CURRENT_DATE | Returns the current date in the session time zone, in a value of datatype DATE -| CURRENT_TIMESTAMP | Returns the current date and time in the session time zone, in a value of datatype TIMESTAMP WITH TIME ZONE -| EXTRACT(timeUnit FROM datetime) | Extracts and returns the value of a specified datetime field from a datetime value expression -| FLOOR(datetime TO timeUnit) | Rounds *datetime* down to *timeUnit* -| CEIL(datetime TO timeUnit) | Rounds *datetime* up to *timeUnit* - -Not implemented: -* EXTRACT(timeUnit FROM interval) -* CEIL(interval) -* FLOOR(interval) -* datetime - datetime timeUnit [ TO timeUnit ] -* interval OVERLAPS interval -* + interval -* - interval -* interval + interval -* interval - interval -* interval / interval -* datetime + interval -* datetime - interval - -### System functions - -| Operator syntax | Description -| --------------- | ----------- -| USER | Equivalent to CURRENT_USER -| CURRENT_USER | User name of current execution context -| SESSION_USER | Session user name -| SYSTEM_USER | Returns the name of the current data store user as identified by the operating system -| CURRENT_PATH | Returns a character string representing the current lookup scope for references to user-defined routines and types -| CURRENT_ROLE | Returns the current active role - -### Conditional functions and operators - -| Operator syntax | Description -| --------------- | ----------- -| CASE value
WHEN value1 [, value11 ]* THEN result1
[ WHEN valueN [, valueN1 ]* THEN resultN ]*
[ ELSE resultZ ]
END | Simple case -| CASE
WHEN condition1 THEN result1
[ WHEN conditionN THEN resultN ]*
[ ELSE resultZ ]
END | Searched case -| NULLIF(value, value) | Returns NULL if the values are the same. For example, NULLIF(5, 5) returns NULL; NULLIF(5, 0) returns 5. -| COALESCE(value, value [, value]* ) | Provides a value if the first value is null. For example, COALESCE(NULL, 5) returns 5. - -### Type conversion - -| Operator syntax | Description -| --------------- | ----------- -| CAST(value AS type) | Converts a value to a given type. - -### Value constructors - -| Operator syntax | Description -| --------------- | ----------- -| ROW (value [, value]* ) | Creates a row from a list of values. -| (value [, value]* ) | Creates a row from a list of values. -| map [ key ] | Returns the element of a map with a particular key. -| array [ index ] | Returns the element at a particular location in an array. -| ARRAY [ value [, value ]* ] | Creates an array from a list of values. -| MAP [ key, value [, key, value ]* ] | Creates a map from a list of key-value pairs. - -### Collection functions - -| Operator syntax | Description -| --------------- | ----------- -| ELEMENT(value) | Returns the sole element of a array or multiset; null if the collection is empty; throws if it has more than one element. -| CARDINALITY(value) | Returns the number of elements in an array or multiset. - -See also: UNNEST relational operator converts a collection to a relation. - -### JDBC function escape - -#### Numeric - -| Operator syntax | Description -| ------------------------------ | ----------- -| {fn LOG10(numeric)} | Returns the base-10 logarithm of *numeric* -| {fn POWER(numeric1, numeric2)} | Returns *numeric1* raised to the power of *numeric2* - -Not implemented: -* {fn ABS(numeric)} - Returns the absolute value of *numeric* -* {fn ACOS(numeric)} - Returns the arc cosine of *numeric* -* {fn ASIN(numeric)} - Returns the arc sine of *numeric* -* {fn ATAN(numeric)} - Returns the arc tangent of *numeric* -* {fn ATAN2(numeric, numeric)} -* {fn CEILING(numeric)} - Rounds *numeric* up, and returns the smallest number that is greater than or equal to *numeric* -* {fn COS(numeric)} - Returns the cosine of *numeric* -* {fn COT(numeric)} -* {fn DEGREES(numeric)} - Converts *numeric* from radians to degrees -* {fn EXP(numeric)} - Returns *e* raised to the power of *numeric* -* {fn FLOOR(numeric)} - Rounds *numeric* down, and returns the largest number that is less than or equal to *numeric* -* {fn LOG(numeric)} - Returns the natural logarithm (base *e*) of *numeric* -* {fn MOD(numeric1, numeric2)} - Returns the remainder (modulus) of *numeric1* divided by *numeric2*. The result is negative only if *numeric1* is negative -* {fn PI()} - Returns a value that is closer than any other value to *pi* -* {fn RADIANS(numeric)} - Converts *numeric* from degrees to radians -* {fn RAND(numeric)} -* {fn ROUND(numeric, numeric)} -* {fn SIGN(numeric)} -* {fn SIN(numeric)} - Returns the sine of *numeric* -* {fn SQRT(numeric)} - Returns the square root of *numeric* -* {fn TAN(numeric)} - Returns the tangent of *numeric* -* {fn TRUNCATE(numeric, numeric)} - -#### String - -| Operator syntax | Description -| --------------- | ----------- -| {fn LOCATE(string1, string2)} | Returns the position in *string2* of the first occurrence of *string1*. Searches from the beginning of the second CharacterExpression, unless the startIndex parameter is specified. -| {fn INSERT(string1, start, length, string2)} | Inserts *string2* into a slot in *string1* -| {fn LCASE(string)} | Returns a string in which all alphabetic characters in *string* have been converted to lower case - -Not implemented: -* {fn ASCII(string)} - Convert a single-character string to the corresponding ASCII code, an integer between 0 and 255 -* {fn CHAR(string)} -* {fn CONCAT(character, character)} - Returns the concatenation of character strings -* {fn DIFFERENCE(string, string)} -* {fn LEFT(string, integer)} -* {fn LENGTH(string)} -* {fn LOCATE(string1, string2 [, integer])} - Returns the position in *string2* of the first occurrence of *string1*. Searches from the beginning of *string2*, unless *integer* is specified. -* {fn LTRIM(string)} -* {fn REPEAT(string, integer)} -* {fn REPLACE(string, string, string)} -* {fn RIGHT(string, integer)} -* {fn RTRIM(string)} -* {fn SOUNDEX(string)} -* {fn SPACE(integer)} -* {fn SUBSTRING(string, integer, integer)} -* {fn UCASE(string)} - Returns a string in which all alphabetic characters in *string* have been converted to upper case - -#### Date/time - -Not implemented: -* {fn CURDATE()} -* {fn CURTIME()} -* {fn DAYNAME(date)} -* {fn DAYOFMONTH(date)} -* {fn DAYOFWEEK(date)} -* {fn DAYOFYEAR(date)} -* {fn HOUR(time)} -* {fn MINUTE(time)} -* {fn MONTH(date)} -* {fn MONTHNAME(date)} -* {fn NOW()} -* {fn QUARTER(date)} -* {fn SECOND(time)} -* {fn TIMESTAMPADD(interval, count, timestamp)} -* {fn TIMESTAMPDIFF(interval, timestamp, timestamp)} -* {fn WEEK(date)} -* {fn YEAR(date)} - -#### System - -Not implemented: -* {fn DATABASE()} -* {fn IFNULL(value, value)} -* {fn USER(value, value)} -* {fn CONVERT(value, type)} - -### Aggregate functions - -Syntax: - -```SQL -aggregateCall: - agg( [ DISTINCT ] value [, value]* ) [ FILTER ( WHERE condition ) ] - | agg(*) [ FILTER ( WHERE condition ) ] -``` - -If `FILTER` is present, the aggregate function only considers rows for which -*condition* evaluates to TRUE. - -If `DISTINCT` is present, duplicate argument values are eliminated before being -passed to the aggregate function. - -| Operator syntax | Description -| ---------------------------------- | ----------- -| COUNT( [ DISTINCT ] value [, value]* ) | Returns the number of input rows for which *value* is not null (wholly not null if *value* is composite) -| COUNT(*) | Returns the number of input rows -| AVG( [ DISTINCT ] numeric) | Returns the average (arithmetic mean) of *numeric* across all input values -| SUM( [ DISTINCT ] numeric) | Returns the sum of *numeric* across all input values -| MAX( [ DISTINCT ] value) | Returns the maximum value of *value* across all input values -| MIN( [ DISTINCT ] value) | Returns the minimum value of *value* across all input values -| STDDEV_POP( [ DISTINCT ] numeric) | Returns the population standard deviation of *numeric* across all input values -| STDDEV_SAMP( [ DISTINCT ] numeric) | Returns the sample standard deviation of *numeric* across all input values -| VAR_POP( [ DISTINCT ] value) | Returns the population variance (square of the population standard deviation) of *numeric* across all input values -| VAR_SAMP( [ DISTINCT ] numeric) | Returns the sample variance (square of the sample standard deviation) of *numeric* across all input values -| COVAR_POP(numeric1, numeric2) | Returns the population covariance of the pair (*numeric1*, *numeric2*) across all input values -| COVAR_SAMP(numeric1, numeric2) | Returns the sample covariance of the pair (*numeric1*, *numeric2*) across all input values -| REGR_SXX(numeric1, numeric2) | Returns the sum of squares of the dependent expression in a linear regression model -| REGR_SYY(numeric1, numeric2) | Returns the sum of squares of the independent expression in a linear regression model - -Not implemented: -* REGR_AVGX(numeric1, numeric2) -* REGR_AVGY(numeric1, numeric2) -* REGR_COUNT(numeric1, numeric2) -* REGR_INTERCEPT(numeric1, numeric2) -* REGR_R2(numeric1, numeric2) -* REGR_SLOPE(numeric1, numeric2) -* REGR_SXY(numeric1, numeric2) - -### Window functions - -| Operator syntax | Description -| ----------------------------------------- | ----------- -| COUNT(value [, value ]* ) OVER window | Returns the number of rows in *window* for which *value* is not null (wholly not null if *value* is composite) -| COUNT(*) OVER window | Returns the number of rows in *window* -| AVG(numeric) OVER window | Returns the average (arithmetic mean) of *numeric* across all values in *window* -| SUM(numeric) OVER window | Returns the sum of *numeric* across all values in *window* -| MAX(value) OVER window | Returns the maximum value of *value* across all values in *window* -| MIN(value) OVER window | Returns the minimum value of *value* across all values in *window* -| RANK() OVER window | Returns the rank of the current row with gaps; same as ROW_NUMBER of its first peer -| DENSE_RANK() OVER window | Returns the rank of the current row without gaps; this function counts peer groups -| ROW_NUMBER() OVER window | Returns the number of the current row within its partition, counting from 1 -| FIRST_VALUE(value) OVER window | Returns *value* evaluated at the row that is the first row of the window frame -| LAST_VALUE(value) OVER window | Returns *value* evaluated at the row that is the last row of the window frame -| LEAD(value, offset, default) OVER window | Returns *value* evaluated at the row that is *offset* rows after the current row within the partition; if there is no such row, instead returns *default*. Both *offset* and *default* are evaluated with respect to the current row. If omitted, *offset* defaults to 1 and *default* to NULL -| LAG(value, offset, default) OVER window | Returns *value* evaluated at the row that is *offset* rows before the current row within the partition; if there is no such row, instead returns *default*. Both *offset* and *default* are evaluated with respect to the current row. If omitted, *offset* defaults to 1 and *default* to NULL -| NTILE(value) OVER window | Returns an integer ranging from 1 to *value*, dividing the partition as equally as possible - -Not implemented: -* COUNT(DISTINCT value) OVER window -* FIRST_VALUE(value) IGNORE NULLS OVER window -* LAST_VALUE(value) IGNORE NULLS OVER window -* PERCENT_RANK(value) OVER window -* CUME_DIST(value) OVER window -* NTH_VALUE(value, nth) OVER window - -### Grouping functions - -| Operator syntax | Description -| -------------------- | ----------- -| GROUPING(expression) | Returns 1 if expression is rolled up in the current row's grouping set, 0 otherwise -| GROUP_ID() | Returns an integer that uniquely identifies the combination of grouping keys -| GROUPING_ID(expression [, expression ] * ) | Returns a bit vector of the given grouping expressions