Subject: svn commit: r1690127 - /cassandra/site/publish/doc/cql3/CQL-2.2.html
From: tylerhobbs@apache.org
To: commits@cassandra.apache.org
Date: Thu, 09 Jul 2015 16:56:54 -0000

Author: tylerhobbs
New Revision: 1690127
URL: http://svn.apache.org/r1690127

Log: Update JSON docs for Cassandra 2.2

Modified: cassandra/site/publish/doc/cql3/CQL-2.2.html
CQL-2.2

Cassandra Query Language (CQL) v3.3.0

  1. Cassandra Query Language (CQL) v3.3.0
    1. CQL Syntax
      1. Preamble
      2. Conventions
      3. Identifiers and keywords
      4. Constants
      5. Comments
      6. Statements
      7. Prepared Statement
    2. Data Definition
      1. CREATE KEYSPACE
      2. USE
      3. ALTER KEYSPACE
      4. DROP KEYSPACE
      5. CREATE TABLE
      6. ALTER TABLE
      7. DROP TABLE
      8. TRUNCATE
      9. CREATE INDEX
      10. DROP INDEX
      11. CREATE TYPE
      12. ALTER TYPE
      13. DROP TYPE
      14. CREATE TRIGGER
      15. DROP TRIGGER
      16. CREATE FUNCTION
      17. DROP FUNCTION
      18. CREATE AGGREGATE
      19. DROP AGGREGATE
    3. Data Manipulation
      1. INSERT
      2. UPDATE
      3. DELETE
      4. BATCH
    4. Queries
      1. SELECT
    5. Database Roles
      1. CREATE ROLE
      2. ALTER ROLE
      3. DROP ROLE
      4. GRANT ROLE
      5. REVOKE ROLE
      6. CREATE USER
      7. ALTER USER
      8. DROP USER
      9. LIST USERS
    6. Data Control
      1. Permissions
      2. GRANT PERMISSION
      3. REVOKE PERMISSION
    7. Data Types
      1. Working with timestamps
      2. Working with dates
      3. Working with time
      4. Counters
      5. Working with collections
    8. Functions
      1. Token
      2. Uuid
      3. Timeuuid functions
      4. Time conversion functions
      5. Blob conversion functions
    9. Aggregates
      1. Count
      2. Max and Min
      3. Sum
      4. Avg
    10. User-Defined Functions
    11. User-Defined Aggregates
    12. JSON Support
      1. SELECT JSON
      2. INSERT JSON
      3. JSON Encoding of Cassandra Data Types
      4. The fromJson() Function
      5. The toJson() Function
    13. Appendix A: CQL Keywords
    14. Appendix B: CQL Reserved Types
    15. Changes
      1. 3.3.0
      2. 3.2.0
      3. 3.1.7
      4. 3.1.6
      5. 3.1.5
      6. 3.1.4
      7. 3.1.3
      8. 3.1.2
      9. 3.1.1
      10. 3.1.0
      11. 3.0.5
      12. 3.0.4
      13. 3.0.3
      14. 3.0.2
      15. 3.0.1
    16. Versioning

CQL Syntax

Preamble

This document describes the Cassandra Query Language (CQL) version 3. CQL v3 is not backward compatible with CQL v2 and differs from it in numerous ways. Note that this document describes the latest version of the language; the changes section provides the differences between successive versions of CQL v3.

CQL v3 offers a model very close to SQL in the sense that data is put in tables containing rows of columns. For that reason, when used in this document, these terms (tables, rows and columns) have the same definitions as they have in SQL. But please note that, as such, they do not refer to the concept of rows and columns found in the internal implementation of Cassandra and in the Thrift and CQL v2 APIs.

Conventions

To aid in specifying the CQL syntax, we will use the following conventions in this document:

  • Language rules will be given in a BNF -like notation:
<start> ::= TERMINAL <non-terminal1> <non-terminal1>
 
  • Nonterminal symbols will have <angle brackets>.
  • As additional shortcut notations to BNF, we’ll use traditional regular expression symbols (?, + and *) to signify that a given symbol is optional and/or can be repeated. We’ll also allow parentheses to group symbols and the [<characters>] notation to represent any one of <characters>.
  • The grammar is provided for documentation purposes and leaves some minor details out. For instance, the last column definition in a CREATE TABLE statement is optional but supported if present, even though the grammar provided in this document suggests it is not supported.
  • Sample code will be provided in a code block:
SELECT sample_usage FROM cql;
 
  • References to keywords or pieces of CQL code in running text will be shown in a fixed-width font.

Identifiers and keywords

The CQL language uses identifiers (or names) to identify tables, columns and other objects. An identifier is a token matching the regular expression [a-zA-Z][a-zA-Z0-9_]*.

A number of such identifiers, like SELECT or WITH, are keywords. They have a fixed meaning for the language and most are reserved. The list of those keywords can be found in Appendix A.

Identifiers and (unquoted) keywords are case insensitive. Thus SELECT is the same as select or sElEcT, and myId is the same as myid or MYID, for instance. A convention often used (in particular by the samples in this documentation) is to use upper case for keywords and lower case for other identifiers.

There is a second kind of identifier, called quoted identifiers, defined by enclosing an arbitrary sequence of characters in double-quotes ("). Quoted identifiers are never keywords. Thus "select" is not a reserved keyword and can be used to refer to a column, while select would raise a parse error. Also, contrary to unquoted identifiers and keywords, quoted identifiers are case sensitive ("My Quoted Id" is different from "my quoted id"). A fully lowercase quoted identifier that matches [a-zA-Z][a-zA-Z0-9_]* is equivalent to the unquoted identifier obtained by removing the double-quotes (so "myid" is equivalent to myid and to myId but different from "myId"). Inside a quoted identifier, the double-quote character can be repeated to escape it, so "foo "" bar" is a valid identifier.
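The rules above can be illustrated with a short sketch; the keyspace, table and column names here are hypothetical:

```sql
-- "select" is quoted, so it is usable as a column name even though SELECT is reserved
CREATE TABLE ks.example ("select" int, "My Quoted Id" int, myid int PRIMARY KEY);

-- "myid" (fully lowercase, quoted) refers to the same column as myid or MyId
SELECT "select", "My Quoted Id" FROM ks.example WHERE "myid" = 0;
```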

Constants

CQL defines the following kind of constants: strings, integers, floats, booleans, uuids and blobs:

  • A string constant is an arbitrary sequence of characters enclosed by single-quotes ('). One can include a single-quote in a string by repeating it, e.g. 'It''s raining today'. These are not to be confused with quoted identifiers, which use double-quotes.
  • An integer constant is defined by '-'?[0-9]+.
  • A float constant is defined by '-'?[0-9]+('.'[0-9]*)?([eE][+-]?[0-9]+)?. On top of that, NaN and Infinity are also float constants.
  • A boolean constant is either true or false up to case-insensitivity (i.e. True is a valid boolean constant).
  • A UUID constant is defined by hex{8}-hex{4}-hex{4}-hex{4}-hex{12} where hex is a hexadecimal character, e.g. [0-9a-fA-F], and {4} is the number of such characters.
  • A blob constant is a hexadecimal number defined by 0[xX](hex)+ where hex is a hexadecimal character, e.g. [0-9a-fA-F].

For how these constants are typed, see the data types section.
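To make the constant kinds concrete, here is a hypothetical INSERT using one of each (the table and column names are illustrative only):

```sql
INSERT INTO ks.constants_demo (id, name, score, active, raw)
VALUES (123e4567-e89b-12d3-a456-426655440000,  -- uuid constant
        'It''s a string',                      -- string constant with an escaped quote
        -1.5e3,                                -- float constant
        true,                                  -- boolean constant
        0xCAFEBABE);                           -- blob constant
```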

Comments

A comment in CQL is a line beginning with either a double dash (--) or a double slash (//).

Multi-line comments are also supported through enclosure within /* and */ (but nesting is not supported).

-- This is a comment
 // This is a comment too
 INSERT INTO test(pk, t, v, s) VALUES (0, 0, 'val0', 'static0');
 INSERT INTO test(pk, t, v, s) VALUES (0, 1, 'val1', 'static1');
 SELECT * FROM test WHERE pk=0 AND t=0;
the last query will return 'static1' as the value for s, since s is static and the second insertion thus modified this “shared” value. Note however that static columns are only static within a given partition: if, in the example above, the two rows were from different partitions (i.e. if they had different values for pk), then the second insertion would not have modified the value of s for the first row.

A few restrictions apply to when static columns are allowed:

  • tables with the COMPACT STORAGE option (see below) cannot have them
  • a table without clustering columns cannot have static columns (in a table without clustering columns, every partition has only one row, and so every column is inherently static).
  • only non-PRIMARY KEY columns can be static
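The test table used in the static-column example above could have been declared as follows (a sketch consistent with the inserts shown; the keyspace name is assumed):

```sql
-- t is a clustering column, so the table is allowed to declare a static column s
CREATE TABLE ks.test (
    pk int,
    t int,
    v text,
    s text static,          -- one value of s shared by all rows of a partition
    PRIMARY KEY (pk, t)
);
```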

<option>

The CREATE TABLE statement supports a number of options that control the configuration of a new table. These options can be specified after the WITH keyword.

The first of these options is COMPACT STORAGE. This option is mainly targeted towards backward compatibility for definitions created before CQL3 (see www.datastax.com/dev/blog/thrift-to-cql3 for more details). The option also provides a slightly more compact layout of data on disk, but at the price of diminished flexibility and extensibility for the table. Most notably, COMPACT STORAGE tables cannot have collections nor static columns, and a COMPACT STORAGE table with at least one clustering column supports exactly one (as in not 0 nor more than 1) column not part of the PRIMARY KEY definition (which implies in particular that you cannot add nor remove columns after creation). For those reasons, COMPACT STORAGE is not recommended outside of the backward compatibility reason mentioned above.

Another option is CLUSTERING ORDER. It allows defining the ordering of rows on disk. It takes the list of the clustering column names with, for each of them, the on-disk order (ascending or descending). Note that this option affects which ORDER BY clauses are allowed during SELECT.
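As a sketch, a hypothetical table that orders its clustering column descending on disk:

```sql
-- posted_at descending: the newest rows of each partition are stored (and returned) first
CREATE TABLE ks.timeline (
    userid text,
    posted_at timestamp,
    body text,
    PRIMARY KEY (userid, posted_at)
) WITH CLUSTERING ORDER BY (posted_at DESC);
```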

Table creation supports the following other <property>:

option                      kind    default    description
comment                     simple  none       A free-form, human-readable comment.
read_repair_chance          simple  0.1        The probability with which to query extra nodes (i.e. more nodes than required by the consistency level) for the purpose of read repairs.
dclocal_read_repair_chance  simple  0          The probability with which to query extra nodes (i.e. more nodes than required by the consistency level) belonging to the same data center as the read coordinator for the purpose of read repairs.
gc_grace_seconds            simple  864000     Time to wait before garbage collecting tombstones (deletion markers).
bloom_filter_fp_chance      simple  0.00075    The target false-positive probability of the sstable bloom filters. The bloom filters will be sized to provide that probability (thus lowering this value impacts the size of bloom filters in memory and on disk).
compaction                  map     see below  The compaction options to use, see below.
compression                 map     see below  Compression options, see below.
caching                     simple  keys_only  Whether to cache keys (“key cache”) and/or rows (“row cache”) for this table. Valid values are: all, keys_only, rows_only and none.
default_time_to_live        simple  0          The default expiration time (“TTL”) in seconds for a table.
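For instance, several of these options might be set at creation time (the table name and values here are illustrative, not defaults):

```sql
CREATE TABLE ks.events (
    id uuid PRIMARY KEY,
    payload text
) WITH comment = 'event log'
   AND gc_grace_seconds = 86400        -- purge tombstones after one day
   AND default_time_to_live = 604800;  -- rows expire after a week by default
```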

compaction options

The compaction property must at least define the 'class' sub-option, which defines the compaction strategy class to use. The default supported classes are 'SizeTieredCompactionStrategy' and 'LeveledCompactionStrategy'. A custom strategy can be provided by specifying the full class name as a string constant. The rest of the sub-options depend on the chosen class. The sub-options supported by the default classes are:

option                          supported compaction strategy  default  description
enabled                         all                            true     A boolean denoting whether compaction should be enabled or not.
tombstone_threshold             all                            0.2      A ratio such that if an sstable has more than this ratio of gcable tombstones over all contained columns, the sstable will be compacted (with no other sstables) for the purpose of purging those tombstones.
tombstone_compaction_interval   all                            1 day    The minimum time to wait after an sstable’s creation before considering it for “tombstone compaction”, i.e. the compaction triggered if the sstable has more gcable tombstones than tombstone_threshold.
unchecked_tombstone_compaction  all                            false    Setting this to true enables more aggressive tombstone compactions: single-sstable tombstone compactions will run without checking how likely they are to succeed.
min_sstable_size                SizeTieredCompactionStrategy   50MB     The size-tiered strategy groups SSTables to compact into buckets. A bucket groups SSTables whose sizes differ by less than 50%. However, for small sizes this would result in bucketing that is too fine-grained. min_sstable_size defines a size threshold (in bytes) below which all SSTables belong to one unique bucket.
min_threshold                   SizeTieredCompactionStrategy   4        Minimum number of SSTables needed to start a minor compaction.
max_threshold                   SizeTieredCompactionStrategy   32       Maximum number of SSTables processed by one minor compaction.
bucket_low                      SizeTieredCompactionStrategy   0.5      Size-tiered considers sstables to be within the same bucket if their size is within [average_size * bucket_low, average_size * bucket_high] (i.e. the default groups sstables whose sizes diverge by at most 50%).
bucket_high                     SizeTieredCompactionStrategy   1.5      Size-tiered considers sstables to be within the same bucket if their size is within [average_size * bucket_low, average_size * bucket_high] (i.e. the default groups sstables whose sizes diverge by at most 50%).
sstable_size_in_mb              LeveledCompactionStrategy      5MB      The target size (in MB) for sstables in the leveled strategy. Note that while sstable sizes should stay less than or equal to sstable_size_in_mb, it is possible to exceptionally have a larger sstable, as during compaction data for a given partition key is never split into two sstables.
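Selecting the leveled strategy together with one of its sub-options could look like this (the table name and value are illustrative):

```sql
ALTER TABLE ks.events
   WITH compaction = { 'class' : 'LeveledCompactionStrategy',
                       'sstable_size_in_mb' : 160 };
```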

For the compression property, the following default sub-options are available:

option               default        description
sstable_compression  LZ4Compressor  The compression algorithm to use. Default compressors are: LZ4Compressor, SnappyCompressor and DeflateCompressor. Use an empty string ('') to disable compression. A custom compressor can be provided by specifying the full class name as a string constant.
chunk_length_kb      64KB           On disk, SSTables are compressed by block (to allow random reads). This defines the size (in KB) of said block. Bigger values may improve the compression rate, but increase the minimum size of data to be read from disk for a read.
crc_check_chance     1.0            When compression is enabled, each compressed block includes a checksum of that block for the purpose of detecting disk bitrot and avoiding the propagation of corruption to other replicas. This option defines the probability with which those checksums are checked during reads. By default they are always checked. Set to 0 to disable checksum checking, or to 0.5, for instance, to check them on every other read.
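A sketch of setting these sub-options (the table name and values are illustrative):

```sql
ALTER TABLE ks.events
   WITH compression = { 'sstable_compression' : 'DeflateCompressor',
                        'chunk_length_kb'     : 128 };
```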

Other considerations:

  • When inserting / updating a given row, not all columns need to be defined (except for those that are part of the key), and missing columns occupy no space on disk. Furthermore, adding new columns (see ALTER TABLE) is a constant-time operation. There is thus no need to try to anticipate future usage (or to cry when you haven’t) when creating a table.

ALTER TABLE

Syntax:

<alter-table-stmt> ::= ALTER (TABLE | COLUMNFAMILY) <tablename> <instruction>
 
 <instruction> ::= ALTER <identifier> TYPE <type>
                 | ADD   <identifier> <type>
 DROP FUNCTION mykeyspace.afunction;
 DROP FUNCTION afunction ( int );
 DROP FUNCTION afunction ( text );
The DROP FUNCTION statement removes a function created using CREATE FUNCTION.
You must specify the argument types (signature) of the function to drop if there are multiple functions with the same name but a different signature (overloaded functions).

DROP FUNCTION with the optional IF EXISTS keywords drops a function if it exists.
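Combining the two points above, a sketch of dropping one overload of a function only if it exists (names taken from the examples above):

```sql
-- the (int) signature selects one overload; IF EXISTS makes the drop a no-op
-- when no such function is defined
DROP FUNCTION IF EXISTS mykeyspace.afunction ( int );
```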

CREATE AGGREGATE

Syntax:

<create-aggregate-stmt> ::= CREATE ( OR REPLACE )? 
                             AGGREGATE ( IF NOT EXISTS )?
                             ( <keyspace> '.' )? <aggregate-name>
                             '(' <arg-type> ( ',' <arg-type> )* ')'
 

Moreover, the IN relation is only allowed on the last column of the partition key and on the last column of the full primary key.

It is also possible to “group” CLUSTERING COLUMNS together in a relation using the tuple notation. For instance:

SELECT * FROM posts WHERE userid='john doe' AND (blog_title, posted_at) > ('John''s Blog', '2012-01-01')
 

will request all rows that sort after the one having “John's Blog” as blog_title and ‘2012-01-01’ for posted_at in the clustering order. In particular, rows having a posted_at <= '2012-01-01' will be returned as long as their blog_title > 'John''s Blog', which wouldn’t be the case for:

SELECT * FROM posts WHERE userid='john doe' AND blog_title > 'John''s Blog' AND posted_at > '2012-01-01'
 

The tuple notation may also be used for IN clauses on CLUSTERING COLUMNS:

SELECT * FROM posts WHERE userid='john doe' AND (blog_title, posted_at) IN (('John''s Blog', '2012-01-01'), ('Extreme Chess', '2014-06-01'))

The CONTAINS operator may only be used on collection columns (lists, sets, and maps). In the case of maps, CONTAINS applies to the map values. The CONTAINS KEY operator may only be used on map columns and applies to the map keys.

<order-by>

The ORDER BY option allows selecting the order of the returned results. It takes as argument a list of column names along with the order for each column (ASC for ascending and DESC for descending, omitting the order being equivalent to ASC). Currently the possible orderings are limited (and depend on the table CLUSTERING ORDER):

  • if the table has been defined without any specific CLUSTERING ORDER, then the allowed orderings are the order induced by the clustering columns and the reverse of that one.
  • otherwise, the orderings allowed are the order of the CLUSTERING ORDER option and the reversed one.

LIMIT

The LIMIT option to a SELECT statement limits the number of rows returned by a query.

ALLOW FILTERING

By default, CQL only allows select queries that don’t involve “filtering” server side, i.e. queries where we know that all (live) records read will be returned (possibly in part) in the result set. The reasoning is that those “non-filtering” queries have predictable performance: they will execute in a time that is proportional to the amount of data returned by the query (which can be controlled through LIMIT).

The ALLOW FILTERING option explicitly allows (some) queries that require filtering. Please note that a query using ALLOW FILTERING may thus have unpredictable performance (in the sense defined above), i.e. even a query that selects a handful of records may exhibit performance that depends on the total amount of data stored in the cluster.

For instance, consider the following table holding user profiles with their year of birth (with a secondary index on it) and country of residence:

CREATE TABLE users (
     username text PRIMARY KEY,
     firstname text,
     lastname text,
     ...
 )
 

then the token function will take a single argument of type text (in that case, the partition key is userid; there are no clustering columns, so the partition key is the same as the primary key), and the return type will be bigint.

Uuid

The uuid function takes no parameters and generates a random type 4 uuid suitable for use in INSERT or SET statements.
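For instance (a sketch using a hypothetical accounts table whose primary key id is of type uuid):

INSERT INTO accounts (id, name) VALUES (uuid(), 'John Doe');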

Timeuuid functions

now

The now function takes no arguments and generates a new unique timeuuid (at the time the statement using it is executed). Note that this method is useful for insertion but is largely nonsensical in WHERE clauses. For instance, a query of the form

SELECT * FROM myTable WHERE t = now()

will never return any result by design, since the value returned by now() is guaranteed to be unique.

minTimeuuid and maxTimeuuid

The minTimeuuid (resp. maxTimeuuid) function takes a timestamp value t (which can be either a timestamp or a date string) and returns a fake timeuuid corresponding to the smallest (resp. biggest) possible timeuuid with timestamp t. So for instance:

SELECT * FROM myTable WHERE t > maxTimeuuid('2013-01-01 00:05+0000') AND t < minTimeuuid('2013-02-02 10:00+0000')

will select all rows where the timeuuid column t is strictly older than ‘2013-01-01 00:05+0000’ but strictly younger than ‘2013-02-02 10:00+0000’. Please note that t >= maxTimeuuid('2013-01-01 00:05+0000') would still not select a timeuuid generated exactly at ‘2013-01-01 00:05+0000’ and is essentially equivalent to t > maxTimeuuid('2013-01-01 00:05+0000').

Warning: We call the values generated by minTimeuuid and maxTimeuuid fake UUIDs because they do not respect the time-based UUID generation process specified by RFC 4122. In particular, the values returned by these two methods will not be unique. This means you should only use these methods for querying (as in the example above). Inserting the result of these methods is almost certainly a bad idea.

Time conversion functions

A number of functions are provided to “convert” a timeuuid, a timestamp or a date into another native type.

function name      input type   description
toDate             timeuuid     Converts the timeuuid argument into a date type
toDate             timestamp    Converts the timestamp argument into a date type
toTimestamp        timeuuid     Converts the timeuuid argument into a timestamp type
toTimestamp        date         Converts the date argument into a timestamp type
toUnixTimestamp    timeuuid     Converts the timeuuid argument into a bigint raw value
toUnixTimestamp    timestamp    Converts the timestamp argument into a bigint raw value
toUnixTimestamp    date         Converts the date argument into a bigint raw value
dateOf             timeuuid     Similar to toTimestamp(timeuuid) (DEPRECATED)
unixTimestampOf    timeuuid     Similar to toUnixTimestamp(timeuuid) (DEPRECATED)
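For instance (a sketch reusing the myTable example above, assuming t is a timeuuid column), the conversion functions can be used directly in the select clause:

SELECT toDate(t), toUnixTimestamp(t) FROM myTable;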

Blob conversion functions

A number of functions are provided to “convert” the native types into binary data (blob). For every <native-type> type supported by CQL3 (a notable exception being blob, for obvious reasons), the function typeAsBlob takes an argument of type type and returns it as a blob. Conversely, the function blobAsType takes a blob argument and converts it back to a value of type type. So for instance, bigintAsBlob(3) is 0x0000000000000003 and blobAsBigint(0x0000000000000003) is 3.
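For instance (a sketch using a hypothetical blobs table), a blob column can be written from a bigint value:

CREATE TABLE blobs ( pk int PRIMARY KEY, data blob );
INSERT INTO blobs (pk, data) VALUES (1, bigintAsBlob(3));

after which data holds 0x0000000000000003.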



Aggregates

CQL3 distinguishes between built-in aggregates (so-called ‘native aggregates’) and user-defined aggregates. CQL3 includes several native aggregates, described below:

Count

The count function can be used to count the rows returned by a query. Example:

SELECT COUNT(*) FROM plays;
SELECT COUNT(1) FROM plays;

It can also be used to count the non-null values of a given column. Example:

SELECT COUNT(scores) FROM plays;

Max and Min

The max and min functions can be used to compute the maximum and the minimum value returned by a query for a given column.

SELECT MIN(players), MAX(players) FROM plays WHERE game = 'quake';

Sum

The sum function can be used to sum up all the values returned by a query for a given column.

SELECT SUM(players) FROM plays;

Avg

The avg function can be used to compute the average of all the values returned by a query for a given column.

SELECT AVG(players) FROM plays;

User-Defined Functions

User-defined functions allow execution of user-provided code in Cassandra. By default, Cassandra supports defining functions in Java and JavaScript. Support for other JSR 223 compliant scripting languages (such as Python, Ruby, and Scala) can be added by adding a JAR to the classpath.

UDFs are part of the Cassandra schema. As such, they are automatically propagated to all nodes in the cluster.

UDFs can be overloaded, i.e. there can be multiple UDFs with different argument types but the same function name. Example:

CREATE FUNCTION sample ( arg int ) ...;
 CREATE FUNCTION sample ( arg text ) ...;
 

User-defined functions are susceptible to all of the normal problems with the chosen programming language. Accordingly, implementations should be safe against null pointer exceptions, illegal arguments, or any other potential source of exceptions. An exception during function execution will result in the entire statement failing.

It is valid to use complex types like collections, tuple types and user-defined types as argument and return types. Tuple types and user-defined types are handled by the conversion functions of the DataStax Java Driver. Please see the documentation of the Java Driver for details on handling tuple types and user-defined types.

Arguments for functions can be literals or terms. Prepared statement placeholders can be used, too.

Note that you can use the double-quoted string syntax to enclose the UDF source code. For example:

CREATE FUNCTION some_function ( arg int )
  RETURNS NULL ON NULL INPUT
  ...;

INSERT INTO atable (pk, val) VALUES (3,3);
INSERT INTO atable (pk, val) VALUES (4,4);
SELECT average(val) FROM atable;

See CREATE AGGREGATE and DROP AGGREGATE.

JSON Support

Cassandra 2.2 introduces JSON support to SELECT and INSERT statements. This support does not fundamentally alter the CQL API (for example, the schema is still enforced); it simply provides a convenient way to work with JSON documents.

SELECT JSON

With SELECT statements, the new JSON keyword can be used to return each row as a single JSON encoded map. The remainder of the SELECT statement behavior is the same.

The result map keys are the same as the column names in a normal result set. For example, a statement like "SELECT JSON a, ttl(b) FROM ..." would result in a map with keys "a" and "ttl(b)". However, there is one notable exception: for symmetry with INSERT JSON behavior, case-sensitive column names with upper-case letters will be surrounded with double quotes. For example, "SELECT JSON myColumn FROM ..." would result in a map key "\"myColumn\"" (note the escaped quotes).
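For instance (a sketch using a hypothetical table declaring a case-sensitive column):

CREATE TABLE t ( pk int PRIMARY KEY, "myColumn" int );
SELECT JSON pk, "myColumn" FROM t;

Each returned row takes the form {"pk": ..., "\"myColumn\"": ...}.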

The map values will be JSON-encoded representations (as described below) of the result set values.

INSERT JSON

With INSERT statements, the new JSON keyword can be used to enable inserting a JSON encoded map as a single row. The format of the JSON map should generally match that returned by a SELECT JSON statement on the same table. In particular, case-sensitive column names should be surrounded with double quotes. For example, to insert into a table with two columns named “myKey” and “value”, you would do the following:

INSERT INTO mytable JSON '{"\"myKey\"": 0, "value": 0}'

Any columns which are omitted from the JSON map will default to a NULL value (which will result in a tombstone being created).
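For instance, reusing the mytable example above, the following insert leaves the value column unset (NULL):

INSERT INTO mytable JSON '{"\"myKey\"": 0}'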

JSON Encoding of Cassandra Data Types

Where possible, Cassandra will represent and accept data types in their native JSON representation. Cassandra will also accept string representations matching the CQL literal format for all data types. The following table describes the encodings that Cassandra will accept in INSERT JSON values (and fromJson() arguments) as well as the format Cassandra will use when returning data for SELECT JSON statements (and toJson()):

type        formats accepted        return format   notes
ascii       string                  string          Uses JSON’s \u character escape
bigint      integer, string         integer         String must be valid 64 bit integer
blob        string                  string          String should be 0x followed by an even number of hex digits
boolean     boolean, string         boolean         String must be “true” or “false”
date        string                  string          Date in format YYYY-MM-DD, timezone UTC
decimal     integer, float, string  float           May exceed 32 or 64-bit IEEE-754 floating point precision in client-side decoder
double      integer, float, string  float           String must be valid integer or float
float       integer, float, string  float           String must be valid integer or float
inet        string                  string          IPv4 or IPv6 address
int         integer, string         integer         String must be valid 32 bit integer
list        list, string            list            Uses JSON’s native list representation
map         map, string             map             Uses JSON’s native map representation
set         list, string            list            Uses JSON’s native list representation
text        string                  string          Uses JSON’s \u character escape
time        string                  string          Time of day in format HH-MM-SS[.fffffffff]
timestamp   integer, string         string          A timestamp. String constants allow input of timestamps as dates; see Working with dates below for more information. Timestamps with format YYYY-MM-DD HH:MM:SS.SSS are returned.
timeuuid    string                  string          Type 1 UUID. See Constants for the UUID format
tuple       list, string            list            Uses JSON’s native list representation
UDT         map, string             map             Uses JSON’s native map representation with field names as keys
uuid        string                  string          See Constants for the UUID format
varchar     string                  string          Uses JSON’s \u character escape
varint      integer, string         integer         Variable length; may overflow 32 or 64 bit integers in client-side decoder

The fromJson() Function

The fromJson() function may be used similarly to INSERT JSON, but for a single column value. It may only be used in the VALUES clause of an INSERT statement or as one of the column values in an UPDATE, DELETE, or SELECT statement. For example, it cannot be used in the selection clause of a SELECT statement.
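For instance (a sketch reusing the mytable example above), fromJson() can supply a single column value in the VALUES clause:

INSERT INTO mytable ("myKey", value) VALUES (0, fromJson('1'));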

The toJson() Function

The toJson() function may be used similarly to SELECT JSON, but for a single column value. It may only be used in the selection clause of a SELECT statement.
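For instance (a sketch reusing the mytable example above):

SELECT "myKey", toJson(value) FROM mytable;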

Appendix A: CQL Keywords

CQL distinguishes between reserved and non-reserved keywords. Reserved keywords cannot be used as identifiers; they are truly reserved for the language (but one can enclose a reserved keyword in double quotes to use it as an identifier). Non-reserved keywords only have a specific meaning in certain contexts but can be used as identifiers otherwise. The only raison d'être of these non-reserved keywords is convenience: some keywords are non-reserved where it was always easy for the parser to decide whether they were used as keywords or not.
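For instance (a sketch using a hypothetical table), the reserved keyword SELECT can still be used as a column name when double-quoted:

CREATE TABLE example ( "select" int PRIMARY KEY );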

Keyword Reserved?
ADD yes
AGGREGATE no
ALL no
ALLOW yes
ALTER yes
AND yes
APPLY yes
AS no
ASC yes
ASCII no
AUTHORIZE yes
BATCH yes
BEGIN yes
BIGINT no
BLOB no
BOOLEAN no
BY yes
CALLED no
CLUSTERING no
COLUMNFAMILY yes
COMPACT no
CONTAINS no
COUNT no
COUNTER no
CREATE yes
CUSTOM no
DATE no
DECIMAL no
DELETE yes
DESC yes
DESCRIBE yes
DISTINCT no
DOUBLE no
DROP yes
ENTRIES yes
EXECUTE yes
EXISTS no
FILTERING no
FINALFUNC no
FLOAT no
FROM yes
FROZEN no
FULL yes
FUNCTION no
FUNCTIONS no
GRANT yes
IF yes
IN yes
INDEX yes
INET no
INFINITY yes
INITCOND no
INPUT no
INSERT yes
INT no
INTO yes
JSON no
KEY no
KEYS no
KEYSPACE yes
KEYSPACES no
LANGUAGE no
LIMIT yes
LIST no
LOGIN no
MAP no
MODIFY yes
NAN yes
NOLOGIN no
NORECURSIVE yes
NOSUPERUSER no
NOT yes
NULL yes
OF yes
ON yes
OPTIONS no
OR yes
ORDER yes
PASSWORD no
PERMISSION no
PERMISSIONS no
PRIMARY yes
RENAME yes
REPLACE yes
RETURNS no
REVOKE yes
ROLE no
ROLES no
SCHEMA yes
SELECT yes
SET yes
SFUNC no
SMALLINT no
STATIC no
STORAGE no
STYPE no
SUPERUSER no
TABLE yes
TEXT no
TIME no
TIMESTAMP no
TIMEUUID no
TINYINT no
TO yes
TOKEN yes
TRIGGER no
TRUNCATE yes
TTL no
TUPLE no
TYPE no
UNLOGGED yes
UPDATE yes
USE yes
USER no
USERS no
USING yes
UUID no
VALUES no
VARCHAR no
VARINT no
WHERE yes
WITH yes
WRITETIME no

Appendix B: CQL Reserved Types

The following type names are not currently used by CQL, but are reserved for potential future use. User-defined types may not use reserved type names as their name.

type
bitstring
byte
complex
date
enum
interval
macaddr
smallint

Changes

The following describes the changes in each version of CQL.

3.3.0

  • User-defined functions are now supported through CREATE FUNCTION and DROP FUNCTION,
  • User-defined aggregates are now supported through CREATE AGGREGATE and DROP AGGREGATE.
  • Allows double-dollar enclosed string literals as an alternative to single-quote enclosed strings.
  • Introduces Roles to supersede user-based authentication and access control.

3.2.0

  • User-defined types are now supported through CREATE TYPE, ALTER TYPE, and DROP TYPE
  • CREATE INDEX now supports indexing collection columns, including indexing the keys of map collections through the keys() function
  • Indexes on collections may be queried using the new CONTAINS and CONTAINS KEY operators
  • Tuple types were added to hold fixed-length sets of typed positional fields (see the section on types)
  • DROP INDEX now supports optionally specifying a keyspace

3.1.7

  • SELECT statements now support selecting multiple rows in a single partition using an IN clause on combinations of clustering columns. See SELECT WHERE clauses.
  • IF NOT EXISTS and IF EXISTS syntax is now supported by CREATE USER and DROP USER statements, respectively.

3.1.6

  • A new uuid method has been added.
  • Support for DELETE ... IF EXISTS syntax.

3.1.5

3.1.4

3.1.3

  • Millisecond precision formats have been added to the timestamp parser (see working with dates).

3.1.2

  • NaN and Infinity have been added as valid float constants. They are now reserved keywords. In the unlikely case you were using them as a column identifier (or keyspace/table one), you will now need to double quote them (see quote identifiers).

3.1.1

  • SELECT statement now allows listing the partition keys (using the DISTINCT modifier). See CASSANDRA-4536.
  • The syntax c IN ? is now supported in WHERE clauses. In that case, the value expected for the bind variable will be a list of whatever type c is.
  • It is now possible to use named bind variables (using :name instead of ?).
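For instance (a sketch reusing the users table from earlier; the values for each marker are supplied when the prepared statement is executed):

SELECT * FROM users WHERE username IN ?;
SELECT * FROM users WHERE username = :name;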

3.1.0

  • The ALTER TABLE DROP option has been re-enabled for CQL3 tables and has new semantics now: the space formerly used by dropped columns will now be eventually reclaimed (post-compaction). You should not re-add previously dropped columns unless you use timestamps with microsecond precision (see CASSANDRA-3919 for more details).
  • SELECT statement now supports aliases in select clause. Aliases in WHERE and ORDER BY clauses are not supported. See the section on select for details.
  • CREATE statements for KEYSPACE, TABLE and INDEX now support an IF NOT EXISTS condition. Similarly, DROP statements support an IF EXISTS condition.
  • INSERT statements optionally support an IF NOT EXISTS condition and UPDATE supports IF conditions.

3.0.5

  • SELECT, UPDATE, and DELETE statements now allow empty IN relations (see CASSANDRA-5626).

3.0.4

  • Updated the syntax for custom secondary indexes.
  • Non-equal conditions on the partition key are now never supported, even for the ordered partitioner, as this was not correct (the order was not the one of the type of the partition key). Instead, the token method should always be used for range queries on the partition key (see WHERE clauses).

3.0.3

3.0.2

  • Type validation for constants has been fixed. For instance, the implementation used to allow '2' as a valid value for an int column (interpreting it as the equivalent of 2), or 42 as a valid blob value (in which case 42 was interpreted as a hexadecimal representation of the blob). This is no longer the case; type validation of constants is now more strict. See the data types section for details on which constants are allowed for which types.
  • The type validation fix of the previous point has led to the introduction of blob constants to allow inputting blobs. Do note that while inputting blobs as string constants is still supported by this version (to allow a smoother transition to blob constants), it is now deprecated (in particular, the data types section does not list string constants as valid blobs) and will be removed in a future version. If you were using strings as blobs, you should thus update your client code ASAP to switch to blob constants.
  • A number of functions to convert native types to blobs have also been introduced. Furthermore the token function is now also allowed in select clauses. See the section on functions for details.

3.0.1

  • Date strings (and timestamps) are no longer accepted as valid timeuuid values. Doing so was a bug in the sense that date strings are not valid timeuuids, and it was thus resulting in confusing behaviors. However, the following new methods have been added to help working with timeuuids: now, minTimeuuid, maxTimeuuid, dateOf and unixTimestampOf. See the section dedicated to these methods for more detail.
  • Float constants now support the exponent notation. In other words, 4.2E10 is now a valid floating point value.

Versioning

Versioning of the CQL language adheres to the Semantic Versioning guidelines. Versions take the form X.Y.Z where X, Y, and Z are integer values representing major, minor, and patch level respectively. There is no correlation between Cassandra release versions and the CQL language version.

version   description
Major     The major version must be bumped when backward incompatible changes are introduced. This should rarely occur.
Minor     Minor version increments occur when new, but backward compatible, functionality is introduced.
Patch     The patch version is incremented when bugs are fixed.