Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 8B4F9200B54 for ; Thu, 28 Jul 2016 21:27:11 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 898F3160A85; Thu, 28 Jul 2016 19:27:11 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 624B8160A56 for ; Thu, 28 Jul 2016 21:27:09 +0200 (CEST) Received: (qmail 49983 invoked by uid 500); 28 Jul 2016 19:27:08 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 49972 invoked by uid 99); 28 Jul 2016 19:27:08 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 28 Jul 2016 19:27:08 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 5939FE0100; Thu, 28 Jul 2016 19:27:08 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: tylerhobbs@apache.org To: commits@cassandra.apache.org Message-Id: <7226593da2ec4a99b8845442361b8d7a@git.apache.org> X-Mailer: ASF-Git Admin Mailer Subject: cassandra git commit: Add beta protocol flag for v5 native protocol Date: Thu, 28 Jul 2016 19:27:08 +0000 (UTC) archived-at: Thu, 28 Jul 2016 19:27:11 -0000 Repository: cassandra Updated Branches: refs/heads/trunk cfb050592 -> d9322a171 Add beta protocol flag for v5 native protocol Patch by Alex Petrov; reviewed by Tyler Hobbs for CASSANDRA-12142 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/d9322a17 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/d9322a17 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/d9322a17 Branch: refs/heads/trunk Commit: d9322a17189567d2c117cb3e69c456dd0781931e Parents: cfb0505 Author: Alex Petrov Authored: Thu Jul 28 14:17:08 2016 -0500 Committer: Tyler Hobbs Committed: Thu Jul 28 14:17:51 2016 -0500 ---------------------------------------------------------------------- CHANGES.txt | 1 + doc/native_protocol_v5.spec | 1162 ++++++++++++++++++ .../org/apache/cassandra/cql3/ResultSet.java | 2 +- .../org/apache/cassandra/transport/CBUtil.java | 2 +- .../org/apache/cassandra/transport/Event.java | 12 +- .../org/apache/cassandra/transport/Frame.java | 34 +- .../org/apache/cassandra/transport/Message.java | 2 + .../cassandra/transport/ProtocolException.java | 2 + .../org/apache/cassandra/transport/Server.java | 2 + .../cassandra/transport/SimpleClient.java | 16 +- .../transport/messages/BatchMessage.java | 4 +- .../service/ProtocolBetaVersionTest.java | 85 ++ .../cassandra/transport/ProtocolErrorTest.java | 6 +- 13 files changed, 1302 insertions(+), 28 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/cassandra/blob/d9322a17/CHANGES.txt ---------------------------------------------------------------------- diff --git a/CHANGES.txt b/CHANGES.txt index e86e7d9..bdf0db3 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.10 + * Add beta protocol flag for v5 native protocol (CASSANDRA-12142) * Support filtering on non-PRIMARY KEY columns in the CREATE MATERIALIZED VIEW statement's WHERE clause (CASSANDRA-10368) * Unify STDOUT and SYSTEMLOG logback format (CASSANDRA-12004) http://git-wip-us.apache.org/repos/asf/cassandra/blob/d9322a17/doc/native_protocol_v5.spec ---------------------------------------------------------------------- diff --git a/doc/native_protocol_v5.spec b/doc/native_protocol_v5.spec new file mode 100644 index 0000000..edf3093 --- /dev/null +++ b/doc/native_protocol_v5.spec @@ -0,0 +1,1162 @@ + + CQL BINARY PROTOCOL v5 + + +Table of Contents + + 1. Overview + 2. Frame header + 2.1. version + 2.2. flags + 2.3. stream + 2.4. opcode + 2.5. length + 3. Notations + 4. Messages + 4.1. Requests + 4.1.1. STARTUP + 4.1.2. AUTH_RESPONSE + 4.1.3. OPTIONS + 4.1.4. QUERY + 4.1.5. PREPARE + 4.1.6. EXECUTE + 4.1.7. BATCH + 4.1.8. REGISTER + 4.2. Responses + 4.2.1. ERROR + 4.2.2. READY + 4.2.3. AUTHENTICATE + 4.2.4. SUPPORTED + 4.2.5. RESULT + 4.2.5.1. Void + 4.2.5.2. Rows + 4.2.5.3. Set_keyspace + 4.2.5.4. Prepared + 4.2.5.5. Schema_change + 4.2.6. EVENT + 4.2.7. AUTH_CHALLENGE + 4.2.8. AUTH_SUCCESS + 5. Compression + 6. Data Type Serialization Formats + 7. User Defined Type Serialization + 8. Result paging + 9. Error codes + 10. Changes from v4 + + +1. Overview + + The CQL binary protocol is a frame based protocol. Frames are defined as: + + 0 8 16 24 32 40 + +---------+---------+---------+---------+---------+ + | version | flags | stream | opcode | + +---------+---------+---------+---------+---------+ + | length | + +---------+---------+---------+---------+ + | | + . ... body ... . + . . + . . + +---------------------------------------- + + The protocol is big-endian (network byte order). + + Each frame contains a fixed size header (9 bytes) followed by a variable size + body. The header is described in Section 2. The content of the body depends + on the header opcode value (the body can in particular be empty for some + opcode values). The list of allowed opcodes is defined in Section 2.4 and the + details of each corresponding message are described Section 4. + + The protocol distinguishes two types of frames: requests and responses. Requests + are those frames sent by the client to the server. Responses are those frames sent + by the server to the client. Note, however, that the protocol supports server pushes + (events) so a response does not necessarily come right after a client request. + + Note to client implementors: client libraries should always assume that the + body of a given frame may contain more data than what is described in this + document. It will however always be safe to ignore the remainder of the frame + body in such cases. The reason is that this may enable extending the protocol + with optional features without needing to change the protocol version. + + + +2. Frame header + +2.1. version + + The version is a single byte that indicates both the direction of the message + (request or response) and the version of the protocol in use. The most + significant bit of version is used to define the direction of the message: + 0 indicates a request, 1 indicates a response. This can be useful for protocol + analyzers to distinguish the nature of the packet from the direction in which + it is moving. The rest of that byte is the protocol version (5 for the protocol + defined in this document). In other words, for this version of the protocol, + version will be one of: + 0x04 Request frame for this protocol version + 0x84 Response frame for this protocol version + + Please note that while every message ships with the version, only one version + of messages is accepted on a given connection. In other words, the first message + exchanged (STARTUP) sets the version for the connection for the lifetime of this + connection. The single exception to this behavior is when a startup message + is sent with a version that is higher than the current server version. In this + case, the server will respond with its current version. + + This document describes version 5 of the protocol. For the changes made since + version 4, see Section 10. + + +2.2. flags + + Flags applying to this frame. The flags have the following meaning (described + by the mask that allows selecting them): + 0x01: Compression flag. If set, the frame body is compressed. The actual + compression to use should have been set up beforehand through the + Startup message (which thus cannot be compressed; Section 4.1.1). + 0x02: Tracing flag. For a request frame, this indicates the client requires + tracing of the request. Note that only QUERY, PREPARE and EXECUTE queries + support tracing. Other requests will simply ignore the tracing flag if + set. If a request supports tracing and the tracing flag is set, the response + to this request will have the tracing flag set and contain tracing + information. + If a response frame has the tracing flag set, its body contains + a tracing ID. The tracing ID is a [uuid] and is the first thing in + the frame body. The rest of the body will then be the usual body + corresponding to the response opcode. + 0x04: Custom payload flag. For a request or response frame, this indicates + that a generic key-value custom payload for a custom QueryHandler + implementation is present in the frame. Such a custom payload is simply + ignored by the default QueryHandler implementation. + Currently, only QUERY, PREPARE, EXECUTE and BATCH requests support + payload. + Type of custom payload is [bytes map] (see below). + 0x08: Warning flag. The response contains warnings which were generated by the + server to go along with this response. + If a response frame has the warning flag set, its body will contain the + text of the warnings. The warnings are a [string list] and will be the + first value in the frame body if the tracing flag is not set, or directly + after the tracing ID if it is. + 0x10: Use beta flag. Indicates that the client opts in to use protocol version + that is currently in beta. Server will respond with ERROR if protocol + version is marked as beta on server and client does not provide this flag. + + The rest of flags is currently unused and ignored. + +2.3. stream + + A frame has a stream id (a [short] value). When sending request messages, this + stream id must be set by the client to a non-negative value (negative stream id + are reserved for streams initiated by the server; currently all EVENT messages + (section 4.2.6) have a streamId of -1). If a client sends a request message + with the stream id X, it is guaranteed that the stream id of the response to + that message will be X. + + This helps to enable the asynchronous nature of the protocol. If a client + sends multiple messages simultaneously (without waiting for responses), there + is no guarantee on the order of the responses. For instance, if the client + writes REQ_1, REQ_2, REQ_3 on the wire (in that order), the server might + respond to REQ_3 (or REQ_2) first. Assigning different stream ids to these 3 + requests allows the client to distinguish to which request a received answer + responds to. As there can only be 32768 different simultaneous streams, it is up + to the client to reuse stream id. + + Note that clients are free to use the protocol synchronously (i.e. wait for + the response to REQ_N before sending REQ_N+1). In that case, the stream id + can be safely set to 0. Clients should also feel free to use only a subset of + the 32768 maximum possible stream ids if it is simpler for its implementation. + +2.4. opcode + + An integer byte that distinguishes the actual message: + 0x00 ERROR + 0x01 STARTUP + 0x02 READY + 0x03 AUTHENTICATE + 0x05 OPTIONS + 0x06 SUPPORTED + 0x07 QUERY + 0x08 RESULT + 0x09 PREPARE + 0x0A EXECUTE + 0x0B REGISTER + 0x0C EVENT + 0x0D BATCH + 0x0E AUTH_CHALLENGE + 0x0F AUTH_RESPONSE + 0x10 AUTH_SUCCESS + + Messages are described in Section 4. + + (Note that there is no 0x04 message in this version of the protocol) + + +2.5. length + + A 4 byte integer representing the length of the body of the frame (note: + currently a frame is limited to 256MB in length). + + +3. Notations + + To describe the layout of the frame body for the messages in Section 4, we + define the following: + + [int] A 4 bytes integer + [long] A 8 bytes integer + [short] A 2 bytes unsigned integer + [string] A [short] n, followed by n bytes representing an UTF-8 + string. + [long string] An [int] n, followed by n bytes representing an UTF-8 string. + [uuid] A 16 bytes long uuid. + [string list] A [short] n, followed by n [string]. + [bytes] A [int] n, followed by n bytes if n >= 0. If n < 0, + no byte should follow and the value represented is `null`. + [value] A [int] n, followed by n bytes if n >= 0. + If n == -1 no byte should follow and the value represented is `null`. + If n == -2 no byte should follow and the value represented is + `not set` not resulting in any change to the existing value. + n < -2 is an invalid value and results in an error. + [short bytes] A [short] n, followed by n bytes if n >= 0. + + [option] A pair of where is a [short] representing + the option id and depends on that option (and can be + of size 0). The supported id (and the corresponding ) + will be described when this is used. + [option list] A [short] n, followed by n [option]. + [inet] An address (ip and port) to a node. It consists of one + [byte] n, that represents the address size, followed by n + [byte] representing the IP address (in practice n can only be + either 4 (IPv4) or 16 (IPv6)), following by one [int] + representing the port. + [consistency] A consistency level specification. This is a [short] + representing a consistency level with the following + correspondance: + 0x0000 ANY + 0x0001 ONE + 0x0002 TWO + 0x0003 THREE + 0x0004 QUORUM + 0x0005 ALL + 0x0006 LOCAL_QUORUM + 0x0007 EACH_QUORUM + 0x0008 SERIAL + 0x0009 LOCAL_SERIAL + 0x000A LOCAL_ONE + + [string map] A [short] n, followed by n pair where and + are [string]. + [string multimap] A [short] n, followed by n pair where is a + [string] and is a [string list]. + [bytes map] A [short] n, followed by n pair where is a + [string] and is a [bytes]. + + +4. Messages + +4.1. Requests + + Note that outside of their normal responses (described below), all requests + can get an ERROR message (Section 4.2.1) as response. + +4.1.1. STARTUP + + Initialize the connection. The server will respond by either a READY message + (in which case the connection is ready for queries) or an AUTHENTICATE message + (in which case credentials will need to be provided using AUTH_RESPONSE). + + This must be the first message of the connection, except for OPTIONS that can + be sent before to find out the options supported by the server. Once the + connection has been initialized, a client should not send any more STARTUP + messages. + + The body is a [string map] of options. Possible options are: + - "CQL_VERSION": the version of CQL to use. This option is mandatory and + currently the only version supported is "3.0.0". Note that this is + different from the protocol version. + - "COMPRESSION": the compression algorithm to use for frames (See section 5). + This is optional; if not specified no compression will be used. + + +4.1.2. AUTH_RESPONSE + + Answers a server authentication challenge. + + Authentication in the protocol is SASL based. The server sends authentication + challenges (a bytes token) to which the client answers with this message. Those + exchanges continue until the server accepts the authentication by sending a + AUTH_SUCCESS message after a client AUTH_RESPONSE. Note that the exchange + begins with the client sending an initial AUTH_RESPONSE in response to a + server AUTHENTICATE request. + + The body of this message is a single [bytes] token. The details of what this + token contains (and when it can be null/empty, if ever) depends on the actual + authenticator used. + + The response to a AUTH_RESPONSE is either a follow-up AUTH_CHALLENGE message, + an AUTH_SUCCESS message or an ERROR message. + + +4.1.3. OPTIONS + + Asks the server to return which STARTUP options are supported. The body of an + OPTIONS message should be empty and the server will respond with a SUPPORTED + message. + + +4.1.4. QUERY + + Performs a CQL query. The body of the message must be: + + where is a [long string] representing the query and + must be + [[name_1]...[name_n]][][][][] + where: + - is the [consistency] level for the operation. + - is a [byte] whose bits define the options for this query and + in particular influence what the remainder of the message contains. + A flag is set if the bit corresponding to its `mask` is set. Supported + flags are, given their mask: + 0x01: Values. If set, a [short] followed by [value] + values are provided. Those values are used for bound variables in + the query. Optionally, if the 0x40 flag is present, each value + will be preceded by a [string] name, representing the name of + the marker the value must be bound to. + 0x02: Skip_metadata. If set, the Result Set returned as a response + to the query (if any) will have the NO_METADATA flag (see + Section 4.2.5.2). + 0x04: Page_size. If set, is an [int] + controlling the desired page size of the result (in CQL3 rows). + See the section on paging (Section 8) for more details. + 0x08: With_paging_state. If set, should be present. + is a [bytes] value that should have been returned + in a result set (Section 4.2.5.2). The query will be + executed but starting from a given paging state. This is also to + continue paging on a different node than the one where it + started (See Section 8 for more details). + 0x10: With serial consistency. If set, should be + present. is the [consistency] level for the + serial phase of conditional updates. That consitency can only be + either SERIAL or LOCAL_SERIAL and if not present, it defaults to + SERIAL. This option will be ignored for anything else other than a + conditional update/insert. + 0x20: With default timestamp. If set, should be present. + is a [long] representing the default timestamp for the query + in microseconds (negative values are forbidden). This will + replace the server side assigned timestamp as default timestamp. + Note that a timestamp in the query itself will still override + this timestamp. This is entirely optional. + 0x40: With names for values. This only makes sense if the 0x01 flag is set and + is ignored otherwise. If present, the values from the 0x01 flag will + be preceded by a name (see above). Note that this is only useful for + QUERY requests where named bind markers are used; for EXECUTE statements, + since the names for the expected values was returned during preparation, + a client can always provide values in the right order without any names + and using this flag, while supported, is almost surely inefficient. + + Note that the consistency is ignored by some queries (USE, CREATE, ALTER, + TRUNCATE, ...). + + The server will respond to a QUERY message with a RESULT message, the content + of which depends on the query. + + +4.1.5. PREPARE + + Prepare a query for later execution (through EXECUTE). The body consists of + the CQL query to prepare as a [long string]. + + The server will respond with a RESULT message with a `prepared` kind (0x0004, + see Section 4.2.5). + + +4.1.6. EXECUTE + + Executes a prepared query. The body of the message must be: + + where is the prepared query ID. It's the [short bytes] returned as a + response to a PREPARE message. As for , it has the exact + same definition as in QUERY (see Section 4.1.4). + + The response from the server will be a RESULT message. + + +4.1.7. BATCH + + Allows executing a list of queries (prepared or not) as a batch (note that + only DML statements are accepted in a batch). The body of the message must + be: + ...[][] + where: + - is a [byte] indicating the type of batch to use: + - If == 0, the batch will be "logged". This is equivalent to a + normal CQL3 batch statement. + - If == 1, the batch will be "unlogged". + - If == 2, the batch will be a "counter" batch (and non-counter + statements will be rejected). + - is a [byte] whose bits define the options for this query and + in particular influence what the remainder of the message contains. It is similar + to the from QUERY and EXECUTE methods, except that the 4 rightmost + bits must always be 0 as their corresponding options do not make sense for + Batch. A flag is set if the bit corresponding to its `mask` is set. Supported + flags are, given their mask: + 0x10: With serial consistency. If set, should be + present. is the [consistency] level for the + serial phase of conditional updates. That consistency can only be + either SERIAL or LOCAL_SERIAL and if not present, it defaults to + SERIAL. This option will be ignored for anything else other than a + conditional update/insert. + 0x20: With default timestamp. If set, should be present. + is a [long] representing the default timestamp for the query + in microseconds. This will replace the server side assigned + timestamp as default timestamp. Note that a timestamp in the query itself + will still override this timestamp. This is entirely optional. + 0x40: With names for values. If set, then all values for all must be + preceded by a [string] that have the same meaning as in QUERY + requests [IMPORTANT NOTE: this feature does not work and should not be + used. It is specified in a way that makes it impossible for the server + to implement. This will be fixed in a future version of the native + protocol. See https://issues.apache.org/jira/browse/CASSANDRA-10246 for + more details]. + - is a [short] indicating the number of following queries. + - ... are the queries to execute. A must be of the + form: + []...[] + where: + - is a [byte] indicating whether the following query is a prepared + one or not. value must be either 0 or 1. + - depends on the value of . If == 0, it should be + a [long string] query string (as in QUERY, the query string might contain + bind markers). Otherwise (that is, if == 1), it should be a + [short bytes] representing a prepared query ID. + - is a [short] indicating the number (possibly 0) of following values. + - is the optional name of the following . It must be present + if and only if the 0x40 flag is provided for the batch. + - is the [value] to use for bound variable i (of bound variable + if the 0x40 flag is used). + - is the [consistency] level for the operation. + - is only present if the 0x10 flag is set. In that case, + is the [consistency] level for the serial phase of + conditional updates. That consitency can only be either SERIAL or + LOCAL_SERIAL and if not present will defaults to SERIAL. This option will + be ignored for anything else other than a conditional update/insert. + + The server will respond with a RESULT message. + + +4.1.8. REGISTER + + Register this connection to receive some types of events. The body of the + message is a [string list] representing the event types to register for. See + section 4.2.6 for the list of valid event types. + + The response to a REGISTER message will be a READY message. + + Please note that if a client driver maintains multiple connections to a + Cassandra node and/or connections to multiple nodes, it is advised to + dedicate a handful of connections to receive events, but to *not* register + for events on all connections, as this would only result in receiving + multiple times the same event messages, wasting bandwidth. + + +4.2. Responses + + This section describes the content of the frame body for the different + responses. Please note that to make room for future evolution, clients should + support extra informations (that they should simply discard) to the one + described in this document at the end of the frame body. + +4.2.1. ERROR + + Indicates an error processing a request. The body of the message will be an + error code ([int]) followed by a [string] error message. Then, depending on + the exception, more content may follow. The error codes are defined in + Section 9, along with their additional content if any. + + +4.2.2. READY + + Indicates that the server is ready to process queries. This message will be + sent by the server either after a STARTUP message if no authentication is + required (if authentication is required, the server indicates readiness by + sending a AUTH_RESPONSE message). + + The body of a READY message is empty. + + +4.2.3. AUTHENTICATE + + Indicates that the server requires authentication, and which authentication + mechanism to use. + + The authentication is SASL based and thus consists of a number of server + challenges (AUTH_CHALLENGE, Section 4.2.7) followed by client responses + (AUTH_RESPONSE, Section 4.1.2). The initial exchange is however boostrapped + by an initial client response. The details of that exchange (including how + many challenge-response pairs are required) are specific to the authenticator + in use. The exchange ends when the server sends an AUTH_SUCCESS message or + an ERROR message. + + This message will be sent following a STARTUP message if authentication is + required and must be answered by a AUTH_RESPONSE message from the client. + + The body consists of a single [string] indicating the full class name of the + IAuthenticator in use. + + +4.2.4. SUPPORTED + + Indicates which startup options are supported by the server. This message + comes as a response to an OPTIONS message. + + The body of a SUPPORTED message is a [string multimap]. This multimap gives + for each of the supported STARTUP options, the list of supported values. + + +4.2.5. RESULT + + The result to a query (QUERY, PREPARE, EXECUTE or BATCH messages). + + The first element of the body of a RESULT message is an [int] representing the + `kind` of result. The rest of the body depends on the kind. The kind can be + one of: + 0x0001 Void: for results carrying no information. + 0x0002 Rows: for results to select queries, returning a set of rows. + 0x0003 Set_keyspace: the result to a `use` query. + 0x0004 Prepared: result to a PREPARE message. + 0x0005 Schema_change: the result to a schema altering query. + + The body for each kind (after the [int] kind) is defined below. + + +4.2.5.1. Void + + The rest of the body for a Void result is empty. It indicates that a query was + successful without providing more information. + + +4.2.5.2. Rows + + Indicates a set of rows. The rest of the body of a Rows result is: + + where: + - is composed of: + [][?...] + where: + - is an [int]. The bits of provides information on the + formatting of the remaining information. A flag is set if the bit + corresponding to its `mask` is set. Supported flags are, given their + mask: + 0x0001 Global_tables_spec: if set, only one table spec (keyspace + and table name) is provided as . If not + set, is not present. + 0x0002 Has_more_pages: indicates whether this is not the last + page of results and more should be retrieved. If set, the + will be present. The is a + [bytes] value that should be used in QUERY/EXECUTE to + continue paging and retrieve the remainder of the result for + this query (See Section 8 for more details). + 0x0004 No_metadata: if set, the is only composed of + these , the and optionally the + (depending on the Has_more_pages flag) but + no other information (so no nor ). + This will only ever be the case if this was requested + during the query (see QUERY and RESULT messages). + - is an [int] representing the number of columns selected + by the query that produced this result. It defines the number of + elements in and the number of elements for each row in . + - is present if the Global_tables_spec is set in + . It is composed of two [string] representing the + (unique) keyspace name and table name the columns belong to. + - specifies the columns returned in the query. There are + such column specifications that are composed of: + ()? + The initial and are two [string] and are only present + if the Global_tables_spec flag is not set. The is a + [string] and is an [option] that corresponds to the description + (what this description is depends a bit on the context: in results to + selects, this will be either the user chosen alias or the selection used + (often a colum name, but it can be a function call too). In results to + a PREPARE, this will be either the name of the corresponding bind variable + or the column name for the variable if it is "anonymous") and type of + the corresponding result. The option for is either a native + type (see below), in which case the option has no value, or a + 'custom' type, in which case the value is a [string] representing + the fully qualified class name of the type represented. Valid option + ids are: + 0x0000 Custom: the value is a [string], see above. + 0x0001 Ascii + 0x0002 Bigint + 0x0003 Blob + 0x0004 Boolean + 0x0005 Counter + 0x0006 Decimal + 0x0007 Double + 0x0008 Float + 0x0009 Int + 0x000B Timestamp + 0x000C Uuid + 0x000D Varchar + 0x000E Varint + 0x000F Timeuuid + 0x0010 Inet + 0x0011 Date + 0x0012 Time + 0x0013 Smallint + 0x0014 Tinyint + 0x0020 List: the value is an [option], representing the type + of the elements of the list. + 0x0021 Map: the value is two [option], representing the types of the + keys and values of the map + 0x0022 Set: the value is an [option], representing the type + of the elements of the set + 0x0030 UDT: the value is ... + where: + - is a [string] representing the keyspace name this + UDT is part of. + - is a [string] representing the UDT name. + - is a [short] representing the number of fields of + the UDT, and thus the number of pairs + following + - is a [string] representing the name of the + i_th field of the UDT. + - is an [option] representing the type of the + i_th field of the UDT. + 0x0031 Tuple: the value is ... where is a [short] + representing the number of values in the type, and + are [option] representing the type of the i_th component + of the tuple + + - is an [int] representing the number of rows present in this + result. Those rows are serialized in the part. + - is composed of ... where m is . + Each is composed of ... where n is + and where is a [bytes] representing the value + returned for the jth column of the ith row. In other words, + is composed of ( * ) [bytes]. + + +4.2.5.3. Set_keyspace + + The result to a `use` query. The body (after the kind [int]) is a single + [string] indicating the name of the keyspace that has been set. + + +4.2.5.4. Prepared + + The result to a PREPARE message. The body of a Prepared result is: + + where: + - is [short bytes] representing the prepared query ID. + - is composed of: + [...][?...] + where: + - is an [int]. The bits of provides information on the + formatting of the remaining information. A flag is set if the bit + corresponding to its `mask` is set. Supported masks and their flags + are: + 0x0001 Global_tables_spec: if set, only one table spec (keyspace + and table name) is provided as . If not + set, is not present. + - is an [int] representing the number of bind markers + in the prepared statement. It defines the number of + elements. + - is an [int] representing the number of + elements to follow. If this value is zero, at least one of the + partition key columns in the table that the statement acts on + did not have a corresponding bind marker (or the bind marker + was wrapped in a function call). + - is a short that represents the index of the bind marker + that corresponds to the partition key column in position i. + For example, a sequence of [2, 0, 1] indicates that the + table has three partition key columns; the full partition key + can be constructed by creating a composite of the values for + the bind markers at index 2, at index 0, and at index 1. + This allows implementations with token-aware routing to correctly + construct the partition key without needing to inspect table + metadata. + - is present if the Global_tables_spec is set in + . If present, it is composed of two [string]s. The first + [string] is the name of the keyspace that the statement acts on. + The second [string] is the name of the table that the columns + represented by the bind markers belong to. + - specifies the bind markers in the prepared statement. + There are such column specifications, each with the + following format: + ()? + The initial and are two [string] that are only + present if the Global_tables_spec flag is not set. The field + is a [string] that holds the name of the bind marker (if named), + or the name of the column, field, or expression that the bind marker + corresponds to (if the bind marker is "anonymous"). The + field is an [option] that represents the expected type of values for + the bind marker. See the Rows documentation (section 4.2.5.2) for + full details on the field. + + - is defined exactly the same as in the Rows + documentation (section 4.2.5.2). This describes the metadata for the + result set that will be returned when this prepared statement is executed. + Note that may be empty (have the No_metadata flag and + 0 columns, See section 4.2.5.2) and will be for any query that is not a + Select. In fact, there is never a guarantee that this will be non-empty, so + implementations should protect themselves accordingly. This result metadata + is an optimization that allows implementations to later execute the + prepared statement without requesting the metadata (see the Skip_metadata + flag in EXECUTE). Clients can safely discard this metadata if they do not + want to take advantage of that optimization. + + Note that the prepared query ID returned is global to the node on which the query + has been prepared. It can be used on any connection to that node + until the node is restarted (after which the query must be reprepared). + +4.2.5.5. Schema_change + + The result to a schema altering query (creation/update/drop of a + keyspace/table/index). The body (after the kind [int]) is the same + as the body for a "SCHEMA_CHANGE" event, so 3 strings: + + Please refer to section 4.2.6 below for the meaning of those fields. + + Note that a query to create or drop an index is considered to be a change + to the table the index is on. + + +4.2.6. EVENT + + An event pushed by the server. A client will only receive events for the + types it has REGISTERed to. The body of an EVENT message will start with a + [string] representing the event type. The rest of the message depends on the + event type. The valid event types are: + - "TOPOLOGY_CHANGE": events related to change in the cluster topology. + Currently, events are sent when new nodes are added to the cluster, and + when nodes are removed. The body of the message (after the event type) + consists of a [string] and an [inet], corresponding respectively to the + type of change ("NEW_NODE" or "REMOVED_NODE") followed by the address of + the new/removed node. + - "STATUS_CHANGE": events related to change of node status. Currently, + up/down events are sent. The body of the message (after the event type) + consists of a [string] and an [inet], corresponding respectively to the + type of status change ("UP" or "DOWN") followed by the address of the + concerned node. + - "SCHEMA_CHANGE": events related to schema change. After the event type, + the rest of the message will be where: + - is a [string] representing the type of changed involved. + It will be one of "CREATED", "UPDATED" or "DROPPED". + - is a [string] that can be one of "KEYSPACE", "TABLE", "TYPE", + "FUNCTION" or "AGGREGATE" and describes what has been modified + ("TYPE" stands for modifications related to user types, "FUNCTION" + for modifications related to user defined functions, "AGGREGATE" + for modifications related to user defined aggregates). + - depends on the preceding : + - If is "KEYSPACE", then will be a single [string] + representing the keyspace changed. + - If is "TABLE" or "TYPE", then + will be 2 [string]: the first one will be the keyspace + containing the affected object, and the second one will be the name + of said affected object (either the table, user type, function, or + aggregate name). + - If is "FUNCTION" or "AGGREGATE", multiple arguments follow: + - [string] keyspace containing the user defined function / aggregate + - [string] the function/aggregate name + - [string list] one string for each argument type (as CQL type) + + All EVENT messages have a streamId of -1 (Section 2.3). + + Please note that "NEW_NODE" and "UP" events are sent based on internal Gossip + communication and as such may be sent a short delay before the binary + protocol server on the newly up node is fully started. Clients are thus + advised to wait a short time before trying to connect to the node (1 second + should be enough), otherwise they may experience a connection refusal at + first. + +4.2.7. AUTH_CHALLENGE + + A server authentication challenge (see AUTH_RESPONSE (Section 4.1.2) for more + details). + + The body of this message is a single [bytes] token. The details of what this + token contains (and when it can be null/empty, if ever) depends on the actual + authenticator used. + + Clients are expected to answer the server challenge with an AUTH_RESPONSE + message. + +4.2.8. AUTH_SUCCESS + + Indicates the success of the authentication phase. See Section 4.2.3 for more + details. + + The body of this message is a single [bytes] token holding final information + from the server that the client may require to finish the authentication + process. What that token contains and whether it can be null depends on the + actual authenticator used. + + +5. Compression + + Frame compression is supported by the protocol, but then only the frame body + is compressed (the frame header should never be compressed). + + Before being used, client and server must agree on a compression algorithm to + use, which is done in the STARTUP message. As a consequence, a STARTUP message + must never be compressed. However, once the STARTUP frame has been received + by the server, messages can be compressed (including the response to the STARTUP + request). Frames do not have to be compressed, however, even if compression has + been agreed upon (a server may only compress frames above a certain size at its + discretion). A frame body should be compressed if and only if the compressed + flag (see Section 2.2) is set. + + As of version 2 of the protocol, the following compressions are available: + - lz4 (https://code.google.com/p/lz4/). In that, note that the first four bytes + of the body will be the uncompressed length (followed by the compressed + bytes). + - snappy (https://code.google.com/p/snappy/). This compression might not be + available as it depends on a native lib (server-side) that might not be + avaivable on some installations. + + +6. Data Type Serialization Formats + + This sections describes the serialization formats for all CQL data types + supported by Cassandra through the native protocol. These serialization + formats should be used by client drivers to encode values for EXECUTE + messages. Cassandra will use these formats when returning values in + RESULT messages. + + All values are represented as [bytes] in EXECUTE and RESULT messages. + The [bytes] format includes an int prefix denoting the length of the value. + For that reason, the serialization formats described here will not include + a length component. + + For legacy compatibility reasons, note that most non-string types support + "empty" values (i.e. a value with zero length). An empty value is distinct + from NULL, which is encoded with a negative length. + + As with the rest of the native protocol, all encodings are big-endian. + +6.1. ascii + + A sequence of bytes in the ASCII range [0, 127]. Bytes with values outside of + this range will result in a validation error. + +6.2 bigint + + An eight-byte two's complement integer. + +6.3 blob + + Any sequence of bytes. + +6.4 boolean + + A single byte. A value of 0 denotes "false"; any other value denotes "true". + (However, it is recommended that a value of 1 be used to represent "true".) + +6.5 date + + An unsigned integer representing days with epoch centered at 2^31. + (unix epoch January 1st, 1970). + A few examples: + 0: -5877641-06-23 + 2^31: 1970-1-1 + 2^32: 5881580-07-11 + +6.6 decimal + + The decimal format represents an arbitrary-precision number. It contains an + [int] "scale" component followed by a varint encoding (see section 6.17) + of the unscaled value. The encoded value represents "E<-scale>". + In other words, " * 10 ^ (-1 * )". + +6.7 double + + An 8 byte floating point number in the IEEE 754 binary64 format. + +6.8 float + + A 4 byte floating point number in the IEEE 754 binary32 format. + +6.9 inet + + A 4 byte or 16 byte sequence denoting an IPv4 or IPv6 address, respectively. + +6.10 int + + A 4 byte two's complement integer. + +6.11 list + + A [int] n indicating the number of elements in the list, followed by n + elements. Each element is [bytes] representing the serialized value. + +6.12 map + + A [int] n indicating the number of key/value pairs in the map, followed by + n entries. Each entry is composed of two [bytes] representing the key + and value. + +6.13 set + + A [int] n indicating the number of elements in the set, followed by n + elements. Each element is [bytes] representing the serialized value. + +6.14 smallint + + A 2 byte two's complement integer. + +6.15 text + + A sequence of bytes conforming to the UTF-8 specifications. + +6.16 time + + An 8 byte two's complement long representing nanoseconds since midnight. + Valid values are in the range 0 to 86399999999999 + +6.17 timestamp + + An 8 byte two's complement integer representing a millisecond-precision + offset from the unix epoch (00:00:00, January 1st, 1970). Negative values + represent a negative offset from the epoch. + +6.18 timeuuid + + A 16 byte sequence representing a version 1 UUID as defined by RFC 4122. + +6.19 tinyint + + A 1 byte two's complement integer. + +6.20 tuple + + A sequence of [bytes] values representing the items in a tuple. The encoding + of each element depends on the data type for that position in the tuple. + Null values may be represented by using length -1 for the [bytes] + representation of an element. + +6.21 uuid + + A 16 byte sequence representing any valid UUID as defined by RFC 4122. + +6.22 varchar + + An alias of the "text" type. + +6.23 varint + + A variable-length two's complement encoding of a signed integer. + + The following examples may help implementors of this spec: + + Value | Encoding + ------|--------- + 0 | 0x00 + 1 | 0x01 + 127 | 0x7F + 128 | 0x0080 + 129 | 0x0081 + -1 | 0xFF + -128 | 0x80 + -129 | 0xFF7F + + Note that positive numbers must use a most-significant byte with a value + less than 0x80, because a most-significant bit of 1 indicates a negative + value. Implementors should pad positive values that have a MSB >= 0x80 + with a leading 0x00 byte. + + +7. User Defined Types + + This section describes the serialization format for User defined types (UDT), + as described in section 4.2.5.2. + + A UDT value is composed of successive [bytes] values, one for each field of the UDT + value (in the order defined by the type). A UDT value will generally have one value + for each field of the type it represents, but it is allowed to have less values than + the type has fields. + + +8. Result paging + + The protocol allows for paging the result of queries. For that, the QUERY and + EXECUTE messages have a value that indicate the desired + page size in CQL3 rows. + + If a positive value is provided for , the result set of the + RESULT message returned for the query will contain at most the + first rows of the query result. If that first page of results + contains the full result set for the query, the RESULT message (of kind `Rows`) + will have the Has_more_pages flag *not* set. However, if some results are not + part of the first response, the Has_more_pages flag will be set and the result + will contain a value. In that case, the value + should be used in a QUERY or EXECUTE message (that has the *same* query as + the original one or the behavior is undefined) to retrieve the next page of + results. + + Only CQL3 queries that return a result set (RESULT message with a Rows `kind`) + support paging. For other type of queries, the value is + ignored. + + Note to client implementors: + - While can be as low as 1, it will likely be detrimental + to performance to pick a value too low. A value below 100 is probably too + low for most use cases. + - Clients should not rely on the actual size of the result set returned to + decide if there are more results to fetch or not. Instead, they should always + check the Has_more_pages flag (unless they did not enable paging for the query + obviously). Clients should also not assert that no result will have more than + results. While the current implementation always respects + the exact value of , we reserve the right to return + slightly smaller or bigger pages in the future for performance reasons. + - The is specific to a protocol version and drivers should not + send a returned by a node using the protocol v3 to query a node + using the protocol v4 for instance. + + +9. Error codes + + Let us recall that an ERROR message is composed of [...] + (see 4.2.1 for details). The supported error codes, as well as any additional + information the message may contain after the are described below: + 0x0000 Server error: something unexpected happened. This indicates a + server-side bug. + 0x000A Protocol error: some client message triggered a protocol + violation (for instance a QUERY message is sent before a STARTUP + one has been sent) + 0x0100 Authentication error: authentication was required and failed. The + possible reason for failing depends on the authenticator in use, + which may or may not include more detail in the accompanying + error message. + 0x1000 Unavailable exception. The rest of the ERROR message body will be + + where: + is the [consistency] level of the query that triggered + the exception. + is an [int] representing the number of nodes that + should be alive to respect + is an [int] representing the number of replicas that + were known to be alive when the request had been + processed (since an unavailable exception has been + triggered, there will be < ) + 0x1001 Overloaded: the request cannot be processed because the + coordinator node is overloaded + 0x1002 Is_bootstrapping: the request was a read request but the + coordinator node is bootstrapping + 0x1003 Truncate_error: error during a truncation error. + 0x1100 Write_timeout: Timeout exception during a write request. The rest + of the ERROR message body will be + + where: + is the [consistency] level of the query having triggered + the exception. + is an [int] representing the number of nodes having + acknowledged the request. + is an [int] representing the number of replicas whose + acknowledgement is required to achieve . + is a [string] that describe the type of the write + that timed out. The value of that string can be one + of: + - "SIMPLE": the write was a non-batched + non-counter write. + - "BATCH": the write was a (logged) batch write. + If this type is received, it means the batch log + has been successfully written (otherwise a + "BATCH_LOG" type would have been sent instead). + - "UNLOGGED_BATCH": the write was an unlogged + batch. No batch log write has been attempted. + - "COUNTER": the write was a counter write + (batched or not). + - "BATCH_LOG": the timeout occurred during the + write to the batch log when a (logged) batch + write was requested. + 0x1200 Read_timeout: Timeout exception during a read request. The rest + of the ERROR message body will be + + where: + is the [consistency] level of the query having triggered + the exception. + is an [int] representing the number of nodes having + answered the request. + is an [int] representing the number of replicas whose + response is required to achieve . Please note that + it is possible to have >= if + is false. Also in the (unlikely) + case where is achieved but the coordinator node + times out while waiting for read-repair acknowledgement. + is a single byte. If its value is 0, it means + the replica that was asked for data has not + responded. Otherwise, the value is != 0. + 0x1300 Read_failure: A non-timeout exception during a read request. The rest + of the ERROR message body will be + + where: + is the [consistency] level of the query having triggered + the exception. + is an [int] representing the number of nodes having + answered the request. + is an [int] representing the number of replicas whose + acknowledgement is required to achieve . + is an [int] representing the number of nodes that + experience a failure while executing the request. + is a single byte. If its value is 0, it means + the replica that was asked for data had not + responded. Otherwise, the value is != 0. + 0x1400 Function_failure: A (user defined) function failed during execution. + The rest of the ERROR message body will be + + where: + is the keyspace [string] of the failed function + is the name [string] of the failed function + [string list] one string for each argument type (as CQL type) of the failed function + 0x1500 Write_failure: A non-timeout exception during a write request. The rest + of the ERROR message body will be + + where: + is the [consistency] level of the query having triggered + the exception. + is an [int] representing the number of nodes having + answered the request. + is an [int] representing the number of replicas whose + acknowledgement is required to achieve . + is an [int] representing the number of nodes that + experience a failure while executing the request. + is a [string] that describes the type of the write + that failed. The value of that string can be one + of: + - "SIMPLE": the write was a non-batched + non-counter write. + - "BATCH": the write was a (logged) batch write. + If this type is received, it means the batch log + has been successfully written (otherwise a + "BATCH_LOG" type would have been sent instead). + - "UNLOGGED_BATCH": the write was an unlogged + batch. No batch log write has been attempted. + - "COUNTER": the write was a counter write + (batched or not). + - "BATCH_LOG": the failure occured during the + write to the batch log when a (logged) batch + write was requested. + + 0x2000 Syntax_error: The submitted query has a syntax error. + 0x2100 Unauthorized: The logged user doesn't have the right to perform + the query. + 0x2200 Invalid: The query is syntactically correct but invalid. + 0x2300 Config_error: The query is invalid because of some configuration issue + 0x2400 Already_exists: The query attempted to create a keyspace or a + table that was already existing. The rest of the ERROR message + body will be where: + is a [string] representing either the keyspace that + already exists, or the keyspace in which the table that + already exists is. +
is a [string] representing the name of the table that + already exists. If the query was attempting to create a + keyspace,
will be present but will be the empty + string. + 0x2500 Unprepared: Can be thrown while a prepared statement tries to be + executed if the provided prepared statement ID is not known by + this host. The rest of the ERROR message body will be [short + bytes] representing the unknown ID. + +10. Changes from v4 + + * Beta protocol flag for v5 native protocol is added (Section 2.2) http://git-wip-us.apache.org/repos/asf/cassandra/blob/d9322a17/src/java/org/apache/cassandra/cql3/ResultSet.java ---------------------------------------------------------------------- diff --git a/src/java/org/apache/cassandra/cql3/ResultSet.java b/src/java/org/apache/cassandra/cql3/ResultSet.java index 9010b20..e9a0f42 100644 --- a/src/java/org/apache/cassandra/cql3/ResultSet.java +++ b/src/java/org/apache/cassandra/cql3/ResultSet.java @@ -590,7 +590,7 @@ public class ResultSet size += CBUtil.sizeOfString(m.names.get(0).cfName); } - if (m.partitionKeyBindIndexes != null && version >= 4) + if (m.partitionKeyBindIndexes != null && version >= Server.VERSION_4) size += 4 + 2 * m.partitionKeyBindIndexes.length; for (ColumnSpecification name : m.names) http://git-wip-us.apache.org/repos/asf/cassandra/blob/d9322a17/src/java/org/apache/cassandra/transport/CBUtil.java ---------------------------------------------------------------------- diff --git a/src/java/org/apache/cassandra/transport/CBUtil.java b/src/java/org/apache/cassandra/transport/CBUtil.java index 43f4bbd..570fd6b 100644 --- a/src/java/org/apache/cassandra/transport/CBUtil.java +++ b/src/java/org/apache/cassandra/transport/CBUtil.java @@ -393,7 +393,7 @@ public abstract class CBUtil int length = cb.readInt(); if (length < 0) { - if (protocolVersion < 4) // backward compatibility for pre-version 4 + if (protocolVersion < Server.VERSION_4) // backward compatibility for pre-version 4 return null; if (length == -1) return null; http://git-wip-us.apache.org/repos/asf/cassandra/blob/d9322a17/src/java/org/apache/cassandra/transport/Event.java ---------------------------------------------------------------------- diff --git a/src/java/org/apache/cassandra/transport/Event.java b/src/java/org/apache/cassandra/transport/Event.java index 3c45c33..232d817 100644 --- a/src/java/org/apache/cassandra/transport/Event.java +++ b/src/java/org/apache/cassandra/transport/Event.java @@ -270,7 +270,7 @@ public abstract class Event public static SchemaChange deserializeEvent(ByteBuf cb, int version) { Change change = CBUtil.readEnumValue(Change.class, cb); - if (version >= 3) + if (version >= Server.VERSION_3) { Target target = CBUtil.readEnumValue(Target.class, cb); String keyspace = CBUtil.readString(cb); @@ -293,7 +293,7 @@ public abstract class Event { if (target == Target.FUNCTION || target == Target.AGGREGATE) { - if (version >= 4) + if (version >= Server.VERSION_4) { // available since protocol version 4 CBUtil.writeEnumValue(change, dest); @@ -314,7 +314,7 @@ public abstract class Event return; } - if (version >= 3) + if (version >= Server.VERSION_3) { CBUtil.writeEnumValue(change, dest); CBUtil.writeEnumValue(target, dest); @@ -345,13 +345,13 @@ public abstract class Event { if (target == Target.FUNCTION || target == Target.AGGREGATE) { - if (version >= 4) + if (version >= Server.VERSION_3) return CBUtil.sizeOfEnumValue(change) + CBUtil.sizeOfEnumValue(target) + CBUtil.sizeOfString(keyspace) + CBUtil.sizeOfString(name) + CBUtil.sizeOfStringList(argTypes); - if (version >= 3) + if (version >= Server.VERSION_3) return CBUtil.sizeOfEnumValue(Change.UPDATED) + CBUtil.sizeOfEnumValue(Target.KEYSPACE) + CBUtil.sizeOfString(keyspace); @@ -360,7 +360,7 @@ public abstract class Event + CBUtil.sizeOfString(""); } - if (version >= 3) + if (version >= Server.VERSION_3) { int size = CBUtil.sizeOfEnumValue(change) + CBUtil.sizeOfEnumValue(target) http://git-wip-us.apache.org/repos/asf/cassandra/blob/d9322a17/src/java/org/apache/cassandra/transport/Frame.java ---------------------------------------------------------------------- diff --git a/src/java/org/apache/cassandra/transport/Frame.java b/src/java/org/apache/cassandra/transport/Frame.java index d0d4aee..f2f6174 100644 --- a/src/java/org/apache/cassandra/transport/Frame.java +++ b/src/java/org/apache/cassandra/transport/Frame.java @@ -85,11 +85,6 @@ public class Frame public final int streamId; public final Message.Type type; - private Header(int version, int flags, int streamId, Message.Type type) - { - this(version, Flag.deserialize(flags), streamId, type); - } - private Header(int version, EnumSet flags, int streamId, Message.Type type) { this.version = version; @@ -104,7 +99,8 @@ public class Frame COMPRESSED, TRACING, CUSTOM_PAYLOAD, - WARNING; + WARNING, + USE_BETA; private static final Flag[] ALL_VALUES = values(); @@ -174,16 +170,26 @@ public class Frame int firstByte = buffer.getByte(idx++); Message.Direction direction = Message.Direction.extractFromVersion(firstByte); int version = firstByte & PROTOCOL_VERSION_MASK; - if (version < Server.MIN_SUPPORTED_VERSION || version > Server.CURRENT_VERSION) - throw new ProtocolException(String.format("Invalid or unsupported protocol version (%d); the lowest supported version is %d and the greatest is %d", - version, Server.MIN_SUPPORTED_VERSION, Server.CURRENT_VERSION), - version); + if (version < Server.MIN_SUPPORTED_VERSION) + throw new ProtocolException(protocolVersionExceptionMessage(version), version); // Wait until we have the complete header if (readableBytes < Header.LENGTH) return; int flags = buffer.getByte(idx++); + EnumSet decodedFlags = Header.Flag.deserialize(flags); + if (version > Server.CURRENT_VERSION) + { + if (version == Server.BETA_VERSION) + { + if (!decodedFlags.contains(Header.Flag.USE_BETA)) + throw new ProtocolException(String.format("Beta version of the protocol used (%d), but USE_BETA flag is unset", + version)); + } + else + throw new ProtocolException(protocolVersionExceptionMessage(version)); + } int streamId = buffer.getShort(idx); idx += 2; @@ -242,7 +248,13 @@ public class Frame streamId); } - results.add(new Frame(new Header(version, flags, streamId, type), body)); + results.add(new Frame(new Header(version, decodedFlags, streamId, type), body)); + } + + private static String protocolVersionExceptionMessage(int version) + { + return String.format("Invalid or unsupported protocol version (%d); the lowest supported version is %d and the greatest is %d", + version, Server.MIN_SUPPORTED_VERSION, Server.CURRENT_VERSION); } private void fail() http://git-wip-us.apache.org/repos/asf/cassandra/blob/d9322a17/src/java/org/apache/cassandra/transport/Message.java ---------------------------------------------------------------------- diff --git a/src/java/org/apache/cassandra/transport/Message.java b/src/java/org/apache/cassandra/transport/Message.java index 7bfa194..66e0014 100644 --- a/src/java/org/apache/cassandra/transport/Message.java +++ b/src/java/org/apache/cassandra/transport/Message.java @@ -322,6 +322,8 @@ public abstract class Message int version = connection == null ? Server.CURRENT_VERSION : connection.getVersion(); EnumSet flags = EnumSet.noneOf(Frame.Header.Flag.class); + if (version == Server.BETA_VERSION) + flags.add(Frame.Header.Flag.USE_BETA); Codec codec = (Codec)message.type.codec; try http://git-wip-us.apache.org/repos/asf/cassandra/blob/d9322a17/src/java/org/apache/cassandra/transport/ProtocolException.java ---------------------------------------------------------------------- diff --git a/src/java/org/apache/cassandra/transport/ProtocolException.java b/src/java/org/apache/cassandra/transport/ProtocolException.java index a589e9b..9d8c270 100644 --- a/src/java/org/apache/cassandra/transport/ProtocolException.java +++ b/src/java/org/apache/cassandra/transport/ProtocolException.java @@ -35,6 +35,8 @@ public class ProtocolException extends RuntimeException implements TransportExce public ProtocolException(String msg, Integer attemptedLowProtocolVersion) { super(msg); + assert attemptedLowProtocolVersion == null || attemptedLowProtocolVersion < Server.MIN_SUPPORTED_VERSION; + this.attemptedLowProtocolVersion = attemptedLowProtocolVersion; } http://git-wip-us.apache.org/repos/asf/cassandra/blob/d9322a17/src/java/org/apache/cassandra/transport/Server.java ---------------------------------------------------------------------- diff --git a/src/java/org/apache/cassandra/transport/Server.java b/src/java/org/apache/cassandra/transport/Server.java index 388fca0..36ad186 100644 --- a/src/java/org/apache/cassandra/transport/Server.java +++ b/src/java/org/apache/cassandra/transport/Server.java @@ -66,7 +66,9 @@ public class Server implements CassandraDaemon.Server public static final int VERSION_3 = 3; public static final int VERSION_4 = 4; + public static final int VERSION_5 = 5; public static final int CURRENT_VERSION = VERSION_4; + public static final int BETA_VERSION = VERSION_5; public static final int MIN_SUPPORTED_VERSION = VERSION_3; private final ConnectionTracker connectionTracker = new ConnectionTracker(); http://git-wip-us.apache.org/repos/asf/cassandra/blob/d9322a17/src/java/org/apache/cassandra/transport/SimpleClient.java ---------------------------------------------------------------------- diff --git a/src/java/org/apache/cassandra/transport/SimpleClient.java b/src/java/org/apache/cassandra/transport/SimpleClient.java index 6e20cfa..4d8a30b 100644 --- a/src/java/org/apache/cassandra/transport/SimpleClient.java +++ b/src/java/org/apache/cassandra/transport/SimpleClient.java @@ -92,10 +92,7 @@ public class SimpleClient implements Closeable public SimpleClient(String host, int port, int version, ClientEncryptionOptions encryptionOptions) { - this.host = host; - this.port = port; - this.version = version; - this.encryptionOptions = encryptionOptions; + this(host, port, version, false, encryptionOptions); } public SimpleClient(String host, int port, ClientEncryptionOptions encryptionOptions) @@ -108,6 +105,17 @@ public class SimpleClient implements Closeable this(host, port, version, new ClientEncryptionOptions()); } + public SimpleClient(String host, int port, int version, boolean useBeta, ClientEncryptionOptions encryptionOptions) + { + this.host = host; + this.port = port; + if (version == Server.BETA_VERSION && !useBeta) + throw new IllegalArgumentException(String.format("Beta version of server used (%s), but USE_BETA flag is not set", version)); + + this.version = version; + this.encryptionOptions = encryptionOptions; + } + public SimpleClient(String host, int port) { this(host, port, new ClientEncryptionOptions()); http://git-wip-us.apache.org/repos/asf/cassandra/blob/d9322a17/src/java/org/apache/cassandra/transport/messages/BatchMessage.java ---------------------------------------------------------------------- diff --git a/src/java/org/apache/cassandra/transport/messages/BatchMessage.java b/src/java/org/apache/cassandra/transport/messages/BatchMessage.java index bd2423e..9d1047f 100644 --- a/src/java/org/apache/cassandra/transport/messages/BatchMessage.java +++ b/src/java/org/apache/cassandra/transport/messages/BatchMessage.java @@ -84,7 +84,7 @@ public class BatchMessage extends Message.Request CBUtil.writeValueList(msg.values.get(i), dest); } - if (version < 3) + if (version < Server.VERSION_3) CBUtil.writeConsistencyLevel(msg.options.getConsistency(), dest); else QueryOptions.codec.encode(msg.options, dest, version); @@ -102,7 +102,7 @@ public class BatchMessage extends Message.Request size += CBUtil.sizeOfValueList(msg.values.get(i)); } - size += version < 3 + size += version < Server.VERSION_3 ? CBUtil.sizeOfConsistencyLevel(msg.options.getConsistency()) : QueryOptions.codec.encodedSize(msg.options, version); return size; http://git-wip-us.apache.org/repos/asf/cassandra/blob/d9322a17/test/unit/org/apache/cassandra/service/ProtocolBetaVersionTest.java ---------------------------------------------------------------------- diff --git a/test/unit/org/apache/cassandra/service/ProtocolBetaVersionTest.java b/test/unit/org/apache/cassandra/service/ProtocolBetaVersionTest.java new file mode 100644 index 0000000..7eae7bb --- /dev/null +++ b/test/unit/org/apache/cassandra/service/ProtocolBetaVersionTest.java @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.cassandra.service; + +import org.junit.BeforeClass; +import org.junit.Test; + +import org.apache.cassandra.config.*; +import org.apache.cassandra.cql3.*; +import org.apache.cassandra.transport.*; +import org.apache.cassandra.transport.messages.*; + +import static junit.framework.Assert.assertEquals; +import static junit.framework.Assert.fail; + +public class ProtocolBetaVersionTest extends CQLTester +{ + @BeforeClass + public static void setUp() + { + requireNetwork(); + DatabaseDescriptor.setBatchSizeWarnThresholdInKB(1); + } + + @Test + public void testProtocolBetaVersion() throws Exception + { + createTable("CREATE TABLE %s (pk int PRIMARY KEY, v int)"); + + try (SimpleClient client = new SimpleClient(nativeAddr.getHostAddress(), nativePort, Server.BETA_VERSION, true, new EncryptionOptions.ClientEncryptionOptions())) + { + client.connect(false); + for (int i = 0; i < 10; i++) + { + QueryMessage query = new QueryMessage(String.format("INSERT INTO %s.%s (pk, v) VALUES (%s, %s)", + KEYSPACE, + currentTable(), + i, i), QueryOptions.DEFAULT); + client.execute(query); + } + + QueryMessage query = new QueryMessage(String.format("SELECT * FROM %s.%s", + KEYSPACE, + currentTable()), QueryOptions.DEFAULT); + ResultMessage.Rows resp = (ResultMessage.Rows) client.execute(query); + assertEquals(10, resp.result.size()); + } + catch (Exception e) + { + fail("No exceptions should've been thrown: " + e.getMessage()); + } + } + + @Test + public void unforcedProtocolVersionTest() throws Exception + { + try + { + SimpleClient client = new SimpleClient(nativeAddr.getHostAddress(), nativePort, Server.BETA_VERSION, false, new EncryptionOptions.ClientEncryptionOptions()); + client.connect(false); + fail("Exception should have been thrown"); + } + catch (Exception e) + { + assertEquals("Beta version of server used (5), but USE_BETA flag is not set", + e.getMessage()); + } + } +} + http://git-wip-us.apache.org/repos/asf/cassandra/blob/d9322a17/test/unit/org/apache/cassandra/transport/ProtocolErrorTest.java ---------------------------------------------------------------------- diff --git a/test/unit/org/apache/cassandra/transport/ProtocolErrorTest.java b/test/unit/org/apache/cassandra/transport/ProtocolErrorTest.java index 599087c..2be7b08 100644 --- a/test/unit/org/apache/cassandra/transport/ProtocolErrorTest.java +++ b/test/unit/org/apache/cassandra/transport/ProtocolErrorTest.java @@ -34,9 +34,9 @@ public class ProtocolErrorTest { @Test public void testInvalidProtocolVersion() throws Exception { - // test using a protocol version higher than the current version - testInvalidProtocolVersion(Server.CURRENT_VERSION + 1); - // test using a protocol version lower than the lowest version + // test using a protocol 2 version higher than the current version (1 version higher is current beta) + testInvalidProtocolVersion(Server.CURRENT_VERSION + 2); // + // test using a protocol version lower than the lowest version testInvalidProtocolVersion(Server.MIN_SUPPORTED_VERSION - 1); }