From: Apache Wiki
To: Apache Wiki
Date: Thu, 22 Apr 2010 06:16:56 -0000
Message-ID: <20100422061656.13633.57176@eos.apache.org>
Subject: [Cassandra Wiki] Update of "API07" by ToddBlose

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "API07" page has been changed by ToddBlose.
http://wiki.apache.org/cassandra/API07

--------------------------------------------------
New page:
## page was copied from API
== Overview ==
The Cassandra Thrift API changed between [[API03|0.3]], [[API04|0.4]], [[API|0.5]] and 0.6; this document explains the 0.6 version.

Cassandra's client API is built entirely on top of Thrift. Note that these documents mention default values, but these are not generated in all of the languages that Thrift supports. Full examples of using Cassandra from Thrift, including setup boilerplate, are found on ThriftExamples. Higher-level clients are linked from ClientOptions.

'''WARNING:''' Some SQL/RDBMS terms are used in this documentation for analogy purposes. They should be thought of as just that: analogies. There are few similarities between how data is managed in a traditional RDBMS and Cassandra. Please see DataModel for more information.

== Terminology / Abbreviations ==
 Keyspace:: Contains multiple Column Families.
 CF:: !ColumnFamily.
 SCF:: !ColumnFamily of type "Super".
 Key:: A unique string that identifies a row in a CF. For clarity, rows are always identified by keys; columns are identified by names. Note that Thrift's Java code [i.e., Cassandra server] assumes that Strings are always encoded as UTF-8, but if you are using a non-Java client, you may need to manually encode non-ASCII strings as UTF-8 first. (This is the major place Thrift does not support interoperability between different platforms well.)
 Column:: A tuple of name, value, and timestamp; names are unique within rows.

== Exceptions ==
 NotFoundException:: A specific column was requested that does not exist.
 InvalidRequestException:: Invalid request could mean keyspace or column family does not exist, required parameters are missing, or a parameter is malformed. `why` contains an associated error message.
 UnavailableException:: Not all the replicas required could be created and/or read.
 TimedOutException:: The node responsible for the write or read did not respond during the rpc interval specified in your configuration (default 10s). This can happen if the request is too large, the node is oversaturated with requests, or the node is down but the failure detector has not yet realized it (usually this takes < 30s).
 TApplicationException:: Internal server error or invalid Thrift method (possible if you are using an older version of a Thrift client with a newer build of the Cassandra server).
 AuthenticationException:: Invalid authentication request (user does not exist or credentials invalid).
 AuthorizationException:: Invalid authorization request (user does not have access to keyspace).

== Structures ==
=== ConsistencyLevel ===
The `ConsistencyLevel` is an `enum` that controls both read and write behavior based on `<ReplicationFactor>` in your `storage-conf.xml`. The different consistency levels have different meanings, depending on whether you're doing a write or read operation. Note that if `W` + `R` > `ReplicationFactor`, where `W` is the number of nodes to block for on write and `R` the number to block for on reads, you will have strongly consistent behavior; that is, readers will always see the most recent write. Of these, the most interesting is to do `QUORUM` reads and writes, which gives you consistency while still allowing availability in the face of node failures up to half of `ReplicationFactor`. Of course, if latency is more important than consistency, you can use lower values for either or both.

All discussion of "nodes" here refers to nodes responsible for holding data for the given key; "surrogate" nodes involved in HintedHandoff do not count towards achieving the requested !ConsistencyLevel.

==== Write ====
||'''Level''' ||'''Behavior''' ||
||`ZERO` ||Ensure nothing. A write happens asynchronously in the background. ||
||`ANY` ||Ensure that the write has been written to at least 1 node, including hinted recipients. ||
||`ONE` ||Ensure that the write has been written to at least 1 node's commit log and memory table before responding to the client. ||
||`QUORUM` ||Ensure that the write has been written to `<ReplicationFactor> / 2 + 1` nodes before responding to the client. ||
||`ALL` ||Ensure that the write is written to all `<ReplicationFactor>` nodes before responding to the client. Any unresponsive nodes will fail the operation. ||

==== Read ====
||'''Level''' ||'''Behavior''' ||
||`ZERO` ||Not supported, because it doesn't make sense. ||
||`ANY` ||Not supported. You probably want `ONE` instead. ||
||`ONE` ||Will return the record returned by the first node to respond. A consistency check is always done in a background thread to fix any consistency issues when `ConsistencyLevel.ONE` is used. This means subsequent calls will have correct data even if the initial read gets an older value. (This is called `read repair`.) ||
||`QUORUM` ||Will query all nodes and return the record with the most recent timestamp once at least a majority of replicas have reported. Again, the remaining replicas will be checked in the background. ||
||`ALL` ||Will query all nodes and return the record with the most recent timestamp once all nodes have replied. Any unresponsive nodes will fail the operation. ||

'''Note: '''Different language toolkits may have their own Consistency Level defaults as well. To ensure the desired Consistency Level, you should always explicitly set the Consistency Level.

=== ColumnOrSuperColumn ===
Due to the lack of inheritance in Thrift, `Column` and `SuperColumn` structures are aggregated by the `ColumnOrSuperColumn` structure. This is used wherever either a `Column` or `SuperColumn` would normally be expected.

If the underlying column is a `Column`, it will be contained within the `column` attribute.
If the underlying column is a `SuperColumn`, it will be contained within the `super_column` attribute. The two are mutually exclusive - i.e. only one may be populated.

||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`column` ||`Column` ||n/a ||N ||The `Column` if this `ColumnOrSuperColumn` is aggregating a `Column`. ||
||`super_column` ||`SuperColumn` ||n/a ||N ||The `SuperColumn` if this `ColumnOrSuperColumn` is aggregating a `SuperColumn`. ||

=== Column ===
The `Column` is a triplet of a name, value and timestamp. As described above, `Column` names are unique within a row. Timestamps are arbitrary - they can be any integer you specify, but they must be consistent across your application. It is recommended to use a timestamp value with a fine granularity, such as milliseconds since the UNIX epoch. See DataModel for more information.

||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`name` ||`binary` ||n/a ||Y ||The name of the `Column`. ||
||`value` ||`binary` ||n/a ||Y ||The value of the `Column`. ||
||`timestamp` ||`i64` ||n/a ||Y ||The timestamp of the `Column`. ||

=== SuperColumn ===
A `SuperColumn` contains no data itself, but instead stores another level of `Columns` below the key. See DataModel for more details on what `SuperColumns` are and how they should be used.

||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`name` ||`binary` ||n/a ||Y ||The name of the `SuperColumn`. ||
||`columns` ||`list<Column>` ||n/a ||Y ||The `Columns` within the `SuperColumn`. ||

=== ColumnPath ===
The `ColumnPath` is the path to a single column in Cassandra. It might make sense to think of `ColumnPath` and `ColumnParent` in terms of a directory structure.

||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`column_family` ||`string` ||n/a ||Y ||The name of the CF of the column being looked up. ||
||`super_column` ||`binary` ||n/a ||N ||The super column name. ||
||`column` ||`binary` ||n/a ||N ||The column name. ||

=== ColumnParent ===
The `ColumnParent` is the path to the parent of a particular set of `Columns`. It is used when selecting groups of columns from the same !ColumnFamily. In directory structure terms, imagine `ColumnParent` as `ColumnPath + '/../'`.

||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`column_family` ||`string` ||n/a ||Y ||The name of the CF of the column being looked up. ||
||`super_column` ||`binary` ||n/a ||N ||The super column name. ||

=== SlicePredicate ===
A `SlicePredicate` is similar to a [[http://en.wikipedia.org/wiki/Predicate_(mathematical_logic)|mathematical predicate]], which is described as "a property that the elements of a set have in common."

`SlicePredicate`s in Cassandra are described with either a list of `column_names` or a `SliceRange`.

||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`column_names` ||`list<binary>` ||n/a ||N ||A list of column names to retrieve. This can be used similarly to Memcached's "multi-get" feature to fetch N known column names. For instance, if you know you wish to fetch columns 'Joe', 'Jack', and 'Jim', you can pass those column names as a list to fetch all three at once. ||
||`slice_range` ||`SliceRange` ||n/a ||N ||A `SliceRange` describing how to range, order, and/or limit the slice. ||

If `column_names` is specified, `slice_range` is ignored.

=== SliceRange ===
A `SliceRange` is a structure that stores basic range, ordering and limit information for a query that will return multiple columns. It could be thought of as Cassandra's version of `LIMIT` and `ORDER BY`.

||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`start` ||`binary` ||n/a ||Y ||The column name to start the slice with. There is no default value, but it can safely be set to `''`, i.e., an empty byte array, to start with the first column name. Otherwise, it must be a valid value under the rules of the Comparator defined for the given `ColumnFamily`. ||
||`finish` ||`binary` ||n/a ||Y ||The column name to stop the slice at. There is no default value, but it can safely be set to an empty byte array to not stop until `count` results are seen. Otherwise, it must also be a valid value to the `ColumnFamily` Comparator. ||
||`reversed` ||`bool` ||`false` ||Y ||Whether the results should be ordered in reversed order. Similar to `ORDER BY blah DESC` in SQL. ||
||`count` ||`i32` ||`100` ||Y ||How many columns to return. Similar to `LIMIT 100` in SQL. May be arbitrarily large, but Thrift will materialize the whole result into memory before returning it to the client, so you may be better served by iterating through slices, passing the last value of one call in as the `start` of the next, instead of making `count` arbitrarily large. ||

=== KeyRange ===
A `KeyRange` is used by `get_range_slices` to define the range of keys to get the slices for.

The semantics of start keys and tokens are slightly different. Keys are start-inclusive; tokens are start-exclusive. Token ranges may also wrap -- that is, the end token may be less than the start one. Thus, a range from keyX to keyX is a one-element range, but a range from tokenY to tokenY is the full ring.

||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`start_key` ||`string` ||n/a ||N ||The first key in the inclusive `KeyRange`. ||
||`end_key` ||`string` ||n/a ||N ||The last key in the inclusive `KeyRange`. ||
||`start_token` ||`string` ||n/a ||N ||The first token in the exclusive `KeyRange`. ||
||`end_token` ||`string` ||n/a ||N ||The last token in the exclusive `KeyRange`. ||
||`count` ||`i32` ||100 ||Y ||The total number of keys to permit in the `KeyRange`. ||

=== KeySlice ===
A `KeySlice` encapsulates a mapping of a key to the slice of columns for it as returned by the `get_range_slices` operation. Normally, when slicing a single key, a `list<ColumnOrSuperColumn>` of the slice would be returned. When slicing multiple keys or a range of keys, a `list<KeySlice>` is returned instead so that each slice can be mapped to its key.

||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`key` ||`string` ||n/a ||Y ||The key for the slice. ||
||`columns` ||`list<ColumnOrSuperColumn>` ||n/a ||Y ||The columns in the slice. ||

=== TokenRange ===
A structure representing structural information about the cluster provided by the `describe` utility methods detailed below.

||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`start_token` ||`string` ||n/a ||Y ||The first token in the `TokenRange`. ||
||`end_token` ||`string` ||n/a ||Y ||The last token in the `TokenRange`. ||
||`endpoints` ||`list<string>` ||n/a ||Y ||A list of the endpoints (nodes) that replicate data in the `TokenRange`. ||

=== Mutation ===
A `Mutation` encapsulates either a column to insert, or a deletion to execute for a key. Like `ColumnOrSuperColumn`, the two properties are mutually exclusive - you may only set one on a `Mutation`.

||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`column_or_supercolumn` ||`ColumnOrSuperColumn` ||n/a ||N ||The column to insert into the key. ||
||`deletion` ||`Deletion` ||n/a ||N ||The deletion to execute on the key. ||

=== Deletion ===
A `Deletion` encapsulates an operation that will delete all columns matching the specified `timestamp` and `predicate`. If `super_column` is specified, the `Deletion` will operate on columns within the `SuperColumn` - otherwise it will operate on columns in the top-level of the key.

||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`timestamp` ||`i64` ||n/a ||Y ||The timestamp of the column(s) to be deleted. ||
||`super_column` ||`binary` ||n/a ||N ||The super column to delete the column(s) from. ||
||`predicate` ||`SlicePredicate` ||n/a ||N ||A predicate to match the column(s) to be deleted from the key/super column. ||

=== AuthenticationRequest ===
A structure that encapsulates a request for the connection to be authenticated. The authentication credentials are arbitrary - this structure simply provides a mapping of credential name to credential value.

||'''Attribute''' ||'''Type''' ||'''Default''' ||'''Required''' ||'''Description''' ||
||`credentials` ||`map<string, string>` ||n/a ||Y ||A map of named credentials. ||

== Method calls ==
=== login ===
 . `void login(keyspace, auth_request)`

Authenticates with the cluster for operations on the specified keyspace using the specified `AuthenticationRequest` credentials. Throws `AuthenticationException` if the credentials are invalid or `AuthorizationException` if the credentials are valid, but not for the specified keyspace.

=== get ===
 . `ColumnOrSuperColumn get(keyspace, key, column_path, consistency_level)`

Get the `Column` or `SuperColumn` at the given `column_path`. If no value is present, `NotFoundException` is thrown. (This is the only method that can throw an exception under non-failure conditions.)

=== get_slice ===
 . `list<ColumnOrSuperColumn> get_slice(keyspace, key, column_parent, predicate, consistency_level)`

Get the group of columns contained by `column_parent` (either a `ColumnFamily` name or a `ColumnFamily/SuperColumn` name pair) specified by the given `SlicePredicate` struct.

=== multiget_slice ===
 . `map<string,list<ColumnOrSuperColumn>> multiget_slice(keyspace, keys, column_parent, predicate, consistency_level)`

Retrieves slices for `column_parent` and `predicate` on each of the given keys in parallel. `keys` is a `list<string>` of the keys to get slices for.

This is similar to `get_range_slice` (Cassandra 0.5) except operating on a set of non-contiguous keys instead of a range of keys.

=== get_count ===
 . `i32 get_count(keyspace, key, column_parent, consistency_level)`

Counts the columns present in `column_parent`.

The method is not O(1). It reads all the columns from disk to calculate the answer. The only benefit of the method is that you do not need to pull all the columns over the Thrift interface to count them.

=== get_range_slices ===
 . `list<KeySlice> get_range_slices(keyspace, column_parent, predicate, range, consistency_level)`

Replaces `get_range_slice`. Returns a list of slices for the keys within the specified `KeyRange`. Unlike `get_key_range`, this applies the given predicate to all keys in the range, not just those with undeleted matching data. This method is only allowed when using an order-preserving partitioner.

=== insert ===
 . `insert(keyspace, key, column_path, value, timestamp, consistency_level)`

Insert a `Column` consisting of (`column_path.column`, `value`, `timestamp`) at the given `column_path.column_family` and optional `column_path.super_column`. Note that `column_path.column` is required here, since a !SuperColumn cannot directly contain binary values -- it can only contain sub-Columns.

=== batch_mutate ===
 . `batch_mutate(keyspace, mutation_map, consistency_level)`

Executes the specified mutations on the keyspace.
`mutation_map` is a `map<string, map<string, list<Mutation>>>`; the outer map maps the key to the inner map, which maps the column family to a list of `Mutation`s; can be read as: `map<key : string, map<column_family : string, list<Mutation>>>`. To be more specific, the outer map key is a row key, the inner map key is the column family name.

A `Mutation` specifies either columns to insert or columns to delete. See `Mutation` and `Deletion` above for more details.

=== remove ===
 . `remove(keyspace, key, column_path, timestamp, consistency_level)`

Remove data from the row specified by `key` at the granularity specified by `column_path`, and the given `timestamp`. Note that all the values in `column_path` besides `column_path.column_family` are truly optional: you can remove the entire row by just specifying the !ColumnFamily, or you can remove a !SuperColumn or a single Column by specifying those levels too. Note that the `timestamp` is needed, so that if the commands are replayed in a different order on different nodes, the same result is produced.

=== describe_keyspaces ===
 . `set<string> describe_keyspaces()`

Gets a list of all the keyspaces configured for the cluster.

=== describe_cluster_name ===
 . `string describe_cluster_name()`

Gets the name of the cluster.

=== describe_version ===
 . `string describe_version()`

Gets the Thrift API version.

=== describe_ring ===
 . `list<TokenRange> describe_ring(keyspace)`

Gets the token ring; a map of ranges to host addresses. Represented as a list of `TokenRange` instead of a map from range to list of endpoints, because you can't use Thrift structs as map keys (https://issues.apache.org/jira/browse/THRIFT-162). For the same reason, we can't return a set here, even though order is neither important nor predictable.

=== describe_keyspace ===
 . `map<string, map<string, string>> describe_keyspace(keyspace)`

Gets information about the specified keyspace.
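
The nesting of `mutation_map` described under `batch_mutate` can be easier to see in code. The following is a minimal Python sketch, not the generated client: the dataclasses are hypothetical stand-ins for the Thrift-generated structs, `build_mutation_map` is an illustrative helper, and the row key `joe`, column family `Users`, and column names are made up. The sketch also rejects a `Mutation` with neither property set, which is an assumption beyond the "mutually exclusive" rule stated above.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-ins for the Thrift-generated structs; a real client
# would use the classes generated from the Cassandra Thrift interface.
@dataclass
class Column:
    name: bytes
    value: bytes
    timestamp: int


@dataclass
class ColumnOrSuperColumn:
    column: Optional[Column] = None
    super_column: Optional[object] = None  # SuperColumn omitted in this sketch


@dataclass
class SlicePredicate:
    column_names: Optional[List[bytes]] = None


@dataclass
class Deletion:
    timestamp: int
    super_column: Optional[bytes] = None
    predicate: Optional[SlicePredicate] = None


@dataclass
class Mutation:
    column_or_supercolumn: Optional[ColumnOrSuperColumn] = None
    deletion: Optional[Deletion] = None

    def __post_init__(self):
        # The two properties are mutually exclusive; this sketch also
        # treats "neither set" as an error (an assumption).
        if (self.column_or_supercolumn is None) == (self.deletion is None):
            raise ValueError("set exactly one of column_or_supercolumn or deletion")


def build_mutation_map(row_key, column_family, mutations):
    """Return {row key: {column family name: [Mutation, ...]}}."""
    return {row_key: {column_family: list(mutations)}}


timestamp = 1271916000000  # caller-chosen, e.g. milliseconds since the UNIX epoch
insert_email = Mutation(
    column_or_supercolumn=ColumnOrSuperColumn(
        column=Column(name=b"email", value=b"joe@example.com", timestamp=timestamp)
    )
)
delete_phone = Mutation(
    deletion=Deletion(
        timestamp=timestamp,
        predicate=SlicePredicate(column_names=[b"phone"]),
    )
)
mutation_map = build_mutation_map("joe", "Users", [insert_email, delete_phone])
```

With a real client, a structure of this shape is what would be passed as the `mutation_map` argument of `batch_mutate`, together with the keyspace and an explicit consistency level.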

== Examples ==
[[http://wiki.apache.org/cassandra/ClientExamples|There are a few examples on this page over here.]]
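
As a small worked example of the `W` + `R` > `ReplicationFactor` rule from the ConsistencyLevel section, the sketch below computes how many replicas `QUORUM` blocks for and checks whether a read/write pairing overlaps. The function names are illustrative only, and treating `ReplicationFactor / 2 + 1` as integer division is an assumption of this sketch.

```python
def quorum(replication_factor: int) -> int:
    """Replicas QUORUM blocks for: ReplicationFactor / 2 + 1 (integer division assumed)."""
    return replication_factor // 2 + 1


def is_strongly_consistent(w: int, r: int, replication_factor: int) -> bool:
    """True when the replicas written (w) and read (r) must overlap: W + R > ReplicationFactor."""
    return w + r > replication_factor


for rf in (1, 3, 5):
    q = quorum(rf)
    print(
        f"ReplicationFactor={rf}: QUORUM blocks for {q} node(s); "
        f"QUORUM/QUORUM overlaps: {is_strongly_consistent(q, q, rf)}; "
        f"ONE/ONE overlaps: {is_strongly_consistent(1, 1, rf)}"
    )
```

Pairing `QUORUM` writes with `QUORUM` reads satisfies the inequality for any `ReplicationFactor`, whereas `ONE`/`ONE` satisfies it only when `ReplicationFactor` is 1.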