cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "GettingStarted_draft" by MakiWatanabe
Date Fri, 24 Feb 2012 02:32:06 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "GettingStarted_draft" page has been changed by MakiWatanabe:
http://wiki.apache.org/cassandra/GettingStarted_draft?action=diff&rev1=64&rev2=65

Comment:
Complete rewriting

   
  == Step 1: Download Cassandra Kit ==
   
- Download links for the latest stable release can always be found on the [[http://cassandra.apache.org/download|website]].
+  * Download links for the latest stable release can always be found on the [[http://cassandra.apache.org/download|website]].
-  
- Users of Debian or Debian-based derivatives can install the latest stable release in package
form, see DebianPackaging for details.
+  * Users of Debian or Debian-based derivatives can install the latest stable release in
package form, see DebianPackaging for details.
-  
- Users of RPM-based distributions can get packages from [[http://www.datastax.com/blog/announcing-rpms-cassandra|Datastax]].
+  * Users of RPM-based distributions can get packages from [[http://www.datastax.com/blog/announcing-rpms-cassandra|Datastax]].
-  
- If you are interested in building Cassandra from source, please refer to [[HowToBuild|How
to Build]] page.
+  * If you are interested in building Cassandra from source, please refer to [[HowToBuild|How
to Build]] page.
   
  For more details about miscellaneous builds, please refer to the [[VersionsAndBuilds|Cassandra versions
and builds]] page.
   
@@ -34, +31 @@

  ## Since there isn't currently an installation method per se, the easiest solution is to
simply run Cassandra from an extracted archive or Git checkout (see: [[#picking_a_version|Picking
a version]]). Also, unless you've downloaded a binary distribution, you'll need to compile
the software by invoking `ant` from the top-level directory.
   
  === Step 2.1: Edit cassandra.yaml ===
- The distribution's sample configuration `conf/cassandra.yaml` contains reasonable defaults
for single node operation, but you will need to make sure that the paths exist for `data_file_directories`,
`commitlog_directory`, and `saved_caches_directory`.
+ The distribution's sample configuration `conf/cassandra.yaml` contains reasonable defaults
for single node operation, but you will need to make sure that the paths exist for '''data_file_directories''',
'''commitlog_directory''', and '''saved_caches_directory'''.
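The path check described above can be sketched in Python. This is only an illustration: the directory values below are assumptions, so copy the real ones from your own `conf/cassandra.yaml`. The helper `missing_paths` is a hypothetical name and not part of Cassandra.

```python
import os

# Assumed example values; replace them with the paths from your cassandra.yaml.
CASSANDRA_PATHS = {
    "data_file_directories": ["/var/lib/cassandra/data"],
    "commitlog_directory": ["/var/lib/cassandra/commitlog"],
    "saved_caches_directory": ["/var/lib/cassandra/saved_caches"],
}


def missing_paths(paths_by_setting):
    """Return (setting, path) pairs whose directory does not exist."""
    return [
        (setting, path)
        for setting, paths in sorted(paths_by_setting.items())
        for path in paths
        if not os.path.isdir(path)
    ]


if __name__ == "__main__":
    # Report every configured directory that still has to be created.
    for setting, path in missing_paths(CASSANDRA_PATHS):
        print("%s: %s does not exist" % (setting, path))
```

Any directory it reports must be created, and be writable by the user running Cassandra, before the node is started.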
  
- Verify `storage_port` and `rpc_port` are not conflict with other service on your computer.
+ Verify that '''storage_port''' and '''rpc_port''' do not conflict with other services on your
computer.
  By default, Cassandra uses 7000 for storage_port, and 9160 for rpc_port. The `storage_port`
must be identical between Cassandra nodes in a cluster. Cassandra client applications will
use `rpc_port` to connect to Cassandra. 
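One way to check for a port conflict before starting the node is to try binding the ports yourself. The sketch below assumes the default port numbers mentioned above and checks only the loopback address. Cassandra binds to whatever address you configure, so adjust `host` accordingly. `port_is_free` is a hypothetical helper name.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except socket.error:
        # Binding failed, most likely because another service holds the port.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    # 7000 and 9160 are the defaults; use the values from your cassandra.yaml.
    for name, port in (("storage_port", 7000), ("rpc_port", 9160)):
        state = "free" if port_is_free(port) else "in use"
        print("%s %d: %s" % (name, port, state))
```

Run it while Cassandra is stopped; a port reported as "in use" is held by some other service.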
   
- It will be a good idea to change cluster_name to avoid unnecessary conflict with existing
clusters.
+ It is a good idea to change '''cluster_name''' to avoid unnecessary conflicts with existing
clusters.
  
- initial_token. You can leave it blank, but I recommend set it to 0 if you are configuring
your first node.
+ '''initial_token''': You can leave it blank, but setting it to 0 is recommended if you are
configuring your first node.
  
  === Step 2.2: Edit log4j-server.properties ===
  `conf/log4j-server.properties` contains a path for the log file. Edit that line if needed.
@@ -112, +109 @@

  You can access the online help with the 'help;' command. A semicolon (;) is required at the end to
complete a command in the CLI.
  
  {{{
- [default@unknown]  help;
+ [default@unknown] help;
  }}}
  
  First, create a keyspace for your test.
@@ -186, +183 @@

  Please note that we didn't use "utf8()" for the row key this time.
  You can define the data type as meta data of the column family. Check 'help update column
family;' and 'help create column family;' for more details.
  
- To be certain though, take some time to try out the examples in CassandraCli before moving
on (note: if you are using Cassandra 0.7.0, you'll need to load the demo Keyspaces first using
JMX, see http://wiki.apache.org/cassandra/FAQ#no_keyspaces, or even better follow testing
instructions on the README of the installation folder). Also, if you run into problems, Don't
Panic, calmly proceed to [[#if_something_goes_wrong|If Something Goes Wrong]].
+ To be certain though, take some time to try out the examples in CassandraCli before moving
on.
+ Also, if you run into problems, Don't Panic, calmly proceed to [[#if_something_goes_wrong|If
Something Goes Wrong]].
   
- Users of recent Linux distributions and Mac OS X Snow Leopard should be able to start up
Cassandra simply by untarring and invoking `bin/cassandra -f` with root privileges. Snow Leopard
ships with Java 1.6.0 and does not require changing the `JAVA_HOME` environment variable or
adding any directory to your `PATH`. On Linux just make sure you have a working Java JDK package
installed such as the `openjdk-6-jdk` on Ubuntu Lucid Lynx.
+  Users of recent Linux distributions and Mac OS X Snow Leopard should be able to start up
Cassandra simply by untarring and invoking `bin/cassandra -f` with root privileges. Snow Leopard
ships with Java 1.6.0 and does not require changing the `JAVA_HOME` environment variable or
adding any directory to your `PATH`. On Linux just make sure you have a working Java JDK package
installed such as the `openjdk-6-jdk` on Ubuntu Lucid Lynx.
  
- == Step 3: Running a cluster ==
+ == Configuring Multinode Cluster ==
  
- ***This section should be moved to MultinodeCluster page***
+ Now you have a single working Cassandra node. It is a Cassandra cluster that has only one
node. By adding more nodes, you can make it a multi-node cluster.
  
- Setting up a Cassandra cluster is ''almost'' as simple as repeating [[#running_a_single_node|Step
2]] for each node in your cluster. There are a few minor exceptions though.
+ Setting up a Cassandra cluster is ''almost'' as simple as repeating the above procedures
 for each node in your cluster. There are a few minor exceptions though.
   
- Cassandra nodes exchange information about one another using a mechanism called Gossip,
but to get the ball rolling a newly started node needs to know of at least one other, this
is called a `Seed`. It's customary to pick a small number of relatively stable nodes to serve
as your seeds, but there is no hard-and-fast rule here. Do make sure that each seed also knows
of at least one other, remember, the goal is to avoid a chicken-and-egg scenario and provide
an avenue for all nodes in the cluster to discover one another.
+ Cassandra nodes exchange information about one another using a mechanism called Gossip,
but to get the ball rolling a newly started node needs to know of at least one other, this
is called a '''Seed'''. It's customary to pick a small number of relatively stable nodes to
serve as your seeds, but there is no hard-and-fast rule here. Do make sure that each seed
also knows of at least one other, remember, the goal is to avoid a chicken-and-egg scenario
and provide an avenue for all nodes in the cluster to discover one another.
   
- In addition to seeds, you'll also need to configure the IP interface to listen on for Gossip
and Thrift, (`ListenAddress` and `ThriftAddress` respectively). Use a `ListenAddress` that
will be reachable from the `ListenAddress` used on all other nodes, and a `ThriftAddress`
that will be accessible to clients.
+ In addition to seeds, you'll also need to configure the IP interface to listen on for Gossip
and Thrift ('''listen_address''' and '''rpc_address''' respectively). Use a `listen_address`
that will be reachable from the `listen_address` used on all other nodes, and an `rpc_address`
that will be accessible to clients.
   
+ One other thing you need to take care of in a multi-node cluster is the '''Token'''. Each node in the
cluster owns a part of the token range from 0 to 2^127-1.
+ If the Nth node in the cluster has token value T(N), the node owns the range from T(N-1)+1 to
T(N). Cassandra decides which nodes should store a piece of data based on a consistent mapping
between the row key and the token range (refer to RandomPartitioner, ByteOrderedPartitioner).
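The ownership rule above can be sketched in a few lines of Python. This is an illustration only, not Cassandra's implementation. `owner_index` is a hypothetical helper, and it takes an already computed integer token as input rather than deriving one from a row key. It also assumes that tokens above the highest T(N) wrap around to the node with the lowest token.

```python
import bisect


def owner_index(tokens, key_token):
    """Return the index of the node that owns key_token.

    tokens must be sorted in ascending order. Node N owns the range
    T(N-1)+1 .. T(N); anything above the last token wraps to node 0.
    """
    idx = bisect.bisect_left(tokens, key_token)
    return idx if idx < len(tokens) else 0


if __name__ == "__main__":
    # Four evenly spaced tokens over 0 .. 2**127 - 1.
    tokens = [2**127 // 4 * n for n in range(4)]
    for key_token in (0, 1, 2**125, 2**125 + 1, 2**127 - 1):
        print("%d -> node %d" % (key_token, owner_index(tokens, key_token)))
```

With these four tokens, a key token equal to T(1) belongs to node 1, while T(1)+1 already belongs to node 2.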
+ 
+ The token can be assigned to a node with the '''initial_token''' parameter in cassandra.yaml. The
parameter is effective only at the first boot of the node. Once you have booted a node, use the 'nodetool
move' command to change the assigned token. You need to specify an appropriate initial_token
for each node to balance the data load across the nodes. Here is a Python script to calculate
balanced tokens.
+ {{{
+ # Number of nodes in the cluster
+ num_node = 4
+ 
+ # Spread the tokens evenly over the range 0 .. 2**127 - 1
+ for n in range(num_node):
+     print(2**127 // num_node * n)
+ }}}
+ 
- Once everything is configured and the nodes are running, use the `bin/nodetool` utility
to verify a properly connected cluster. For example:
+ Once everything is configured and the nodes are running, use the `bin/nodetool ring` command
to verify that the cluster is properly connected. For example:
   
  {{{
- eevans@achilles:~$ bin/nodetool -host 98.139.220.175 ring
+ eevans@achilles:~$ bin/nodetool -host 98.139.220.175 -p 7199 ring
  Address       Status     Load          Range                                      Ring
                                         169048975998562660269742699624378098572
  98.139.220.175  Up         0.02 GB     14183696824377310051808173385764689249     |<--|
@@ -213, +223 @@

  Advanced cluster management is described in [[Operations]].
   
  If you don't yet have access to hardware for a Cassandra cluster you can try it out on EC2
with CloudConfig.
+ 
+ For more details about configuring a multi-node cluster, please refer to [[MultinodeCluster]].
   
- == Step 4: Write your application ==
+ == Write your application ==
  The recommended way to communicate with Cassandra in your application is to use a [[http://wiki.apache.org/cassandra/ClientOptions|higher-level
client]]. These provide programming-language-specific APIs for talking to Cassandra in a
variety of languages. The details will vary depending on programming language and client,
but in general using a higher-level client will mean that you have to write less code and
get several features for free that you would otherwise have to write yourself.
   
  That said, it is useful to know that Cassandra uses [[http://thrift.apache.org/|Thrift]]
for its external client-facing API. Cassandra's main API/RPC/Thrift port is 9160. Thrift supports
a [[http://svn.apache.org/viewvc/thrift/trunk/lib/|wide variety of languages]] so you can
code your application to use Thrift directly if you so choose (but again we recommend a [[http://wiki.apache.org/cassandra/ClientOptions|high-level
client]] where available).
   
  Important note: If you intend to use Thrift directly, you need to install a version of Thrift
that matches the revision that your version of Cassandra uses. See InstallThrift.
   
- Cassandra's main API/RPC/Thrift port is 9160. It is a common mistake for API clients to
connect to the JMX port instead.
+ Cassandra's main API/RPC/Thrift port is 9160 by default, which is defined as rpc_port in
cassandra.yaml. It is a common mistake for API clients to connect to the JMX port instead.
   
  Checking out a demo application like [[http://github.com/twissandra/twissandra|Twissandra]]
(Python + Django) will also be useful.
   
