cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Update of "FAQ" by Nick Pavlica
Date Sat, 01 May 2010 17:35:38 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "FAQ" page has been changed by Nick Pavlica.
http://wiki.apache.org/cassandra/FAQ?action=diff&rev1=60&rev2=61

--------------------------------------------------

  = Frequently asked questions =
   * [[#cant_listen_on_ip_any|Why can't I make Cassandra listen on 0.0.0.0 (all my addresses)?]]
   * [[#ports|What ports does Cassandra use?]]
   * [[#slows_down_after_lotso_inserts|Why does Cassandra slow down after doing a lot of inserts?]]

   * [[#gui|Is there a GUI admin tool for Cassandra?]]
  
<<Anchor(cant_listen_on_ip_any)>>

== Why can't I make Cassandra listen on 0.0.0.0 (all my addresses)? ==
Cassandra is a gossip-based distributed system.  !ListenAddress is also the "contact me here" address, i.e., the address it tells other nodes to reach it at.  Telling other nodes to "contact me on any of my addresses" is a bad idea; if different nodes in the cluster pick different addresses for you, Bad Things happen.
  
If you don't want to manually specify an IP for !ListenAddress on each node in your cluster (understandable!), leave it blank and Cassandra will use !InetAddress.getLocalHost() to pick an address.  Then it's up to you or your ops team to make things resolve correctly (/etc/hosts, DNS, etc).

  See [[https://issues.apache.org/jira/browse/CASSANDRA-256|CASSANDRA-256]] and [[https://issues.apache.org/jira/browse/CASSANDRA-43|CASSANDRA-43]]
for more gory details.
  
<<Anchor(ports)>>

== What ports does Cassandra use? ==
  By default, Cassandra uses 7000 for cluster communication, 9160 for clients (Thrift), and
8080 for [[JmxInterface|JMX]].  These are all editable in the configuration file or bin/cassandra.in.sh
(for JVM options). All ports are TCP. See also RunningCassandra.
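
To sanity-check that a node is listening where you expect, a plain-JDK probe like this sketch works; the host argument and one-second timeout are arbitrary choices, not anything Cassandra-specific:

{{{
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck
{
    public static void main(String[] args) throws IOException
    {
        String host = args.length > 0 ? args[0] : "localhost";
        // The defaults listed above; adjust if your configuration differs.
        int[] ports = { 7000, 9160, 8080 };
        for (int port : ports)
        {
            Socket socket = new Socket();
            try
            {
                socket.connect(new InetSocketAddress(host, port), 1000);
                System.out.println(port + " is open");
            }
            catch (IOException e)
            {
                System.out.println(port + " is not reachable: " + e.getMessage());
            }
            finally
            {
                socket.close();
            }
        }
    }
}
}}}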
  
<<Anchor(slows_down_after_lotso_inserts)>>

== Why does Cassandra slow down after doing a lot of inserts? ==
This is a symptom of memory pressure, resulting in a storm of GC operations as the JVM frantically tries to free enough heap to continue operating.  Eventually, the server will crash with an OutOfMemoryError; usually, but not always, it will be able to log this final error before the JVM terminates.
  
  You can increase the amount of memory the JVM uses, or decrease the insert threshold before
Cassandra flushes its memtables.  See MemtableThresholds for details.

Writing with !ConsistencyLevel.ZERO is also an excellent way to run out of memory, since the server buffers the write locally, leaving the client free to send more.
  
<<Anchor(existing_data_when_adding_new_nodes)>>

== What happens to existing data in my cluster when I add new nodes? ==
  Starting a new node with the -b [bootstrap] option will cause it to contact other nodes
in the cluster to copy the right data to itself.
  
In Cassandra 0.5 and above, there is an "AutoBootStrap" option in the config file. When enabled, using the "-b" option is unnecessary, because new nodes will automatically bootstrap themselves when they start up for the first time. It is recommended that you leave "InitialToken" blank for these versions, because the improved bootstrap process will pick a balanced Token for each node.

  In Cassandra 0.4 and below, it is recommended that you manually specify a value for "InitialToken"
in the config file of a new node.
  
<<Anchor(modify_cf_config)>>

== Can I add/remove/rename Column Families on a working cluster? ==
  Yes, but it's important that you do it correctly.
  
   1. Restart and wait for the log replay to finish.

  ''see also: [[https://issues.apache.org/jira/browse/CASSANDRA-44|CASSANDRA-44]]''
  
<<Anchor(node_clients_connect_to)>>

== Does it matter which node a Thrift client connects to? ==
No, any node in the cluster will work; Cassandra nodes proxy your request as needed. This leaves you with a number of options for endpoint selection (a minimal connection sketch follows the list):
  
   1. You can maintain a list of contact nodes (all or a subset of the nodes in the cluster),
and configure your clients to choose among them.

   1. Deploy a load-balancer, proxy, etc.
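
As an illustration, here is a minimal sketch of option 1 using the raw 0.6-era Thrift API; the host names are hypothetical and the random pick is just one possible selection policy:

{{{
import java.util.Arrays;
import java.util.List;
import java.util.Random;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class ContactList
{
    // Hypothetical contact nodes; any subset of the cluster will do.
    private static final List<String> HOSTS =
            Arrays.asList("cass1.example.com", "cass2.example.com", "cass3.example.com");

    public static Cassandra.Client connect() throws Exception
    {
        // Pick a random contact node; whichever node we reach will proxy
        // requests for data it does not own.
        String host = HOSTS.get(new Random().nextInt(HOSTS.size()));
        TTransport transport = new TSocket(host, 9160);
        transport.open();
        return new Cassandra.Client(new TBinaryProtocol(transport));
    }
}
}}}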
  
<<Anchor(what_kind_of_hardware_should_i_use)>>

== What kind of hardware should I run Cassandra on? ==
See CassandraHardware.
  
<<Anchor(architecture)>>

== What are SSTables and Memtables? ==
See [[MemtableSSTable]] and MemtableThresholds.
  
<<Anchor(working_with_timeuuid_in_java)>>

== Why is it so hard to work with TimeUUIDType in Java? ==
TimeUUIDs are difficult to use from Java clients because java.util.UUID does not support generating version 1 (time-based) UUIDs.  Here is one way to work with them in Cassandra:
  
  Use the UUID generator from: http://johannburkard.de/software/uuid/
  
Below are three methods that are quite useful in working with the UUIDs as they come in and out of Cassandra.
  
  Generate a new UUID to use in a TimeUUIDType sorted column family.
  
{{{
/**
 * Gets a new time uuid.
 *
 * @return the time uuid
 */
public static java.util.UUID getTimeUUID()
{
    return java.util.UUID.fromString(new com.eaio.uuid.UUID().toString());
}
}}}
When you read out of Cassandra you're getting a byte[] that needs to be converted into a TimeUUID; since java.util.UUID doesn't offer a simple way of doing this, pass it through the eaio UUID class again.
  
{{{
/**
 * Converts the raw bytes of a stored uuid into a java.util.UUID.
 *
 * @param uuid the uuid as a 16-byte array
 * @return the java.util.UUID
 */
public static java.util.UUID toUUID(byte[] uuid)
{
    assert uuid.length == 16;

    long msb = 0;
    long lsb = 0;
    for (int i = 0; i < 8; i++)
        msb = (msb << 8) | (uuid[i] & 0xff);
    for (int i = 8; i < 16; i++)
        lsb = (lsb << 8) | (uuid[i] & 0xff);

    com.eaio.uuid.UUID u = new com.eaio.uuid.UUID(msb, lsb);
    return java.util.UUID.fromString(u.toString());
}
}}}
When you want to actually place the UUID into a Column, you'll want to convert it like this. This method is often used in conjunction with the getTimeUUID() mentioned above.
  
{{{
/**
 * Converts a java.util.UUID into the byte[] form stored in a column.
 *
 * @param uuid the uuid
 * @return the byte[]
 */
public static byte[] asByteArray(java.util.UUID uuid)
{
    long msb = uuid.getMostSignificantBits();
    long lsb = uuid.getLeastSignificantBits();
    byte[] buffer = new byte[16];

    for (int i = 0; i < 8; i++)
        buffer[i] = (byte) (msb >>> 8 * (7 - i));
    for (int i = 8; i < 16; i++)
        buffer[i] = (byte) (lsb >>> 8 * (7 - i));

    return buffer;
}
}}}
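
For example, a hypothetical round-trip using the three helpers above:

{{{
// Generate a TimeUUID, serialize it for storage, and read it back.
java.util.UUID original = getTimeUUID();
byte[] columnName = asByteArray(original);  // e.g. a column name in a TimeUUIDType-sorted CF
java.util.UUID restored = toUUID(columnName);
assert original.equals(restored);
}}}
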
<<Anchor(i_deleted_what_gives)>>

== I delete data from Cassandra, but disk usage stays the same. What gives? ==
Data you write to Cassandra gets persisted to SSTables. Since SSTables are immutable, the data can't actually be removed when you perform a delete; instead, a marker (also called a "tombstone") is written to indicate the value's new status. Never fear though: on the first compaction that occurs after ''GCGraceSeconds'' (hint: storage-conf.xml) have expired, the data will be expunged completely and the corresponding disk space recovered. See DistributedDeletes for more detail.
  
<<Anchor(reads_slower_writes)>>

== Why are reads slower than writes? ==
Unlike all major relational databases and some NoSQL systems, Cassandra does not use b-trees and in-place updates on disk.  Instead, it uses an sstable/memtable model like Bigtable's: writes to each ColumnFamily are grouped together in an in-memory structure before being flushed (sorted and written to disk).  This means that writes cost no random I/O, compared to a b-tree system which not only has to seek to the data location to overwrite, but also may have to seek to read different levels of the index if it outgrows disk cache!
  
  The downside is that on a read, Cassandra has to (potentially) merge row fragments from
multiple sstables on disk.  We think this is a tradeoff worth making, first because scaling
writes has always been harder than scaling reads, and second because as your data corpus grows
Cassandra's read disadvantage narrows vs b-tree systems that have to do multiple seeks against
a large index.  See MemtableSSTable for more details.
  
<<Anchor(cloned)>>

== Why does nodeprobe ring only show one entry, even though my nodes logged that they see each other joining the ring? ==
  This happens when you have the same token assigned to each node.  Don't do that.
  

  The easiest fix is to wipe the data and commitlog directories, thus making sure that each
node will generate a random token on the next restart.
  
<<Anchor(range_ghosts)>>

== Why do deleted keys show up during range scans? ==
  Because get_range_slice says, "apply this predicate to the range of rows given," meaning,
if the predicate result is empty, we have to include an empty result for that row key.  It
is perfectly valid to perform such a query returning empty column lists for some or all keys,
even if no deletions have been performed.
  

  
  This is what we used to do with the old get_key_range method, but the performance hit turned
out to be unacceptable.
  
See DistributedDeletes for more on how deletes work in Cassandra.
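
In practice, clients simply skip the empty rows. A minimal sketch, assuming `keySlices` is the List<KeySlice> returned by a 0.6-era get_range_slices call and `process()` stands in for your own handling:

{{{
for (KeySlice slice : keySlices)
{
    // An empty column list marks a range ghost (deleted row) or an
    // empty predicate result; there is no live data to handle.
    if (slice.getColumns().isEmpty())
        continue;
    process(slice);
}
}}}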
  
<<Anchor(change_replication)>>

== Can I change the ReplicationFactor on a live cluster? ==
  Yes, but it will require restarting and running repair manually to change the replica count
of existing data.
  
@@ -205, +204 @@

 * Restart Cassandra on each node in the cluster
  
If you're reducing the ReplicationFactor:

 * Run "nodetool cleanup" on the cluster to remove surplus replicated data.

If you're increasing the ReplicationFactor:

 * Run "nodetool repair" to run an anti-entropy repair on the cluster. This is an intensive process, so it may result in adverse cluster performance.
  
<<Anchor(large_file_and_blob_storage)>>

== Can I Store BLOBs in Cassandra? ==
Currently Cassandra isn't optimized specifically for large file or BLOB storage. However, files of around 64MB and smaller can easily be stored in the database without splitting them into smaller chunks. This is primarily because Cassandra's public API is based on Thrift, which offers no streaming abilities; any value written or fetched has to fit in memory. Other, non-Thrift interfaces may solve this problem in the future, but there are currently no plans to change Thrift's behavior. When planning applications that require storing BLOBs, you should also consider these attributes of Cassandra (a chunking sketch follows the list):

 * The main limitation on column and super column size is that all the data for a single key and column must fit (on disk) on a single machine (node) in the cluster. Because keys alone are used to determine the nodes responsible for replicating their data, the amount of data associated with a single key has this upper bound. This is an inherent limitation of the distribution model.

 * When large columns are created and retrieved, that column's data is loaded into RAM, which can become resource intensive quickly. Consider loading 200 rows whose columns each store a 10MB image: that small result set would consume about 2GB of RAM. As more and more large columns are loaded, RAM is consumed quickly. This can be worked around, but it will take some upfront planning and testing to arrive at a workable solution for most applications.

 * Please refer to the notes in the Cassandra limitations section for more information: [[CassandraLimitations|Cassandra Limitations]]
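
As an illustration of the kind of up-front planning mentioned above, here is a minimal chunking sketch; the 1MB chunk size and the helper itself are assumptions, not an official API:

{{{
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlobChunker
{
    private static final int CHUNK_SIZE = 1024 * 1024;  // 1MB per column; an assumption

    /**
     * Splits a blob into column-sized chunks; each chunk can then be stored
     * as its own column value (e.g. under names like "chunk-0", "chunk-1", ...).
     */
    public static List<byte[]> chunk(byte[] blob)
    {
        List<byte[]> chunks = new ArrayList<byte[]>();
        for (int offset = 0; offset < blob.length; offset += CHUNK_SIZE)
        {
            int end = Math.min(offset + CHUNK_SIZE, blob.length);
            chunks.add(Arrays.copyOfRange(blob, offset, end));
        }
        return chunks;
    }
}
}}}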
  
<<Anchor(jmx_localhost_refused)>>

== Nodetool says "Connection refused to host: 127.0.1.1", for any remote host. What gives? ==
Nodetool relies on JMX, which in turn relies on RMI, which in turn sets up its own listeners and connectors as needed on each end of the exchange. Normally all of this happens behind the scenes transparently, but incorrect name resolution for either the host connecting, or the one being connected to, can result in crossed wires and confusing exceptions.
  
If you are not using DNS, then make sure that your `/etc/hosts` files are accurate on both ends. If that fails, try passing the `-Djava.rmi.server.hostname=$IP` option to the JVM at startup (where `$IP` is the address of the interface you can reach from the remote machine).
  
<<Anchor(iter_world)>>

== How can I iterate over all the rows in a ColumnFamily? ==
  Simple but slow: Use get_range_slices, start with the empty string, and after each call
use the last key read as the start key in the next iteration.
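
A minimal sketch of this approach with the 0.6-era Thrift classes; the keyspace and column family names and the page size are hypothetical, `client` is a connected Cassandra.Client, and `process()` stands in for your own handling:

{{{
KeyRange range = new KeyRange();
range.setCount(100);                  // page size; an arbitrary choice
SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 10));

String start = "";
while (true)
{
    range.setStart_key(start);
    range.setEnd_key("");
    List<KeySlice> page = client.get_range_slices("Keyspace1", new ColumnParent("Standard1"),
                                                  predicate, range, ConsistencyLevel.ONE);
    for (KeySlice slice : page)
    {
        // The start key reappears as the first row of every page after the first.
        if (!slice.getKey().equals(start))
            process(slice);
    }
    if (page.size() < 100)
        break;                        // short page: no more rows
    start = page.get(page.size() - 1).getKey();
}
}}}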
  
  Better: use HadoopSupport.
  
<<Anchor(no_keyspaces)>>

== Why were none of the keyspaces described in cassandra.xml loaded? ==
Prior to 0.7, Cassandra loaded a set of static keyspaces defined in cassandra.xml (previously storage-conf.xml).  [[https://issues.apache.org/jira/browse/CASSANDRA-44|CASSANDRA-44]] added the ability to modify schema dynamically on a live cluster.  Part of this change required that we ignore the schema defined in cassandra.xml.  The upshot is that you need to define the schema yourself.  There are currently two ways to do this.  First, in 0.7 there is a `loadSchemaFromXML` method defined in `StorageServiceMBean` that will load the schema defined in storage-conf.xml.  This is a one-time operation.  A node that has had its schema defined via `loadSchemaFromXML` will load its schema from the system table on subsequent restarts.  Second, you can modify the schema on a node using the `system_*` thrift operations (see [[API]]).
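
For the first option, a one-off JMX call along these lines can trigger the load; the MBean name follows Cassandra's usual org.apache.cassandra.service naming, but treat it (and the host) as assumptions to verify with jconsole:

{{{
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LoadSchema
{
    public static void main(String[] args) throws Exception
    {
        // Hypothetical host; 8080 is the default JMX port (see the ports FAQ above).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://cass1.example.com:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbs = connector.getMBeanServerConnection();
        // Invoke the no-argument loadSchemaFromXML operation on StorageService.
        mbs.invoke(new ObjectName("org.apache.cassandra.service:type=StorageService"),
                   "loadSchemaFromXML", new Object[0], new String[0]);
        connector.close();
    }
}
}}}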
  
It is recommended that you only perform schema updates on one node and let Cassandra propagate the changes to the rest of the cluster.  If you try to perform the same updates simultaneously on multiple nodes, you run the risk of introducing inconsistent migrations, which will lead to a confused cluster.
  
  See LiveSchemaUpdates for more information.
  
<<Anchor(gui)>>

== Is there a GUI admin tool for Cassandra? ==
  The closest is [[http://github.com/driftx/chiton|chiton]], a GTK data browser.
  
