cassandra-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Cassandra Wiki] Trivial Update of "LiveSchemaUpdates" by gdusbabek
Date Tue, 06 Apr 2010 22:59:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "LiveSchemaUpdates" page has been changed by gdusbabek.
The comment on this change is: headings.
http://wiki.apache.org/cassandra/LiveSchemaUpdates?action=diff&rev1=1&rev2=2

--------------------------------------------------

- ==Work In Progress. Refrain from modifying until I remove this.==
+ = Work In Progress. Refrain from modifying until I remove this. =
  
- Modifying Schema on a Live Cluster
+ = Modifying Schema on a Live Cluster =
  
- Client Operations:
+ == Client Operations ==
- Column family operations:  add, drop, rename
+ Column family operations: add, drop, rename.
+ 
  Keyspace operations: add, drop, rename.  
+ 
  These are all executed via the Thrift interface.  It is expected that you have ALL access
if you are using security.
  
- How it works
+ === How it works ===
- A new system table called 'definitions' keeps track of two things: keyspace definitions
(SCHEMA_CF) and keyspace changes (MIGRATIONS_CF).  TimeUUIDs are used throughout to match
migrations up with schema and vice-versa.
+ A new system table called `definitions` keeps track of two things: keyspace definitions
(`SCHEMA_CF`) and keyspace changes (MIGRATIONS_CF).  TimeUUIDs are used throughout to match
migrations up with schema and vice-versa.
  
- Keyspace Definitions (SCHEMA_CF)
+ === Keyspace Definitions (SCHEMA_CF) ===
  All current keyspace definitions are stored in a single row, one keyspace definition per
column with a TimeUUID as the row key (also serves as version identifier), keyspace name
as column name, and definition serialization as the column value.  There exists a special
row, keyed by "Last Migration" that contains a single column indicating the current schema
version UUID.  This makes it easy to look up the version and then retrieve it.
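
The row layout described above can be sketched in memory as follows. This is only an illustration, assuming a plain dict stands in for SCHEMA_CF; the column name `version` inside the "Last Migration" row and both helper functions are hypothetical and do not come from the Cassandra code.

```python
import uuid

LAST_MIGRATION_KEY = "Last Migration"  # special row key named in the text
VERSION_COLUMN = "version"             # assumed column name, not from the source


def store_schema(schema_cf, version, keyspaces):
    """Write one schema row keyed by its TimeUUID and update the pointer row."""
    # One column per keyspace: name -> serialized definition.
    schema_cf[version] = dict(keyspaces)
    schema_cf[LAST_MIGRATION_KEY] = {VERSION_COLUMN: version}


def current_schema(schema_cf):
    """Look up the current version UUID, then fetch the row it names."""
    marker = schema_cf.get(LAST_MIGRATION_KEY)
    if marker is None:
        return None, {}
    version = marker[VERSION_COLUMN]
    return version, schema_cf[version]


schema_cf = {}
v1 = uuid.uuid1()  # Python's version-1 (time-based) UUID
store_schema(schema_cf, v1, {"Keyspace1": b"serialized-definition"})
version, keyspaces = current_schema(schema_cf)
```

Older schema rows stay in place here; only the "Last Migration" pointer decides which row counts as current.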
  
- Migrations (MIGRATIONS_CF)
+ === Migrations (MIGRATIONS_CF) ===
  MIGRATIONS_CF tracks the individual modifications that are made to the schema.  It consists
of a single row keyed by "Migrations Key" with one column per migration.  Each column has
the migration version UUID as its name, with the serialized migration as its value.
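
As a rough sketch of that single-row layout, again assuming a dict in place of the column family: `record_migration` and `migrations_after` are hypothetical helpers, and ordering by the UUID's embedded timestamp is an assumption about how TimeUUID columns would sort.

```python
import uuid

MIGRATIONS_KEY = "Migrations Key"  # row key named in the text


def record_migration(migrations_cf, version, serialized):
    """Add one column (version UUID -> serialized migration) to the single row."""
    migrations_cf.setdefault(MIGRATIONS_KEY, {})[version] = serialized


def migrations_after(migrations_cf, version):
    """Return (uuid, bytes) pairs newer than `version`, oldest first."""
    row = migrations_cf.get(MIGRATIONS_KEY, {})
    # uuid.UUID.time is the 60-bit timestamp embedded in a version-1 UUID.
    newer = [(v, m) for v, m in row.items() if v.time > version.time]
    return sorted(newer, key=lambda pair: pair[0].time)


migrations_cf = {}
first = uuid.uuid1()
record_migration(migrations_cf, first, b"add keyspace")
```

A lookup like `migrations_after` is the kind of query a node would need when it has to hand a peer every migration that peer is missing.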
  
- Updating
+ == Updating ==
  Applying a migration consists of the following steps:
  1. Generate the migration, which includes a new version UUID.
  2. Update SCHEMA_CF with a new schema row.
@@ -27, +29 @@

  5. Flush the definitions table.
  6. Update runtime data structures (create directories, etc.)
  
- Starting up
+ == Starting Up ==
  When a node starts up, it checks SCHEMA_CF to find out the latest schema version it has.
 If it finds nothing (as would happen with a new cluster), it loads nothing and logs a warning.
 Otherwise, it uses the UUID it just read in to locate the right row in SCHEMA_CF and loads
it.  That row is deserialized into one or more keyspace definitions which are then loaded
in a manner similar to the load-from-xml approach used in the past.
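
The startup path could look roughly like this. It is a sketch under stated assumptions: a dict stands in for SCHEMA_CF, JSON stands in for the real definition serialization (which the text does not specify), and `load_schema_on_startup` is a hypothetical name.

```python
import json
import logging
import uuid

log = logging.getLogger("schema-startup")
LAST_MIGRATION_KEY = "Last Migration"  # special row key named in the text


def load_schema_on_startup(schema_cf, deserialize=json.loads):
    """Return (version, {keyspace name: definition}) or (None, {}) on a new node."""
    marker = schema_cf.get(LAST_MIGRATION_KEY)
    if not marker:
        # New cluster: nothing to load, but make that visible in the log.
        log.warning("no schema version found; loading no keyspaces")
        return None, {}
    (version,) = marker.values()  # the row holds a single column
    row = schema_cf[version]
    return version, {name: deserialize(blob) for name, blob in row.items()}


# Hypothetical data: one schema version with one keyspace definition.
v = uuid.uuid1()
schema_cf = {
    v: {"Keyspace1": '{"replication_factor": 1}'},
    LAST_MIGRATION_KEY: {"version": v},
}
loaded_version, definitions = load_schema_on_startup(schema_cf)
```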
  
  At the same time, the node incorporates its schema version into the gossip digests it sends
to other nodes.  It may be the case that this node does not have the latest schema definitions
(as a result of network partition, bootstrapping a new node, or any other reason you can think
of).  When a version mismatch is detected the definition promulgation mechanism described
next is invoked.
  
- Definition Promulgation
+ == Definition Promulgation ==
  Definition promulgation consists of two phases: 'announce' and 'push'. 'announce' is a way
for node A to declare to node B 'this is the schema version I have'.  If the versions are
equal, the message is ignored.  If A is newer, B responds with an 'announce' to A (this functions
as a request for updates).  If A is older, B responds with a 'push' containing all the migrations
from B that A doesn't have.  
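
The decision node B makes on receiving an 'announce' can be sketched as below. The function name is hypothetical, and comparing versions by the timestamp embedded in the TimeUUID is an assumption; the text only says "newer" and "older".

```python
import uuid


def handle_announce(local_version, remote_version):
    """Decide how node B (local) answers node A's (remote) 'announce'.

    Returns None to ignore the message, "announce" to request updates
    from A, or "push" to send A the migrations it lacks.
    """
    if remote_version == local_version:
        return None
    # Assumption: the version with the later embedded timestamp is newer.
    if remote_version.time > local_version.time:
        return "announce"
    return "push"


older = uuid.uuid1()
newer = uuid.uuid1()
reply = handle_announce(local_version=older, remote_version=older)
```

When two different UUIDs carry the same timestamp this sketch falls through to "push"; a real implementation would need a deliberate tie-break.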
  
  When a schema update originates from the client (Thrift), gossip promulgation is bypassed
and this announce-announce-push approach is used to push migrations to other nodes.
  
- Concurrency
+ == Concurrency ==
  It is entirely possible and expected that a node will receive migration pushes from multiple
nodes.  Because of this, all migrations are applied on a single-threaded stage and versions
are checked throughout to make sure that a) no migration is applied twice, and b) migrations
are not applied out of sync.
  
  Each migration knows the version UUID of the migration that immediately precedes it.  If
a node is asked to apply a migration and its current version UUID does not match the last
version UUID of the migration, the migration is discarded.
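
The predecessor check can be illustrated with a minimal sketch. `Migration`, `Node`, and their field names are hypothetical; the real migration objects carry more than a pair of UUIDs.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Migration:
    version: uuid.UUID                 # version this migration produces
    last_version: Optional[uuid.UUID]  # version it must be applied on top of


class Node:
    def __init__(self, current_version=None):
        self.current_version = current_version
        self.applied = []

    def apply(self, migration):
        """Apply the migration only if it directly follows the current version."""
        if migration.last_version != self.current_version:
            return False  # duplicate or out-of-order push: discard
        self.applied.append(migration.version)
        self.current_version = migration.version
        return True


v1, v2 = uuid.uuid1(), uuid.uuid1()
m1 = Migration(version=v1, last_version=None)
m2 = Migration(version=v2, last_version=v1)

node = Node()
results = [node.apply(m2), node.apply(m1), node.apply(m1), node.apply(m2)]
```

Here `m2` is discarded when it arrives first, and the second copy of `m1` is discarded as a duplicate, so `results` is `[False, True, False, True]`. The check is only safe if every `apply` call runs on the one single-threaded stage described above.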
  
  One weakness of this model is that it is vulnerable if a new update starts before another
update is promulgated to all live nodes--only one migration can be active within a cluster
at any time.  To this we say: don't be stupid; plan and execute your migrations carefully.
  
- Failure Scenarios
+ == Failure Scenarios ==
  A node can fail during any step of the update process.  Here is an examination of what will
happen if a node fails after each part of the update process (described earlier).
  1. Nothing has been applied. Update fails outright.
  2. Extra data exists in SCHEMA_CF but will be ignored because "Last Migration" was not updated.
@@ -53, +55 @@

  5. Startup will happen normally.
  6. Startup will happen normally. 
  
- Under the Hood
+ == Under the Hood ==
  
  
- Special Cases
+ == Special Cases ==
+ === New Cluster ===
  
