incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Working backwards from production to staging/dev
Date Fri, 25 Mar 2011 17:12:21 GMT
On Fri, Mar 25, 2011 at 11:59 AM, ian douglas <ian@armorgames.com> wrote:
> Part of what we're trying to accomplish is a data cleanup. One of our nodes
> seems to have some lingering data from an old column family that we no
> longer have defined (we're running v0.6.0)

I don't know if you could hear that from where you are, but our whole
office just yelled, "WTF!" :)

Seriously, you need to upgrade to 0.6.12.  Do read NEWS.txt, in
particular this part:

    - We try to keep minor versions 100% compatible (data format,
      commitlog format, network format) within the major series, but
      we introduced a network-level incompatibility in 0.6.1.
      Thus, if you are upgrading from 0.6.0 to any higher version
      (0.6.1, 0.6.2, etc.) then you will need to restart your entire
      cluster with the new version, instead of being able to do a
      rolling restart.

You will want to diff the config files for new settings (definitely a
new required one for cache saving, may be one or two others); other
than that 0.6.12 is 100% api and disk compatible with 0.6.0.
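
One way to spot settings that exist only in the new version's config is to compare element names rather than eyeballing a line diff. The following is a minimal sketch, assuming the 0.6 configuration is a single XML file (storage-conf.xml in a stock install); the function names and the command-line usage are illustrative, not part of Cassandra.

```python
import sys
import xml.etree.ElementTree as ET


def element_paths(root):
    """Return the set of slash-separated element paths under an XML root."""
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def new_settings(old_root, new_root):
    """Element paths present in the new config but absent from the old one."""
    return sorted(element_paths(new_root) - element_paths(old_root))


# Assumed usage: python conf_diff.py old/storage-conf.xml new/storage-conf.xml
if __name__ == "__main__" and len(sys.argv) == 3:
    old_root = ET.parse(sys.argv[1]).getroot()
    new_root = ET.parse(sys.argv[2]).getroot()
    for path in new_settings(old_root, new_root):
        print("new setting:", path)
```

This only reports elements that are new by name; changed defaults or attributes still need a regular diff.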

> so that node has a few GB of data
> that never gets replicated. We're hoping that by bringing that node offline,
> that we could flush out that old data so our nodes appear more balanced in
> disk load.

Sure, you just need to rm the CF data files.  0.6 doesn't do that for
you automatically.
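
A cautious way to do the manual cleanup is to list the matching files first and delete only after checking the list. This is a rough sketch, not an official tool: it assumes the node is offline (as proposed above), that the files for a column family sit in the keyspace's data directory, and that their names start with the column family name followed by a hyphen (for example OldCF-1-Data.db). The directory path and the name OldCF are placeholders.

```python
from pathlib import Path


def stale_cf_files(keyspace_dir, cf_name):
    """List files in keyspace_dir whose names start with '<cf_name>-'."""
    prefix = cf_name + "-"
    return sorted(
        p for p in Path(keyspace_dir).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )


def remove_files(paths, dry_run=True):
    """Delete the given files; with dry_run=True only print what would go."""
    for p in paths:
        if dry_run:
            print(f"would remove {p}")
        else:
            p.unlink()


# Assumed usage (placeholder path and column family name):
#   files = stale_cf_files("/var/lib/cassandra/data/Keyspace1", "OldCF")
#   remove_files(files)                 # dry run, prints only
#   remove_files(files, dry_run=False)  # actually deletes
```

The trailing hyphen in the prefix keeps a column family called OldCF2 from being matched when cleaning up OldCF.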

> We're also considering just moving our three 'medium' (32-bit) EC2 instances
> to a single extra-large (64-bit) instance to do what you've suggested, but
> that would mean moving from a 32-bit platform to a 64-bit platform. Is
> Cassandra 0.6.0 going to have problems if we migrate data to a single 64-bit
> system and then back to several 32-bit systems? (we've looked at replicating
> our PostgreSQL database, but the binary data files are not compatible
> between 32-bit and 64-bit systems)

(a) you should definitely move to 64bit and stay there since that
allows using mmap'd I/O which is a 20%-50% performance boost
(b) you should move to L or XL instances on EC2 since on anything
smaller the I/O is not predictable enough (see
http://wiki.apache.org/cassandra/CassandraHardware for more
recommendations)
(c) data files are compatible between 32bit and 64bit

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
