cassandra-user mailing list archives

From ian douglas <>
Subject Re: Working backwards from production to staging/dev
Date Fri, 25 Mar 2011 16:59:20 GMT
Thanks, Jonathan!

Part of what we're trying to accomplish is a data cleanup. One of our
nodes seems to have some lingering data from an old column family that
we no longer have defined (we're running v0.60), so that node has a few
GB of data that never gets replicated. We're hoping that by bringing
that node offline, we could flush out that old data so our nodes
appear more balanced in disk load.

We're also considering just moving the data from our three 'medium'
(32-bit) EC2 instances to a single extra-large (64-bit) instance to do
what you've suggested, but that would mean crossing from a 32-bit
platform to a 64-bit platform. Is Cassandra 0.60 going to have problems
if we migrate data to a single 64-bit system and then back to several
32-bit systems? (We've looked at replicating our PostgreSQL database,
but the binary data files are not compatible between 32-bit and 64-bit
systems.)


On 03/18/2011 11:56 AM, Jonathan Ellis wrote:
> That should work, but if you have the disk space it's a lot simpler to
> just copy all the data files from each machine to a target out of the
> cluster, then have the target run cleanup.
> On Fri, Mar 18, 2011 at 1:07 PM, ian douglas <> wrote:
>> Hi everyone,
>> I was on the mailing list back in December/January, asking questions about
>> rebalancing some nodes, etc. We currently have a ring of 3 systems,
>> redundancy set to 2, and all is well.
>> We'd like to snapshot our ring and build a new development/staging node from
>> it (the old dev node is quite stale), and we're curious what the "best
>> practice" is for something like that.
>> We're thinking we might replicate our 3 nodes as 3 more new nodes, but on a
>> whole new ring, then remove one node, issue flush/cleanup commands on the
>> remaining two (with redundancy set to '2', we should only need to remove one
>> node, to have all data on both remaining nodes, right?), then tarball the
>> Cassandra data path from one machine, and download it to a local development
>> environment.
>> As long as we're using the same version of Cassandra, is there any drawback
>> to this approach?
>> Thanks,
>> Ian
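
The replica arithmetic behind the quoted question (three nodes with redundancy 2, then one node removed) can be checked with a small sketch. It is only a toy model: it assumes the simple ring placement in which each range is stored on its owning node plus the next nodes clockwise, and it assumes the removed node's data has already been handed over to the survivors. The node names "A", "B", "C" and both function names are made up for illustration and are not part of Cassandra.

```python
from collections import defaultdict


def replica_nodes(ring, owner_index, replication_factor):
    """Nodes holding one range: its owner plus the next nodes clockwise.

    Toy model of simple ring placement (an assumption, not Cassandra code).
    """
    count = min(replication_factor, len(ring))
    return [ring[(owner_index + i) % len(ring)] for i in range(count)]


def ranges_per_node(ring, replication_factor):
    """Map each node to the set of primary ranges it stores a copy of.

    A range is labelled by the name of the node that owns it.
    """
    held = defaultdict(set)
    for owner_index, owner in enumerate(ring):
        for node in replica_nodes(ring, owner_index, replication_factor):
            held[node].add(owner)
    return dict(held)


# Hypothetical three-node ring with two copies of every range:
# each node ends up holding two of the three ranges.
three = ranges_per_node(["A", "B", "C"], 2)

# Same replication factor after one node is removed:
# each of the two remaining nodes holds every range.
two = ranges_per_node(["A", "B"], 2)

for name, layout in (("3 nodes", three), ("2 nodes", two)):
    for node in sorted(layout):
        print(name, node, sorted(layout[node]))
```

Under those assumptions, each node in the three-node ring stores only two thirds of the data, while in the two-node ring both nodes store all of it, which is why a tarball from either remaining node would be complete.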
