incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer
Date Fri, 11 Mar 2011 20:46:16 GMT
On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke <jedd@visualdna.com> wrote:
>  My question is whether it's considered safer to upgrade via 0.6.12
>  to 0.7, or if a direct 0.6.6 -> 0.7 upgrade is safe enough?

You don't need latest 0.6 before upgrading.

>  Copying a cluster between AWS DC's:
>  We have ~ 150-250GB per node, with a Replication Factor of 4.
>  I ack that 0.6 -> 0.7 is necessarily STW, so in an attempt to
>  minimise that outage period I was wondering if it's possible to
>  drain & stop the cluster, then copy over only the 1st, 5th, 9th,
>  and 13th nodes' worth of data (which should be a full copy of
>  all our actual data - we are nicely partitioned, despite the
>  disparity in GB per node) and have Cassandra re-populate the
>  new destination 16 nodes from those four data sets.  If this is
>  feasible, is it likely to be more expensive (in terms of time the
>  new cluster is unresponsive as it rebuilds) than just copying
>  across all 16 sets of data - about 2.7TB?

I'm confused.  You're trying to upgrade and add a DC at the same time?

>  Chattiness / gossip traffic requirements on DC-aware:
>  I haven't pondered deeply on a 7 design yet, so this question is
>  even more nebulous.  We're seeing growth (raw) of about 100GB
>  per month on our 16 node RF4 cluster - say about 25GB of 'actual'
>  data growth.  We don't delete (much) data.  Amazon's calculator
>  suggests even 100GB in/out of a data center is modestly priced,
>  but I'm cautious in case the replication traffic is particularly chatty
>  or excessive.  And how expensive (in terms of traffic) a compaction
>  or repair would be across data centers.

Compactions are node-local.

Normal writes are optimized for the WAN (only one copy will be sent
between DCs; the recipient in the other DC will then forward it to
other replicas there).
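That forwarding pattern can be sketched roughly as follows (a toy model, not Cassandra's actual code; the function and node names are hypothetical, purely for illustration):

```python
def plan_write(coordinator_dc, replicas_by_dc):
    """Toy model of the WAN write optimization described above:
    one copy crosses each inter-DC link, and a designated replica
    in the remote DC forwards the mutation to its local peers.

    replicas_by_dc: dict mapping DC name -> list of replica node names.
    Returns (wan_messages, lan_messages) as lists of (src, dst) pairs.
    """
    wan, lan = [], []
    for dc, replicas in replicas_by_dc.items():
        if dc == coordinator_dc:
            # The coordinator writes to replicas in its own DC directly.
            lan += [("coordinator", r) for r in replicas]
        else:
            # Exactly one copy crosses the WAN, to a forwarding replica...
            forwarder = replicas[0]
            wan.append(("coordinator", forwarder))
            # ...which relays it to the remaining replicas in its DC.
            lan += [(forwarder, r) for r in replicas[1:]]
    return wan, lan
```

So with RF=2 per DC across two DCs, four replicas receive the write but only one message crosses the inter-DC link, which is what keeps the AWS transfer bill modest.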

Repair is not yet WAN-optimized, but it is still cheap if your replicas
are close to consistent, since only Merkle trees plus the inconsistent
ranges are sent over the network.
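The idea behind that is easy to see in miniature (a simplified sketch, not Cassandra's implementation; it hashes one value per token range and streams only ranges whose leaf hashes differ):

```python
import hashlib

def leaf_hash(data):
    # Hash of the data held for one token range (a leaf of the Merkle tree).
    return hashlib.sha256(data.encode()).hexdigest()

def build_tree(leaves):
    # Bottom-up Merkle tree: each level hashes pairs of child hashes.
    # The last level is a single root hash.
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([
            hashlib.sha256(
                (prev[i] + prev[min(i + 1, len(prev) - 1)]).encode()
            ).hexdigest()
            for i in range(0, len(prev), 2)
        ])
    return levels

def inconsistent_ranges(local_data, remote_data):
    # Exchange per-range hashes, not data: if the roots match, nothing is
    # streamed; otherwise only the mismatched ranges need transfer.
    local = [leaf_hash(d) for d in local_data]
    remote = [leaf_hash(d) for d in remote_data]
    if build_tree(local)[-1] == build_tree(remote)[-1]:
        return []  # roots agree: replicas are consistent for these ranges
    return [i for i, (a, b) in enumerate(zip(local, remote)) if a != b]
```

If the replicas agree, only the tree (a few KB of hashes) crosses the wire; the data transfer is proportional to the divergence, not to the 2.7TB total.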

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
