incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer
Date Fri, 18 Mar 2011 20:24:40 GMT
Right.

The only subtlety is the system keyspace; the cleanest approach is to start
from scratch there (which means rebuilding the schema), but you could also
start with a copy of a single existing node's system keyspace (just one) and
bring the node up with -Dcassandra.load_ring_state=false.
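For concreteness, a rough sketch of that seeding step, shown as a dry run. The hostname and data path below are placeholders assumed from the 0.7 default layout, not taken from the thread:

```shell
# Sketch, not a definitive procedure: seed a new node's system keyspace
# from ONE existing node, then start with the saved ring state ignored.
SRC_NODE=old-node-1.example.com          # hypothetical source node
DATA_DIR=/var/lib/cassandra/data         # assumed 0.7 data directory
CMD="rsync -a ${SRC_NODE}:${DATA_DIR}/system/ ${DATA_DIR}/system/"
echo "$CMD"   # printed rather than executed; run it for real on each new node
# Then start Cassandra so it rebuilds its view of the ring instead of
# trusting the copied node's gossip state:
echo "bin/cassandra -Dcassandra.load_ring_state=false"
```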

On Fri, Mar 18, 2011 at 2:29 PM, Jeremiah Jordan
<JEREMIAH.JORDAN@morningstar.com> wrote:
> So can one just take all of the *.db files from all the machines in a cluster, put them
> in a folder together (renaming ones with the same number?) and start up a node which will
> then have access to all the data?
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbellis@gmail.com]
> Sent: Wednesday, March 16, 2011 1:59 PM
> To: user@cassandra.apache.org
> Cc: Jedd Rashbrooke
> Subject: Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer
>
> That should work then, assuming SimpleStrategy/RackUnawareStrategy.
> Otherwise figuring out which machines share which data gets
> complicated.
>
> Note that if you have room on the machines, it's going to be faster to
> copy the entire data set to each machine and run cleanup, than to have
> repair fix 3 of 4 replicas from scratch.  Repair would work,
> eventually, but it's kind of a worst-case scenario for it.
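The copy-then-cleanup approach described above might look roughly like the following. Node names are placeholders and the commands are only printed (a dry run), since running `nodetool cleanup` for real requires a live cluster:

```shell
# Sketch, assuming 16 destination nodes reachable as node1..node16 and
# nodetool on PATH. After copying the full data set to every node,
# `nodetool cleanup` drops the rows each node does not own, which is
# cheaper than having repair rebuild 3 of 4 replicas from scratch.
cmds=$(for i in $(seq 1 16); do
  echo "nodetool -h node$i cleanup"
done)
echo "$cmds"
```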
>
> On Mon, Mar 14, 2011 at 10:39 AM, Jedd Rashbrooke <jedd@visualdna.com> wrote:
>>  Jonathan, thank you for your answers here.
>>
>>  To explain this bit ...
>>
>> On 11 March 2011 20:46, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke <jedd@visualdna.com> wrote:
>>>>  Copying a cluster between AWS DC's:
>>>>  We have ~ 150-250GB per node, with a Replication Factor of 4.
>>>>  I acknowledge that 0.6 -> 0.7 is necessarily stop-the-world, so in an attempt to
>>>>  minimise that outage period I was wondering if it's possible to
>>>>  drain & stop the cluster, then copy over only the 1st, 5th, 9th,
>>>>  and 13th nodes' worth of data (which should be a full copy of
>>>>  all our actual data - we are nicely partitioned, despite the
>>>>  disparity in GB per node) and have Cassandra re-populate the
>>>>  new destination 16 nodes from those four data sets.  If this is
>>>>  feasible, is it likely to be more expensive (in terms of time the
>>>>  new cluster is unresponsive as it rebuilds) than just copying
>>>>  across all 16 sets of data - about 2.7TB?
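A quick sanity check of the every-4th-node idea, under the assumption (matching Jonathan's later caveat) of SimpleStrategy with RF=4 on a 16-node ring, where replica placement simplifies to "each node also stores the ranges owned by the 3 nodes before it":

```shell
# Sketch: count how many of the 16 primary token ranges are held by
# nodes 1, 5, 9 and 13 together. With RF=4 and SimpleStrategy, node i
# holds replicas of the ranges owned by nodes i, i-1, i-2, i-3 (mod 16).
covered=$(( $(for src in 1 5 9 13; do
  for off in 0 1 2 3; do
    echo $(( (src - 1 - off + 16) % 16 + 1 ))   # range owner covered by $src
  done
done | sort -n | uniq | wc -l) ))
echo "distinct primary ranges covered by nodes 1,5,9,13: $covered of 16"
```

So under those assumptions the four chosen nodes do hold a full copy of the data; with any other replication strategy the arithmetic no longer holds.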
>>>
>>> I'm confused.  You're trying to upgrade and add a DC at the same time?
>>
>>  Yeah, I know, it's probably not the sanest route - but the hardware
>>  (virtualised, Amazonish EC2 that it is) will be the same between
>>  the two sites, so that reduces some of the usual roll in / roll out
>>  migration risk.
>>
>>  But more importantly for us it would mean we'd have just the
>>  one major outage, rather than two (relocation and 0.6 -> 0.7)
>>
>>  cheers,
>>  Jedd.
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>



