incubator-cassandra-user mailing list archives

From ian douglas <...@armorgames.com>
Subject Re: Working backwards from production to staging/dev
Date Thu, 31 Mar 2011 16:15:19 GMT
Thanks Edward,

Anyone able to provide some answers for the other questions?


On 03/26/2011 07:25 AM, Edward Capriolo wrote:
> On Fri, Mar 25, 2011 at 2:11 PM, ian douglas <ian@armorgames.com> wrote:
>> On 03/25/2011 10:12 AM, Jonathan Ellis wrote:
>>> On Fri, Mar 25, 2011 at 11:59 AM, ian douglas <ian@armorgames.com> wrote:
>>>> (we're running v0.60)
>>> I don't know if you could hear that from where you are, but our whole
>>> office just yelled, "WTF!" :)
>> Ah, that's what that noise was... And yeah, we know we're way behind. Our
>> initial delay in upgrading was waiting for 0.7 to come out and then we
>> learned we needed a whole new Thrift client for our PHP code base, and then
>> we got busy on other things, but we're at a point where we have some time to
>> take care of Cassandra and get it upgraded.
>>
>>   Our planned path, now, is:
>>
>> (our nodes' tokens are numbered using the python code (0, 1/3 and 2/3 times
>> 2^127), and called node 1 through 3, respectively; our RF is set to 2 right
>> now)
>>
>> 1. remove node 1 from our software
>> 2. bring node 1 offline after a flush/repair/cleanup
>> 3. run a cleanup on node 2 and then on node 3 so they have a full copy of
>> all data from the old node 1 and each other.
>> 4. bring up a new Large 64-bit instance, install 0.6.12, assign a Token
>> value of 0 (node 1), RF:2, on a new gossip ring, and copy all data from the
>> 32-bit nodes 2 and 3 and run a repair/cleanup to remove any duplicated data
>> 5. remove node 3 from our software
>> 6. point our code to the new 64-bit node 1
>> 7. bring node 3 offline after a flush/repair/cleanup so node 2 has the last
>> fresh copy of everything
>> 8. bring node 2 offline after a flush/repair/cleanup
>> 9. bring up another Large instance, get a copy of all data from our old node
>> 2, assign a Token value of (1/2 * 2^127), RF:2, on the new gossip ring, run
>> a repair to remove duplicate data, and then a cleanup so it gets replicated
>> data from the new node 1
>> 10. add the new node 2 to our software
>> 11. run a final cleanup on the new node 1 and then on node 2 to make sure
>> all data is replicated evenly on both nodes
>>
>> ... at this point, we should have two 64-bit Large instances, with RF:2, on
>> a new gossip ring, replacing three 32-bit systems, with minimal down time
>> and no data loss (just a data delay between steps 6 and 10 above).
>>
>> Questions:
>> 1. Does it appear that we've missed any steps, or are doing something out
>> of order?
>> 2. Is the flush/repair/cleanup overkill when bringing the old nodes offline,
>> or is that the correct sequence to follow?
>> 3. Will the difference in compute units (lower on Large instances than
>> Medium instances) make any noticeable difference, or will the fact that the
>> machine is 64-bit handle things efficiently enough such that a Large
>> instance works harder than a Medium instance? (never did figure out how
>> their compute units work)
>> 4. Can we follow similar steps when we're ready to upgrade to 0.7.x and have
>> our new Thrift client for PHP all squared away?
>>
>>
>> Thanks again for the help!!!
>>
>>
> If you have a node with an old column family you are not using
> anymore...Stop node...delete data...start node.
>
> Edward
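
The token arithmetic in the quoted plan (0, 1/3 and 2/3 times 2^127 for the
three old nodes; 0 and 1/2 times 2^127 for the two new ones) can be sketched
in Python roughly as below. This is only an illustration, not the script
referred to in the thread: it assumes the RandomPartitioner token space of
0..2^127, and the function name evenly_spaced_tokens is made up for this
example.

```python
# Hypothetical helper (not part of Cassandra): evenly spaced initial tokens.
# Assumption: RandomPartitioner, whose token space runs from 0 to 2**127.
RING_SIZE = 2 ** 127


def evenly_spaced_tokens(node_count, ring_size=RING_SIZE):
    """Return one token per node, spaced evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer division keeps the tokens exact; floats would lose precision
    # at this magnitude.
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Old ring: three nodes at 0, 1/3 and 2/3 of 2**127.
    for number, token in enumerate(evenly_spaced_tokens(3), start=1):
        print("old node %d: %d" % (number, token))
    # New ring: two nodes at 0 and 1/2 of 2**127.
    for number, token in enumerate(evenly_spaced_tokens(2), start=1):
        print("new node %d: %d" % (number, token))
```

Each printed value would be the one assigned to the matching node as its
initial token before the node first joins the ring.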
