Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of arodrime@gmail.com designates
 209.85.217.180 as permitted sender)
MIME-Version: 1.0
From: Alain RODRIGUEZ <arodrime@gmail.com>
Date: Thu, 18 Apr 2013 16:41:37 +0200
Message-ID: 
 <CA+VSrLovBjjzhrtQhgXxTHA5kELfpWnAFHbKFEmeLU-HGUm=1A@mail.gmail.com>
Subject: Ec2Snitch to Ec2MultiRegionSnitch
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=089e0112c0f4bd515f04daa39ac4

--089e0112c0f4bd515f04daa39ac4
Content-Type: text/plain; charset=ISO-8859-1

Hi,

The company I work for is having so much success that we are expanding
worldwide :). We have to deploy our Cassandra servers worldwide too, in
order to reduce latency for our new customers abroad.

I am wondering about the process for growing from one data center to
several. First, we currently use Ec2Snitch, so I guess we have to switch
to Ec2MultiRegionSnitch.

Is that doable without any downtime?

Our C* cluster: C* 1.2.2, 6 EC2 m1.xlarge nodes already running in
eu-west. We want to add 3 m1.xlarge nodes in us-east.

I was planning to do it this way:

1/ Change the yaml conf on each of the 6 existing eu-west nodes
    - switch endpoint_snitch from Ec2Snitch to Ec2MultiRegionSnitch
    - uncomment broadcast_address and set it to the public IP of the node
    - leave listen_address set to the private IP, as it is right now
    - switch the seeds from private to public IPs
2/ Rolling restart
    - nodetool disablegossip
    - nodetool disablethrift
    - nodetool drain
    - rm /path/cassandra/commitlog/* ? (I used to do this back when drain
was broken, to avoid replaying counter logs and getting overcounts; I am
not sure how relevant this is nowadays)
    - service cassandra stop
    - service cassandra start
3/
    - Make sure everything is still running smoothly on the eu-west servers
4/
    - Add the 3 us-east nodes one by one, with auto_bootstrap set to true
5/
    - Repair the nodes (one by one)
    - Clean up the nodes (one by one)
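
To make step 2 concrete, here is a rough sketch of the sequence I would
run on each node in turn. The run wrapper and the DRY_RUN variable are
only illustration helpers of mine, not part of Cassandra. By default the
script just prints the commands; it executes them only with DRY_RUN=0. I
left out the commitlog removal since I am not sure it is still needed.

```shell
#!/bin/sh
# Sketch of the per-node restart sequence from step 2.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to really run the commands.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop taking traffic, flush to disk, then restart the service.
# The && chain aborts the sequence as soon as one command fails.
restart_node() {
    run nodetool disablegossip &&
    run nodetool disablethrift &&
    run nodetool drain &&
    run service cassandra stop &&
    run service cassandra start
}

restart_node
```

I would wait for the node to show as Up in nodetool ring before moving
on to the next one.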


Questions:

a/ Do I have to move the tokens, since I don't use vnodes yet? How should I
position all these nodes?
b/ Is it useful to add a seed from the new us-east data center to the yaml
of all nodes?
c/ I am using SimpleStrategy. Is it worthwhile, or even mandatory, to change
this strategy when using multiple DCs?
d/ With my 2 DCs, will I have RF 3 per DC, or RF 3 across both DCs?
e/ Should I configure my C* clients to use the C* nodes in their own region
as coordinators (which seems to me the right way), or should I configure
all the servers everywhere?

Any comments on the process described above would be appreciated,
especially if you think something is wrong with it.

If you see that I am missing something, I will be glad to hear about it.

Alain

--089e0112c0f4bd515f04daa39ac4--