Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of post@fantasista.no designates
 213.236.237.140 as permitted sender)
Message-Id: <50fdc0653c883aef1f0796f99c2a6b19909c6024@pop3.fantasista.no>
From: "Vegard  Berget" <post@fantasista.no>
Reply-To: "Vegard  Berget" <post@fantasista.no>
To: user@cassandra.apache.org
Subject: Moving data from one datacenter to another
Date: Wed, 19 Dec 2012 13:27:45 +0100
Content-Type: multipart/alternative;
 boundary="=_8333d2f115ab7deddab366698ecd7249"
MIME-Version: 1.0

--=_8333d2f115ab7deddab366698ecd7249
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hi,

I know this has been a topic here before, but I need some input on
how to move data from one datacenter to another (Google only turns up
some old mails), while moving "production" writes the same way.

Adding the target cluster to the source cluster and simply
replicating the data before removing the source nodes is not an
option, so my plan is as follows:

1) Flush data on the source cluster and copy all data/ files to the
   destination cluster. While this is going on, we are still writing
   to the source cluster.
2) When the data is copied, start Cassandra on the new cluster, then
   move writes and reads to the new cluster.
3) Do a new flush on the source cluster. As I understand it, sstable
   files are immutable, so only the _newly added_ data/ files need to
   be copied to the target cluster.
4) After the new data is also copied into the target data/, run
   nodetool refresh to load the new sstables into the system (I know
   we need to take care of filenames).

It's worth noting that none of the data is critical, but it would be
nice to get it right. I know there will be a short period between
steps 2 and 4 in which reads on the new cluster could return old
data, because rows written to the source while copying are not yet
present on the target. This is OK in this case.

Our second alternative is to:

1) Drain the old cluster
2) Copy to the new cluster
3) Start the new cluster

This makes the cluster unavailable for writes during the copy
period, and I wish to avoid that (even if that, too, is survivable).

Both clusters run 1.1.6, but we might upgrade the target to 1.1.7,
as I can't see that this would cause any problems?

Questions:

1) Both clusters have the same number of nodes, but do the tokens
   need to be the same as well? (Wouldn't a repair correct that
   later?)

2) Could data files have any name? Could we, to avoid a filename
   clash, just substitute the numbers with, for example, XXX in the
   data files?

3) Is this really a sane way to do things?

Suggestions are most welcome!

Regards
Vegard Berget
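
PS: To make step 3 and question 2 concrete, here is a minimal Python
sketch of what I have in mind. It is untested against a real cluster
and only manipulates filenames. It assumes the data files are named
<keyspace>-<cf>-<version>-<generation>-<component> and that the
generation has to stay numeric, neither of which I have verified.
The function names and the example filename are made up for
illustration.

```python
import re


def bump_generation(name, offset):
    """Add `offset` to the generation number of one sstable filename.

    Assumes the layout
    <keyspace>-<cf>-<version>-<generation>-<component>; names that do
    not match that layout are returned unchanged.
    """
    return re.sub(
        r"^(.+-)(\d+)(-[^-]+)$",
        lambda m: m.group(1) + str(int(m.group(2)) + offset) + m.group(3),
        name,
    )


def newly_added(before, after):
    """Return the files listed in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))
```

With an offset of 1000, the hypothetical name
Keyspace1-Standard1-hf-12-Data.db would become
Keyspace1-Standard1-hf-1012-Data.db. All component files of one
sstable would need the same offset, and the offset would have to be
larger than any generation already present on the target.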

--=_8333d2f115ab7deddab366698ecd7249--