Subject: Re: What is the ideal way to merge two Cassandra clusters with same keyspace into one?
From: George Sigletos <sigletos@textkernel.nl>
To: Noorul Islam K M <noorul@noorul.com>
Cc: user@cassandra.apache.org
Date: Mon, 21 Dec 2015 13:14:31 +0100

Roughly half a TB of data.

There is a timestamp column in the tables we migrated, and we used it to achieve the incremental updates: each run copies only the rows written since the previous run. See the sketch below.
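For what it's worth, the core of our Spark job looked roughly like the following. This is a minimal sketch using the DataStax spark-cassandra-connector; the seed hosts, keyspace, table, and "ts" column names are placeholders for our setup, and the where() pushdown assumes the timestamp column is filterable server-side (otherwise you would filter on the Spark side instead):

import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.{SparkConf, SparkContext}

object IncrementalCopy {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cluster-merge")
    val sc = new SparkContext(conf)

    // Separate connection settings for the source (X) and target (Y) clusters.
    val sourceConf = conf.clone.set("spark.cassandra.connection.host", "cluster-x-seed")
    val targetConf = conf.clone.set("spark.cassandra.connection.host", "cluster-y-seed")

    // Timestamp recorded at the end of the previous run, passed as an argument.
    val lastRunCutoff = args(0)

    // Read from cluster X only the rows written since the last run.
    // The connector captures the implicit CassandraConnector at RDD creation,
    // which is what lets one job talk to two clusters.
    val newRows = {
      implicit val c: CassandraConnector = CassandraConnector(sourceConf)
      sc.cassandraTable("my_keyspace", "my_table").where("ts > ?", lastRunCutoff)
    }

    // Write them into the same keyspace/table on cluster Y.
    {
      implicit val c: CassandraConnector = CassandraConnector(targetConf)
      newRows.saveToCassandra("my_keyspace", "my_table")
    }

    sc.stop()
  }
}

The first run goes without the where() clause; every run after that only moves the delta, which is what kept our final downtime window small.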
I don't know anything about KairosDB, but I can see from the docs that there is a row timestamp column. Could you maybe use that one?

Kind regards,
George

On Mon, Dec 21, 2015 at 12:53 PM, Noorul Islam K M <noorul@noorul.com> wrote:

> George Sigletos <sigletos@textkernel.nl> writes:
>
> > Hello,
> >
> > We had a similar problem, where we needed to migrate data from one
> > cluster to another.
> >
> > We ended up using Spark to accomplish this. It is fast and reliable,
> > but some downtime was required after all.
> >
> > We minimized the downtime by doing a first full run and then running
> > incremental updates.
>
> How much data are you talking about?
>
> How did you achieve the incremental runs? We are using KairosDB, and
> some of the other schemas do not have a way to filter based on date.
>
> Thanks and Regards
> Noorul
>
> > Kind regards,
> > George
> >
> >
> > On Mon, Dec 21, 2015 at 10:12 AM, Noorul Islam K M <noorul@noorul.com>
> > wrote:
> >
> >> Hello all,
> >>
> >> We have two clusters, X and Y, with the same keyspaces but distinct
> >> data sets. We are planning to merge these into a single cluster. What
> >> would be the ideal steps to achieve this without downtime for the
> >> applications? We have a time-series data stream continuously writing
> >> to Cassandra.
> >>
> >> We have ruled out export/import, as that would make us lose the data
> >> written during the copy.
> >>
> >> We also ruled out sstableloader, as it is not reliable: it fails often,
> >> and there is no way to resume from where it failed.
> >>
> >> Any suggestions will help.
> >>
> >> Thanks and Regards
> >> Noorul
> >>