Subject: Re: What is the ideal way to merge two Cassandra clusters with same keyspace into one?
From: George Sigletos <sigletos@textkernel.nl>
To: Noorul Islam K M <noorul@noorul.com>
Cc: user@cassandra.apache.org
Date: Mon, 21 Dec 2015 13:14:31 +0100

Roughly half a TB of data.

There is a timestamp column in the tables we migrated, and we used it to achieve the incremental updates: each run copies only the rows written since the previous run. See the sketch below.
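For what it's worth, the core of our Spark job looked roughly like the following. This is a minimal sketch using the DataStax spark-cassandra-connector; the seed hosts, keyspace, table, and "ts" column names are placeholders for our setup, and the where() pushdown assumes the timestamp column is filterable server-side (otherwise you would filter on the Spark side instead):

import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.{SparkConf, SparkContext}

object IncrementalCopy {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cluster-merge")
    val sc = new SparkContext(conf)

    // Separate connection settings for the source (X) and target (Y) clusters.
    val sourceConf = conf.clone.set("spark.cassandra.connection.host", "cluster-x-seed")
    val targetConf = conf.clone.set("spark.cassandra.connection.host", "cluster-y-seed")

    // Timestamp recorded at the end of the previous run, passed as an argument.
    val lastRunCutoff = args(0)

    // Read from cluster X only the rows written since the last run.
    // The connector captures the implicit CassandraConnector at RDD creation,
    // which is what lets one job talk to two clusters.
    val newRows = {
      implicit val c: CassandraConnector = CassandraConnector(sourceConf)
      sc.cassandraTable("my_keyspace", "my_table").where("ts > ?", lastRunCutoff)
    }

    // Write them into the same keyspace/table on cluster Y.
    {
      implicit val c: CassandraConnector = CassandraConnector(targetConf)
      newRows.saveToCassandra("my_keyspace", "my_table")
    }

    sc.stop()
  }
}

The first run goes without the where() clause; every run after that only moves the delta, which is what kept our final downtime window small.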
I don't know anything about KairosDB, but I can see from the docs that there is a row timestamp column. Could you maybe use that one?

Kind regards,
George

On Mon, Dec 21, 2015 at 12:53 PM, Noorul Islam K M <noorul@noorul.com> wrote:

> George Sigletos <sigletos@textkernel.nl> writes:
>
> > Hello,
> >
> > We had a similar problem, where we needed to migrate data from one
> > cluster to another.
> >
> > We ended up using Spark to accomplish this. It is fast and reliable,
> > but some downtime was required after all.
> >
> > We minimized the downtime by doing a first full run and then running
> > incremental updates.
>
> How much data are you talking about?
>
> How did you achieve the incremental runs? We are using KairosDB, and
> some of the other schemas do not have a way to filter based on date.
>
> Thanks and Regards
> Noorul
>
> > Kind regards,
> > George
> >
> >
> > On Mon, Dec 21, 2015 at 10:12 AM, Noorul Islam K M <noorul@noorul.com>
> > wrote:
> >
> >> Hello all,
> >>
> >> We have two clusters, X and Y, with the same keyspaces but distinct
> >> data sets. We are planning to merge these into a single cluster. What
> >> would be the ideal steps to achieve this without downtime for the
> >> applications? We have a time-series data stream continuously writing
> >> to Cassandra.
> >>
> >> We have ruled out export/import, as that would make us lose the data
> >> written during the copy.
> >>
> >> We also ruled out sstableloader, as it is not reliable: it fails often,
> >> and there is no way to resume from where it failed.
> >>
> >> Any suggestions will help.
> >>
> >> Thanks and Regards
> >> Noorul
> >>