From: Francois Richard <frichard@xobni.com>
To: user@cassandra.apache.org
Subject: Re: Many to one type of replication.
Date: Mon, 25 Mar 2013 09:40:14 -0700
In-Reply-To: <0A0F17A1-1D13-4A6C-814F-B6FDD0BB3F84@thelastpickle.com>

Thanks much, I wanted to confirm. We will do this at the application level.

FR

On Sun, Mar 24, 2013 at 10:03 AM, aaron morton wrote:
> From this mailing list I found this Github project that is doing something
> similar by looking at the commit logs:
> https://github.com/carloscm/cassandra-commitlog-extract
>
> IMHO tailing the logs is fragile, and you may be better off handling it at
> the application level.
>
> But are there other options around using a custom replication strategy?
>
> There is no such thing as "one-directional" replication — for example,
> replicating everything from DC 1 to DC 2 but not replicating from DC 2 to
> DC 1.
> You may be better off reducing the number of clusters and then running one
> transactional and one analytical DC.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 24/03/2013, at 3:42 AM, Francois Richard wrote:
>
> Hi,
>
> We currently run our Cassandra deployment as multiple independent
> clusters. The clusters are totally self-contained in terms of redundancy
> and independent from each other. We have a "sharding" layer higher in our
> stack that dispatches requests to the right application stack, and each
> application stack connects to its associated Cassandra cluster. All the
> Cassandra clusters are identical in terms of hosted keyspaces, column
> families, replication factor, and so on.
>
> At this point I am investigating ways to build a central Cassandra cluster
> that could contain all the data from all the other Cassandra clusters, and
> I am wondering how best to do it. The goal is to have a global view of our
> data and to be able to do some massive crunching on it.
>
> For sure we could build some ETL-type job that would figure out the data
> that was updated, extract it, and load it into the central Cassandra
> cluster. From this mailing list I found this Github project that is doing
> something similar by looking at the commit logs:
> https://github.com/carloscm/cassandra-commitlog-extract
>
> But are there other options around using a custom replication strategy?
> Any other general suggestions?
>
> Thanks,
>
> FR

--
_____________________________________________
Francois Richard
VP Server Engineering and Operations
Xobni Engineering
Xobni, Inc.
539 Bryant St
San Francisco, CA 94107
415-987-5305 Mobile (For emergencies please leave a voice-mail to mobile)
www.xobni.com
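Aaron's suggestion to handle the many-to-one propagation "at the application level" could be sketched as a dual-write wrapper: write to the local shard cluster first, then mirror the write to the central cluster on a best-effort basis. This is a minimal illustration, not code from the thread — `DualWriter`, the session objects, and the retry queue are all assumed names, and the stub sessions stand in for real driver sessions so the sketch runs without a live cluster:

```python
class FailedWrite(Exception):
    """Stand-in for a driver write error (illustrative only)."""


class DualWriter:
    """Writes to the local (shard) cluster first, then mirrors the write
    to the central cluster. A failed central write is queued for retry
    instead of failing the user-facing request."""

    def __init__(self, local_session, central_session):
        self.local = local_session
        self.central = central_session
        # In practice this would be a durable queue, not an in-memory list.
        self.retry_queue = []

    def write(self, statement, params):
        # The local write must succeed; let its exception propagate.
        self.local.execute(statement, params)
        try:
            # The central write is best effort.
            self.central.execute(statement, params)
        except FailedWrite:
            self.retry_queue.append((statement, params))


class StubSession:
    """Minimal stand-in for a cluster session, for demonstration."""

    def __init__(self, fail=False):
        self.rows = []
        self.fail = fail

    def execute(self, statement, params):
        if self.fail:
            raise FailedWrite(statement)
        self.rows.append((statement, params))


# Simulate the central cluster being unreachable: the local write lands,
# and the central write is captured for retry.
local, central = StubSession(), StubSession(fail=True)
writer = DualWriter(local, central)
writer.write("INSERT INTO events (id, payload) VALUES (%s, %s)", (1, "x"))
```

The key property is that the central cluster can lag or be down without affecting the shard clusters, at the cost of having to drain the retry queue yourself.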
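Aaron's alternative — one cluster with a transactional DC and an analytical DC — relies on Cassandra's built-in multi-datacenter replication rather than any one-directional scheme. A hedged sketch of what that keyspace definition might look like (the keyspace and datacenter names here are made up for illustration):

```
CREATE KEYSPACE app_data
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc_transactional': 3,
    'dc_analytics': 2
  };
```

Replication is then symmetric and automatic between the two datacenters; isolating the analytical workload is done by pointing the crunching jobs only at `dc_analytics` and using datacenter-local consistency levels (e.g. LOCAL_QUORUM) for the transactional traffic.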