From: guo Maxwell <cclive1601@gmail.com>
Date: Fri, 29 Nov 2019 09:58:28 +0800
Subject: Re: Optimal backup strategy
To: user@cassandra.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="00000000000090d52d0598729171"

--00000000000090d52d0598729171
Content-Type: text/plain; charset="UTF-8"

Hossein is right. But in our case we restore to the same Cassandra topology, so commitlog replay is usable. It is also usable when restoring to the
same machine.
Using sstableloader costs too much time and more storage (though the storage use drops after the restore).

On Thu, Nov 28, 2019 at 7:40 PM Hossein Ghiyasi Mehr <ghiyasimehr@gmail.com> wrote:
Commitlog backup isn't usable on another machine.
The backup solution depends on what you want to do: periodic backup, or backup to restore on another machine?
Periodic backup is a combination of snapshot and incremental backup. Remove the incremental backups after taking a new snapshot.
To take a backup for restoring on another machine: you can use a snapshot taken after flushing the memtables, or use sstableloader.
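
The periodic scheme above (incremental backups are kept only until the next snapshot supersedes them) can be sketched roughly as follows. This is a minimal illustration and not part of Cassandra: `BackupFile` and `incrementals_to_remove` are hypothetical names, and it assumes each backed-up file carries the time it was created.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BackupFile:
    """Hypothetical record for one backed-up incremental SSTable file."""
    name: str
    created_at: datetime


def incrementals_to_remove(incrementals, latest_snapshot_at):
    """Return the incremental backup files made obsolete by the latest snapshot.

    Assumption: anything flushed at or before the snapshot time is already
    contained in that snapshot, so only newer incrementals must be kept.
    """
    return [f for f in incrementals if f.created_at <= latest_snapshot_at]


if __name__ == "__main__":
    snapshot_at = datetime(2019, 11, 24, 3, 0)
    files = [
        BackupFile("older-Data.db", datetime(2019, 11, 23, 12, 0)),
        BackupFile("newer-Data.db", datetime(2019, 11, 25, 12, 0)),
    ]
    for f in incrementals_to_remove(files, snapshot_at):
        print("obsolete:", f.name)
```

The function only decides what is obsolete; deleting the files is left to the caller so the decision can be logged or reviewed first.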


----
VafaTech.com - A Total Solution for Data Gathering & Analysis

On Thu, Nov 28, 2019 at 6:05 AM guo Maxwell <cclive1601@gmail.com> wrote:
In the Cassandra and DataStax documentation, commitlog backup is not mentioned.
Only snapshot and incremental backup are described as ways to do backups.

Although commitlog archiving per keyspace/table is not supported, commitlog replay (for which you must put the logs into commitlog_dir and restart the process)
does support a keyspace/table replay filter (using -Dcassandra.replayList in the form keyspace1.table1,keyspace1.table2 to replay only the specified keyspaces/tables).
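
As a small illustration of the replay filter described above, the option string can be assembled from (keyspace, table) pairs. The helper name `replay_list_option` is hypothetical; only the `-Dcassandra.replayList` property and the `keyspace.table` comma-separated format come from the message. How the option is passed to the JVM depends on your installation.

```python
def replay_list_option(tables):
    """Build the JVM option that limits commitlog replay to the given tables.

    `tables` is an iterable of (keyspace, table) pairs.
    """
    tables = list(tables)
    if not tables:
        raise ValueError("at least one (keyspace, table) pair is required")
    targets = ",".join(f"{keyspace}.{table}" for keyspace, table in tables)
    return f"-Dcassandra.replayList={targets}"


if __name__ == "__main__":
    print(replay_list_option([("keyspace1", "table1"), ("keyspace1", "table2")]))
    # prints: -Dcassandra.replayList=keyspace1.table1,keyspace1.table2
```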

Snapshots do affect storage. We take a snapshot once a week during off-peak business hours, and snapshot creation is throttled. You may want to see this issue: https://issues.apache.org/jira/browse/CASSANDRA-13019


On Thu, Nov 28, 2019 at 1:00 AM Adarsh Kumar <adarsh0007@gmail.com> wrote:
Thanks Guo and Eric for replying,

I have some confusion about commit log backup:
  1. The commit log archival technique ( https://support.datastax.com/hc/en-us/articles/115001593706-Manual-Backup-and-Restore-with-Point-in-time-and-table-level-restore- ) is as good as an incremental backup, as it also captures commit logs after a memtable flush.
  2. If we go for "Snapshot + Incremental backup + Commit log", we have to take the commit logs from the commit log directory (is there any SOP for this?). As commit logs are not per table or keyspace, we will have a challenge restoring selected tables.
  3. Snapshot-based backups are easy to manage and operate due to their simplicity, but they are heavy on storage. Any views on this?
  4. Please share any successful strategy that someone is using in production. We are still in the design phase and want to implement the best solution.
Thanks Eric for sharing the link to Medusa.

Regards,
Adarsh Kumar

On Wed, Nov 27, 2019 at 5:16 PM guo Maxwell <cclive1601@gmail.com> wrote:
For me, I think the last one:
 Snapshot + Incremental + commitlog
is the most meaningful way to do backup and restore when you back the data up to somewhere else, such as AWS S3.
  • Snapshot-based backup // incremental data is not backed up, so data may be lost when restoring to a time later than the snapshot time;
  • Incremental backups // better than snapshot backup, but with insufficient data accuracy: data remaining in the memtable will be lost;
  • Snapshot + incremental
  • Snapshot + commitlog archival // better data precision than incremental backup, but data in non-archived commitlogs (segments not yet archived or not yet closed) cannot be restored and will be lost. Also, when there are too many logs, log replay takes a very long time
We use snapshot + incremental + commitlog archive. We read the snapshot data and the incremental data, and the logs are backed up too. But we do not back up
logs whose data has already been flushed to SSTables, because that data is already covered by the incremental backup.

This way, the data exists as SSTables through the snapshot and incremental backups. The number of logs is very small, and log replay does not take much time.
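
The layering described above (snapshot first, then incremental SSTables, then only the commitlogs that are not yet covered by flushed data) can be sketched as a restore-plan function. This is an illustration under stated assumptions, not the tooling used in this thread: the `Artifact` catalog and `restore_plan` are hypothetical, and the sketch simplifies by treating the newest restored SSTable time as the point up to which all tables have been flushed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Artifact:
    """Hypothetical backup catalog entry.

    For snapshots and incrementals `timestamp` is the creation time;
    for commitlog segments it is the time of the last write in the segment.
    """
    kind: str  # "snapshot", "incremental" or "commitlog"
    name: str
    timestamp: datetime


def restore_plan(catalog, target):
    """Return the artifacts to restore, in order, for a restore to `target`."""
    snapshots = [a for a in catalog
                 if a.kind == "snapshot" and a.timestamp <= target]
    if not snapshots:
        raise ValueError("no snapshot at or before the restore target")
    base = max(snapshots, key=lambda a: a.timestamp)

    # Incremental SSTables flushed after the base snapshot, up to the target.
    incrementals = sorted(
        (a for a in catalog
         if a.kind == "incremental" and base.timestamp < a.timestamp <= target),
        key=lambda a: a.timestamp,
    )

    # Simplifying assumption: everything written up to this time is in SSTables.
    flushed_until = incrementals[-1].timestamp if incrementals else base.timestamp

    # Only segments with writes after the flushed point need replaying; the
    # replay itself is expected to stop at the target time.
    commitlogs = sorted(
        (a for a in catalog
         if a.kind == "commitlog" and a.timestamp > flushed_until),
        key=lambda a: a.timestamp,
    )
    return [base, *incrementals, *commitlogs]
```

Keeping the plan as an ordered list makes the small number of commitlog segments left to replay easy to inspect before a restore is started.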


On Wed, Nov 27, 2019 at 4:13 PM Eric LELEU <eric@strapdata.com> wrote:

Hi,

TheLastPickle & Spotify have released Medusa as a Cassandra backup tool.

See: https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html

Hope this link will help you.

Eric


On 27/11/2019 at 08:10, Adarsh Kumar wrote:
Hi,

I was looking for the backup strategies of Cassandra. After some study I came to know that there are the following options:
  • Snapshot based backup
  • Incremental backups
  • Snapshot + incremental
  • Snapshot + commitlog archival
  • Snapshot + Incremental + commitlog
Which is the most suitable and feasible approach? Also, which of these is used most?
Please let me know if there is any other option or tool available.

Thanks in advance.

Regards,
Adarsh Kumar


--
you are the apple of my eye !


--00000000000090d52d0598729171--