From: guo Maxwell <cclive1601@gmail.com>
Date: Fri, 29 Nov 2019 15:08:18 +0800
Subject: Re: Optimal backup strategy
To: user@cassandra.apache.org

Same topology means the restore nodes must own the same tokens as the backup nodes.
For example, backup:
   node1(1/2/3/4/5) node2(6/7/8/9/10)
restore:
   nodea(1/2/3/4/5) nodeb(6/7/8/9/10)
So node1's commitlog can be replayed on nodea.
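The token rule above can be checked mechanically before any commitlog is copied. The following is only a minimal sketch: the function name match_restore_nodes and the plain dictionaries of node name to token list are hypothetical, and in practice the tokens would have to come from the cluster itself.

```python
from typing import Dict, FrozenSet, Iterable


def match_restore_nodes(
    backup: Dict[str, Iterable[int]],
    restore: Dict[str, Iterable[int]],
) -> Dict[str, str]:
    """Map each backup node to the restore node owning the same tokens.

    Raises ValueError if some backup node has no restore node with an
    identical token set, i.e. the two topologies differ.
    """
    # Index restore nodes by their (order-independent) token set.
    by_tokens: Dict[FrozenSet[int], str] = {
        frozenset(tokens): name for name, tokens in restore.items()
    }
    mapping: Dict[str, str] = {}
    for name, tokens in backup.items():
        target = by_tokens.get(frozenset(tokens))
        if target is None:
            raise ValueError(f"no restore node owns the tokens of {name}")
        mapping[name] = target
    return mapping


# The example from the message: node1 -> nodea, node2 -> nodeb.
backup = {"node1": [1, 2, 3, 4, 5], "node2": [6, 7, 8, 9, 10]}
restore = {"nodea": [1, 2, 3, 4, 5], "nodeb": [6, 7, 8, 9, 10]}
mapping = match_restore_nodes(backup, restore)
```

If the check raises, the commitlogs of that backup node have no matching target and replay should not be attempted there.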

On Fri, Nov 29, 2019 at 2:03 PM Adarsh Kumar <adarsh0007@gmail.com> wrote:
Thanks Guo and Hossein,

So my understanding is:
  1. Commit log backup is not documented for Apache Cassandra, hence not standard. But it can be used for restore on the same machine (by taking the backup from commit_log_dir). If used on other machine(s), they have to be in the same topology. Can it be used for a replacement node?
  2. For periodic backup, Snapshot + Incremental backup is the best option.
Thanks,
Adarsh Kumar

On Fri, Nov 29, 2019 at 7:28 AM guo Maxwell <cclive1601@gmail.com> wrote:
Hossein is right. But for us, we restore to the same Cassandra topology, so replay is usable. It is also usable when restoring to the same machine.
Using sstableloader costs too much time and more storage (though the storage use drops after the restore).

On Thu, Nov 28, 2019 at 7:40 PM Hossein Ghiyasi Mehr <ghiyasimehr@gmail.com> wrote:
Commitlog backup isn't usable on another machine.
The backup solution depends on what you want to do: periodic backup, or backup to restore on another machine?
Periodic backup is a combination of snapshot and incremental backup. Remove the incremental backups after a new snapshot.
To take a backup to restore on another machine: you can use a snapshot after flushing the memtables, or use sstableloader.
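The "remove incremental backups after a new snapshot" step could look roughly like the sketch below. It assumes the incremental SSTables sit in a table's backups/ directory and that the caller already knows the snapshot time; the function name prune_incrementals and the use of file modification time as the cutoff are assumptions made for illustration, not something stated in this thread.

```python
from pathlib import Path
from typing import List


def prune_incrementals(backups_dir: Path, snapshot_time: float) -> List[Path]:
    """Delete incremental backup files last modified before the snapshot.

    backups_dir   -- a table's incremental backup directory (assumed layout)
    snapshot_time -- epoch seconds at which the new snapshot was taken
    Returns the removed paths so the caller can log or verify them.
    """
    removed: List[Path] = []
    for path in sorted(backups_dir.iterdir()):
        # Assumption: mtime older than the snapshot means the file's data
        # is already contained in that snapshot.
        if path.is_file() and path.stat().st_mtime < snapshot_time:
            path.unlink()
            removed.append(path)
    return removed
```

Files written after the snapshot are left in place, so they can still be shipped as the next incremental set.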


----
VafaTech.com - A Total Solution for Data Gathering & Analysis

On Thu, Nov 28, 2019 at 6:05 AM guo Maxwell <cclive1601@gmail.com> wrote:
In the Cassandra and DataStax documentation, commitlog backup is not mentioned.
Only snapshot and incremental backup are described as ways to do backups.

Though commitlog archiving per keyspace/table is not supported, commitlog replay (for which you must put the logs into commitlog_dir and restart the process)
supports a keyspace/table replay filter (using -Dcassandra.replayList with the keyspace1.table1,keyspace1.table2 format to replay the specified keyspaces/tables).
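As a small illustration of the replay filter format described above, this sketch builds the JVM option string from keyspace/table pairs. The helper name replay_list_option is hypothetical; only the -Dcassandra.replayList property and the comma-separated keyspace.table format come from the message.

```python
from typing import Iterable, Tuple


def replay_list_option(tables: Iterable[Tuple[str, str]]) -> str:
    """Build the -Dcassandra.replayList JVM option for the given tables."""
    entries = []
    for keyspace, table in tables:
        if not keyspace or not table:
            raise ValueError("keyspace and table names must be non-empty")
        entries.append(f"{keyspace}.{table}")
    if not entries:
        raise ValueError("at least one keyspace.table entry is required")
    # Entries are joined with commas and no spaces, as in the format above.
    return "-Dcassandra.replayList=" + ",".join(entries)


option = replay_list_option([("keyspace1", "table1"), ("keyspace1", "table2")])
print(option)  # -Dcassandra.replayList=keyspace1.table1,keyspace1.table2
```

The resulting string would be added to the JVM options before restarting the node with the logs placed in the commitlog directory.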

Snapshots do affect storage. For us, we take a snapshot once a week during the low business peak, and snapshot creation is throttled. For your case, you may want to see the issue (https://issues.apache.org/jira/browse/CASSANDRA-13019).



On Thu, Nov 28, 2019 at 1:00 AM Adarsh Kumar <adarsh0007@gmail.com> wrote:
Thanks Guo and Eric for replying,

I have some confusion about commit log backup:
  1. The commit log archival technique ( https://support.datastax.com/hc/en-us/articles/115001593706-Manual-Backup-and-Restore-with-Point-in-time-and-table-level-restore- ) is as good as an incremental backup, as it also captures commit logs after memtable flush.
  2. If we go for "Snapshot + Incremental backup + Commit log", we have to take the commit logs from the commit log directory (is there any SOP for this?). As commit logs are not per table or keyspace, we will have a challenge in restoring selective tables.
  3. Snapshot-based backups are easy to manage and operate due to their simplicity. But they are heavy on storage. Any views on this?
  4. Please share any successful strategy that someone is using in production. We are still in the design phase and want to implement the best solution.
Thanks Eric for sharing the link for Medusa.

Regards,
Adarsh Kumar

On Wed, Nov 27, 2019 at 5:16 PM guo Maxwell <cclive1601@gmail.com> wrote:
For me, I think the last one:
 Snapshot + Incremental + commitlog
is the most meaningful way to do backup and restore when you back the data up to somewhere else, like AWS S3.
  • Snapshot based backup // incremental data will not be backed up, and data may be lost when restoring to a time later than the snapshot time;
  • Incremental backups // better than snapshot backup, but with insufficient data accuracy, since data remaining in the memtable will be lost;
  • Snapshot + incremental
  • Snapshot + commitlog archival // better data precision than incremental backup, but data in the non-archived commitlog (segments not yet closed and archived) will not be restored and will be lost. Also, when there are too many logs, log replay will take a very long time.
For us, we use snapshot + incremental + commitlog archive. We read the snapshot data and the incremental data. The logs are also backed up, but we do not back up
logs whose data has already been flushed to SSTables, because that data is backed up by the incremental backup.

This way, the data exists in SSTable format through the snapshot backup and incremental backup. The number of logs will be very small, and log replay will not take much time.
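The restore order implied by this strategy can be sketched as follows. Everything here is a hypothetical model for illustration: the Artifact record, its timestamps, and plan_restore are not part of Cassandra or of our tooling, and comparing timestamps is only a stand-in for "the log's data has already been flushed to SSTables".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Artifact:
    kind: str        # "snapshot", "incremental", or "commitlog"
    name: str        # hypothetical identifier of the backed-up file set
    taken_at: float  # time the artifact was produced (epoch seconds)


def plan_restore(artifacts: List[Artifact]) -> List[Artifact]:
    """Return the artifacts to apply, in order, for the latest restore."""
    snapshots = [a for a in artifacts if a.kind == "snapshot"]
    if not snapshots:
        raise ValueError("a restore needs at least one snapshot")
    base = max(snapshots, key=lambda a: a.taken_at)
    # Incremental SSTables produced after the base snapshot.
    incrementals = sorted(
        (a for a in artifacts
         if a.kind == "incremental" and a.taken_at > base.taken_at),
        key=lambda a: a.taken_at,
    )
    # Only logs newer than the last SSTable-based artifact need replay;
    # older log data is assumed to be covered by the SSTables already.
    last_sstable = incrementals[-1].taken_at if incrementals else base.taken_at
    logs = sorted(
        (a for a in artifacts
         if a.kind == "commitlog" and a.taken_at > last_sstable),
        key=lambda a: a.taken_at,
    )
    return [base, *incrementals, *logs]
```

With this ordering the bulk of the data is restored from SSTables and only the short tail of logs is replayed, which is why replay stays cheap.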


On Wed, Nov 27, 2019 at 4:13 PM Eric LELEU <eric@strapdata.com> wrote:

Hi,

TheLastPickle & Spotify have released Medusa as a Cassandra backup tool.

See : https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html

Hope this link will help you.

Eric


On 27/11/2019 at 08:10, Adarsh Kumar wrote:
Hi,

I was looking for the backup strategies of Cassandra. After some study I came to know that there are the following options:
  • Snapshot based backup
  • Incremental backups
  • Snapshot + incremental
  • Snapshot + commitlog archival
  • Snapshot + Incremental + commitlog
Which is the most suitable and feasible approach? Also, which of these is used most?
Please let me know if there is any other option or tool available.

Thanks in advance.

Regards,
Adarsh Kumar


--
you are the apple of my eye !

