From: Hossein Ghiyasi Mehr <ghiyasimehr@gmail.com>
Date: Tue, 3 Dec 2019 12:48:51 +0330
Subject: Re: Optimal backup strategy
To: user@cassandra.apache.org

I am sorry! This is true. I forgot "not"!
1. It's not recommended to use the commit log after a single node failure. Cassandra has other options, such as the replication factor, as a substitute solution.

VafaTech.com - A Total Solution for Data Gathering & Analysis

On Tue, Dec 3, 2019 at 10:42 AM Adarsh Kumar <adarsh0007@gmail.com> wrote:
Thanks Hossein,

Just one more question: is there any special SOP or consideration we have to take into account for multi-site backup?

Please share any helpful link, blog, or documented steps.

Regards,
Adarsh Kumar

On Sun, Dec 1, 2019 at 10:40 PM Hossein Ghiyasi Mehr <ghiyasimehr@gmail.com> wrote:
1. It's recommended to use commit log after one node failure. Cassandra has many options such as replication factor as substitute solution.
2. Yes, right.

VafaTech.com - A Total Solution for Data Gathering & Analysis


On Fri, Nov 29, 2019 at 9:33 AM Adarsh Kumar <adarsh0007@gmail.com> wrote:
Thanks Guo and Hossein,

So my understanding is:
  1. Commit log backup is not documented for Apache Cassandra, hence not standard. But it can be used for restore on the same machine (by taking a backup from commit_log_dir). If used on other machine(s), they have to be in the same topology. Can it be used for a replacement node?
  2. For periodic backup, Snapshot + Incremental backup is the best option.

Thanks,
Adarsh Kumar

On Fri, Nov 29, 2019 at 7:28 AM guo Maxwell <cclive1601@gmail.com> wrote:
Hossein is right. But for us, we restore to the same Cassandra topology, so replay is usable. When restoring to the
same machine it is also usable.
Using sstableloader costs too much time and more storage (though the storage use will drop after the restore).

On Thu, Nov 28, 2019 at 7:40 PM Hossein Ghiyasi Mehr <ghiyasimehr@gmail.com> wrote:
A commitlog backup isn't usable on another machine.
The backup solution depends on what you want to do: periodic backup, or backup to restore on another machine?
Periodic backup is a combination of snapshot and incremental backup. Remove the incremental backups after each new snapshot.
To take a backup for restoring on another machine: you can use a snapshot taken after flushing the memtables, or use sstableloader.
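
The periodic scheme described above (flush, snapshot, then drop older incremental backups) could be sketched roughly as follows. This is only an illustration under assumptions: the helper names are hypothetical, the keyspace and tag are placeholders, and using file modification time to decide which incremental files predate the snapshot is a simplification. It relies on `nodetool flush`, `nodetool snapshot -t <tag>`, and on incremental backups being hard-linked into a `backups` directory under each table directory.

```python
from pathlib import Path
from typing import List


def snapshot_commands(keyspace: str, tag: str) -> List[List[str]]:
    """Return the nodetool invocations for a flushed, tagged snapshot."""
    return [
        ["nodetool", "flush", keyspace],                # explicit flush, as suggested above
        ["nodetool", "snapshot", "-t", tag, keyspace],  # tagged snapshot of the keyspace
    ]


def stale_incrementals(table_dir: Path, snapshot_epoch: float) -> List[Path]:
    """List files in <table_dir>/backups last modified before the snapshot.

    Assumption: modification time is a good enough proxy for
    "already covered by the new snapshot".
    """
    backups = table_dir / "backups"
    if not backups.is_dir():
        return []
    return sorted(
        p for p in backups.iterdir()
        if p.is_file() and p.stat().st_mtime < snapshot_epoch
    )
```

The command lists can be handed to `subprocess.run` on a node, and the paths returned by `stale_incrementals` are candidates for removal once the new snapshot has been copied off the node.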


----
VafaTech.com - A Total Solution for Data Gathering & Analysis

On Thu, Nov 28, 2019 at 6:05 AM guo Maxwell <cclive1601@gmail.com> wrote:
In the Cassandra and DataStax documentation, commitlog backup is not mentioned.
Only snapshot and incremental backup are described as ways to do backup.

Although commitlog archiving per keyspace/table is not supported, commitlog replay (for which you must put the logs in commitlog_dir and restart the process)
supports a keyspace/table replay filter (use -Dcassandra.replayList with the keyspace1.table1,keyspace1.table2 format to replay only the specified keyspaces/tables).
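
As a small illustration of the replay filter format mentioned above, the option string could be assembled like this. The function name is hypothetical, and how the option reaches the JVM (for example by appending it to JVM_OPTS before restarting the node) is an assumption about the local setup.

```python
from typing import Iterable


def replay_list_option(tables: Iterable[str]) -> str:
    """Build the -Dcassandra.replayList JVM option from keyspace.table names."""
    names = [t.strip() for t in tables]
    if not names:
        raise ValueError("at least one keyspace.table is required")
    for name in names:
        parts = name.split(".")
        # Each entry must be exactly keyspace.table, with both parts non-empty.
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"expected keyspace.table, got {name!r}")
    return "-Dcassandra.replayList=" + ",".join(names)


print(replay_list_option(["keyspace1.table1", "keyspace1.table2"]))
# prints: -Dcassandra.replayList=keyspace1.table1,keyspace1.table2
```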

Snapshots do affect storage. For us, we take a snapshot once a week during the low business period, and taking the snapshot gets throttled. For your case you may want to
see the issue (https://issues.apache.org/jira/browse/CASSANDRA-13019)



On Thu, Nov 28, 2019 at 1:00 AM Adarsh Kumar <adarsh0007@gmail.com> wrote:
Thanks Guo and Eric for replying,

I have some points of confusion about commit log backup:
  1. The commit log archival technique ( https://support.datastax.com/hc/en-us/articles/115001593706-Manual-Backup-and-Restore-with-Point-in-time-and-table-level-restore- ) is as good as an incremental backup, as it also captures commit logs after memtable flush.
  2. If we go for "Snapshot + Incremental backup + Commit log", we have to take the commit logs from the commit log directory (is there any SOP for this?). As commit logs are not per table or keyspace, we will have a challenge in restoring selected tables.
  3. Snapshot-based backups are easy to manage and operate due to their simplicity. But they are heavy on storage. Any views on this?
  4. Please share any successful strategy that someone is using in production. We are still in the design phase and want to implement the best solution.
Thanks Eric for sharing the link for Medusa.

Regards,
= Adarsh Kumar

On Wed, Nov 27, 2019 at 5:16 PM guo Maxwell <cclive1601@gmail.com> wrote:
For me, I think the last one:
 Snapshot + Incremental + commitlog
is the most meaningful way to do backup and restore, when you back the data up to somewhere else like AWS S3.
  • Snapshot based backup // incremental data will not be backed up, so data may be lost when restoring to a time later than the snapshot time;
  • Incremental backups // better than snapshot backup, but with insufficient data accuracy, because data remaining in the memtable will be lost;
  • Snapshot + incremental
  • Snapshot + commitlog archival // better data precision than incremental backup, but the data in non-archived commitlogs (not archived, and commitlog not yet closed) will not be restored and will be lost. Also, when there are too many logs, log replay will cost very much time
For me, we use snapshot + incremental + commitlog archive. We read snapshot data and incremental data. The log is also backed up, but we do not back up
logs whose data has been flushed to sstables, because that data is already backed up by the incremental backup.

This way, the data exists in sstable format through snapshot backup and incremental backup. The number of logs will be very small, and log replay will not cost much time.
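
One way to picture the restore side of the combined scheme above is as an ordering problem: snapshot first, then the incremental backups taken after it, then only the commit logs that are still needed. The sketch below is purely illustrative. The artifact names, the `plan_restore` helper, and the timestamp cutoff rule are all assumptions, and real commit log segments cover a range of writes, so a timestamp comparison is a simplification.

```python
from typing import List, Tuple

Artifact = Tuple[float, str]  # (creation time as epoch seconds, name)


def plan_restore(snapshot: Artifact,
                 incrementals: List[Artifact],
                 commitlogs: List[Artifact]) -> List[str]:
    """Order backup artifacts: snapshot, newer incrementals, then logs to replay."""
    snap_time, snap_name = snapshot
    # Incremental backups older than the snapshot are already contained in it.
    newer = sorted(a for a in incrementals if a[0] > snap_time)
    # Hypothetical rule: replay only logs newer than the last restored SSTable.
    cutoff = newer[-1][0] if newer else snap_time
    steps = [f"restore snapshot {snap_name}"]
    steps += [f"restore incremental {name}" for _, name in newer]
    steps += [f"replay commitlog {name}"
              for t, name in sorted(commitlogs) if t > cutoff]
    return steps
```

With a snapshot at time 100, incrementals at 50 and 150, and logs at 120 and 160, the plan keeps the snapshot, the incremental from 150, and only the log from 160, which matches the point that few logs remain to be replayed.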


On Wed, Nov 27, 2019 at 4:13 PM Eric LELEU <eric@strapdata.com> wrote:

Hi,

TheLastPickle & Spotify have released Medusa as Cassandra Backup tool.

See: https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html

Hope this link will help you.

Eric


On 27/11/2019 at 08:10, Adarsh Kumar wrote:
Hi,

I was looking for the backup strategies of Cassandra. After some study I came to know that there are the following options:
  • Snapshot based backup
  • Incremental backups
  • Snapshot + incremental
  • Snapshot + commitlog archival
  • Snapshot + Incremental + commitlog
Which is the most suitable and feasible approach? Also, which of these is used most?
Please let me know if there is any other option or tool available.

Thanks in advance.

Regards,
Adarsh Kumar


--
you are the apple of my eye !

