From: Adarsh Kumar <adarsh0007@gmail.com>
Date: Fri, 29 Nov 2019 11:32:39 +0530
Subject: Re: Optimal backup strategy
To: user@cassandra.apache.org

Thanks Ahu and Hussein,

So my understanding is:
  1. Commit log backup is not documented for Apache Cassandra, hence not standard. But it can be used for restore on the same machine (by taking the backup from commit_log_dir). If it is used on other machine(s), they have to be in the same topology. Can it be used for a replacement node?
  2. For periodic backup, Snapshot + Incremental backup is the best option.

Thanks,
Adarsh Kumar

On Fri, Nov 29, 2019 at 7:28 AM guo Maxwell <cclive1601@gmail.com> wrote:
Hossein is right. But for us, we restore to the same Cassandra topology, so replay is usable. It is also usable when restoring to the same machine.
Using sstableloader costs too much time and more storage (though the storage use drops after the restore).

Hossein Ghiyasi Mehr <ghiyasimehr@gmail.com> wrote on Thu, Nov 28, 2019 at 7:40 PM:
Commitlog backup isn't usable on another machine.
The backup solution depends on what you want to do: periodic backup, or backup to restore on another machine?
Periodic backup is a combination of snapshot and incremental backup. Remove the incremental backups after a new snapshot.
Backup to restore on another machine: you can use a snapshot after flushing the memtables, or use sstableloader.


----
VafaTech.com - A Total Solution for Data Gathering & Analysis
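
The periodic cycle described above (flush, snapshot, then drop the incremental backups that the new snapshot supersedes) could be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, the nodetool commands are built but not executed, and the `backups` directory layout assumes `incremental_backups: true` is set in cassandra.yaml. Files are only listed, not deleted, since they should be copied off the node before anything is removed.

```python
from pathlib import Path
from typing import List


def snapshot_commands(keyspace: str, tag: str) -> List[List[str]]:
    """Return the nodetool invocations for one snapshot cycle (not executed here)."""
    return [
        # Write memtables out as SSTables so the snapshot includes recent writes.
        ["nodetool", "flush", keyspace],
        # Hard-link the current SSTables under snapshots/<tag>.
        ["nodetool", "snapshot", "-t", tag, keyspace],
    ]


def stale_incrementals(table_dir: Path, snapshot_time: float) -> List[Path]:
    """List files in <table_dir>/backups last modified before the snapshot time.

    Hypothetical helper: it assumes the file modification time is a good
    enough proxy for "flushed before the snapshot was taken".
    """
    backups = table_dir / "backups"
    if not backups.is_dir():
        return []
    return sorted(
        p for p in backups.iterdir()
        if p.is_file() and p.stat().st_mtime < snapshot_time
    )
```

A caller would run the returned commands with something like `subprocess.run(cmd, check=True)` and archive the listed files before removing them.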

On Thu, Nov 28, 2019 at 6:05 AM guo Maxwell <cclive1601@gmail.com> wrote:
In Cassandra's and DataStax's documentation, commitlog backup is not mentioned.
Only snapshot and incremental backup are described as ways to do backup.

Though commitlog archiving per keyspace/table is not supported, commitlog replay (for which you must put the logs in commitlog_dir and restart the process)
supports a keyspace/table replay filter (using -Dcassandra.replayList with the keyspace1.table1,keyspace1.table2 format to replay only the specified keyspaces/tables).
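
As a small illustration of the replay filter format mentioned above, the flag value can be assembled from keyspace/table pairs. The helper name `replay_list_option` is hypothetical; only the `-Dcassandra.replayList` property and its `keyspace.table` comma-separated format come from the message.

```python
from typing import Iterable, Tuple


def replay_list_option(tables: Iterable[Tuple[str, str]]) -> str:
    """Build the JVM flag limiting commit log replay to the given (keyspace, table) pairs."""
    entries = [f"{keyspace}.{table}" for keyspace, table in tables]
    if not entries:
        raise ValueError("at least one keyspace/table pair is required")
    return "-Dcassandra.replayList=" + ",".join(entries)


print(replay_list_option([("keyspace1", "table1"), ("keyspace1", "table2")]))
# prints: -Dcassandra.replayList=keyspace1.table1,keyspace1.table2
```

The resulting string would be added to the JVM options used when restarting the node for replay.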

Snapshots do affect storage. For us, we take a snapshot once a week during the low business peak, and snapshot creation is throttled. For you, you may see the issue (https://issues.apache.org/jira/browse/CASSANDRA-13019)



Adarsh Kumar <adarsh0007@gmail.com> wrote on Thu, Nov 28, 2019 at 1:00 AM:
Thanks Guo and Eric for replying,

I have some confusions about commit log backup:
  1. The commit log archival technique ( https://support.datastax.com/hc/en-us/articles/115001593706-Manual-Backup-and-Restore-with-Point-in-time-and-table-level-restore- ) is as good as an incremental backup, as it also captures commit logs after memtable flush.
  2. If we go for "Snapshot + Incremental backup + Commit log", we have to take the commit log from the commit log directory (is there any SOP for this?). As commit logs are not per table or keyspace, we will have a challenge in restoring selective tables.
  3. Snapshot-based backups are easy to manage and operate due to their simplicity. But they are heavy on storage. Any views on this?
  4. Please share any successful strategy that someone is using in production. We are still in the design phase and want to implement the best solution.
Thanks Eric for sharing the link for Medusa.

Regards,
Adarsh Kumar

On Wed, Nov 27, 2019 at 5:16 PM guo Maxwell <cclive1601@gmail.com> wrote:
For me, I think the last one:
Snapshot + Incremental + commitlog
is the most meaningful way to do backup and restore, when you back the data up to somewhere else like AWS S3.
  • Snapshot based backup // incremental data will not be backed up, so data may be lost when restoring to a time later than the snapshot time;
  • Incremental backups // better than snapshot backup, but with insufficient data accuracy, as data remaining in the memtable will be lost;
  • Snapshot + incremental
  • Snapshot + commitlog archival // better data precision than incremental backup, but the data in non-archived commitlogs (segments not yet archived or not yet closed) will not be restored and will be lost. Also, when there are too many logs, log replay will take a very long time
For me, we use snapshot + incremental + commitlog archive. We read the snapshot data and the incremental data. The log is also backed up, but we will not back up the log whose data has been flushed to SSTables, as that data will be backed up by the incremental backup.

This way, the data will exist in SSTable format through snapshot backup and incremental backup. The number of logs will be very small, and log replay will not take much time.
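
The restore order implied by this approach (newest snapshot first, then later incremental SSTables, then only the commit logs that still hold unflushed data) might be modelled like this. It is a deliberately simplified sketch: `Artifact`, `restore_plan` and the `has_unflushed_data` flag are hypothetical names, and the thread does not say how a backup process decides that a log segment has been fully flushed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Artifact:
    kind: str                         # "snapshot", "incremental", or "commitlog"
    name: str
    created: float                    # epoch seconds
    has_unflushed_data: bool = False  # only meaningful for commit logs (assumed flag)


def restore_plan(artifacts: List[Artifact]) -> List[Artifact]:
    """Return the artifacts to restore, in the order they should be applied."""
    snapshots = [a for a in artifacts if a.kind == "snapshot"]
    if not snapshots:
        raise ValueError("no snapshot available to restore from")
    base = max(snapshots, key=lambda a: a.created)

    # Incremental SSTables flushed after the base snapshot, oldest first.
    incrementals = sorted(
        (a for a in artifacts
         if a.kind == "incremental" and a.created > base.created),
        key=lambda a: a.created,
    )
    # Logs whose data already sits in SSTables are skipped, which keeps replay short.
    logs = sorted(
        (a for a in artifacts if a.kind == "commitlog" and a.has_unflushed_data),
        key=lambda a: a.created,
    )
    return [base, *incrementals, *logs]
```

Keeping the commit log list limited to segments with unflushed data is what makes the replay step cheap in this scheme.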


Eric LELEU <eric@strapdata.com> wrote on Wed, Nov 27, 2019 at 4:13 PM:

Hi,

TheLastPickle & Spotify have released Medusa as Cassandra Backup tool.

See: https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html

Hope this link will help you.

Eric


On 27/11/2019 at 08:10, Adarsh Kumar wrote:
Hi,

I was looking for the backup strategies of Cassandra. After some study I came to know that there are the following options:
  • Snapshot based backup
  • Incremental backups
  • Snapshot + incremental
  • Snapshot + commitlog archival
  • Snapshot + Incremental + commitlog
Which is the most suitable and feasible approach? Also, which of these is used most?
Please let me know if there is any other option or tool available.

Thanks in advance.

Regards,
Adarsh Kumar


--
you are the apple of my eye !

