From: Hossein Ghiyasi Mehr <ghiyasimehr@gmail.com>
Date: Sun, 1 Dec 2019 20:27:30 +0330
To: user@cassandra.apache.org
Subject: Re: Optimal backup strategy

1. Using the commit log is recommended after a single node failure. Cassandra has many other options, such as the replication factor, as substitute solutions.
2. Yes, right.

VafaTech.com - A Total Solution for Data Gathering & Analysis

On Fri, Nov 29, 2019 at 9:33 AM Adarsh Kumar <adarsh0007@gmail.com> wrote:
Thanks Ahu and Hussein,

So my understanding is:
  1. Commit log backup is not documented for Apache Cassandra, hence it is not standard. But it can be used for a restore on the same machine (taking the backup from commit_log_dir). If used on other machine(s), they have to be in the same topology. Can it be used for a replacement node?
  2. For periodic backup, Snapshot + Incremental backup is the best option.

Thanks,
Adarsh Kumar

On Fri, Nov 29, 2019 at 7:28 AM guo Maxwell <cclive1601@gmail.com> wrote:
Hossein is right. But in our case we restore to the same Cassandra topology, so replay is usable; it is also usable when restoring to the same machine.
Using sstableloader costs too much time and more storage (though the extra storage is reclaimed after the restore).
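To make the sstableloader route concrete, here is a minimal sketch that only builds the command line and does not run it. The host addresses and the /restore/keyspace1/table1 path are hypothetical; the sketch relies on sstableloader taking the keyspace and table names from the last two directories of the path it is given.

```python
from __future__ import annotations

import shlex
from pathlib import PurePosixPath


def sstableloader_command(hosts: list[str], table_dir: str) -> list[str]:
    """Build (but do not run) an sstableloader invocation for one table directory."""
    if not hosts:
        raise ValueError("at least one initial contact host is required")
    path = PurePosixPath(table_dir)
    # sstableloader takes the keyspace and table names from the last two
    # directories of the path, so it must end in <keyspace>/<table>.
    if len(path.parts) < 2:
        raise ValueError("expected a path ending in <keyspace>/<table>")
    return ["sstableloader", "-d", ",".join(hosts), str(path)]


# Hypothetical contact points and restore directory.
cmd = sstableloader_command(["10.0.0.1", "10.0.0.2"], "/restore/keyspace1/table1")
print(shlex.join(cmd))
```

Every node in the target cluster receives the streamed data, which is part of why this route is slower than replaying on an identical topology.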

On Thu, Nov 28, 2019 at 7:40 PM Hossein Ghiyasi Mehr <ghiyasimehr@gmail.com> wrote:
Commit log backup isn't usable on another machine.
The backup solution depends on what you want to do: periodic backups, or a backup to restore on another machine?
A periodic backup is a combination of snapshots and incremental backups. Remove the incremental backups after each new snapshot.
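A minimal sketch of the "remove incremental backups after a new snapshot" rule, assuming each incremental file's creation time is known (for example from the backup store's metadata). The IncrementalFile type, the file names, and the timestamps are hypothetical; only files created strictly before the snapshot are treated as covered by it.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class IncrementalFile:
    name: str          # file copied from <table>/backups/
    created: datetime  # when the flush hard-linked it there


def prunable_incrementals(files: list[IncrementalFile], snapshot_time: datetime) -> list[str]:
    """Names of incremental-backup files that a newer snapshot already covers.

    Data flushed strictly before the snapshot is contained in the snapshot
    (possibly compacted into another SSTable), so those files can be dropped
    from the backup store; files created at or after the snapshot are kept.
    """
    return sorted(f.name for f in files if f.created < snapshot_time)


files = [
    IncrementalFile("md-1-big-Data.db", datetime(2019, 11, 28, 1, 0)),
    IncrementalFile("md-2-big-Data.db", datetime(2019, 11, 30, 3, 0)),
    IncrementalFile("md-3-big-Data.db", datetime(2019, 12, 1, 5, 0)),
]
print(prunable_incrementals(files, snapshot_time=datetime(2019, 12, 1, 0, 0)))
```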
To take a backup for restoring on another machine, you can take a snapshot after flushing the memtables, or use sstableloader.
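The flush-then-snapshot step can be scripted with nodetool. This sketch uses a hypothetical keyspace and tag, prints the commands by default, and only executes them when dry_run=False. As far as we know, nodetool snapshot already flushes by default unless --skip-flush is passed, so the explicit flush mainly makes the step visible.

```python
from __future__ import annotations

import subprocess


def flush_then_snapshot(keyspace: str, tag: str) -> list[list[str]]:
    """nodetool commands that flush memtables, then snapshot the keyspace under a tag."""
    return [
        ["nodetool", "flush", keyspace],                # write memtables out to SSTables
        ["nodetool", "snapshot", "-t", tag, keyspace],  # hard-link the SSTables under <tag>
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step


# Hypothetical keyspace and tag; dry_run only prints the commands.
run(flush_then_snapshot("keyspace1", "before-migration"))
```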



On Thu, Nov 28, 2019 at 6:05 AM guo Maxwell <cclive1601@gmail.com> wrote:
Commit log backup is not mentioned in the Cassandra or DataStax documentation;
only snapshots and incremental backups are described for backup.

Although commit log archiving is not supported per keyspace/table, commit log replay (you must put the logs into commitlog_dir and restart the process) supports a keyspace/table replay filter: use -Dcassandra.replayList with the keyspace1.table1,keyspace1.table2 format to replay only the specified keyspaces/tables.
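A sketch of the replay preparation described above, assuming the node is stopped, the archived segments keep the usual CommitLog-*.log names, and the directory paths are hypothetical. The second function only builds the -Dcassandra.replayList value; it would still have to be added to the JVM options (for example via JVM_OPTS in cassandra-env.sh) before restarting the node.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def stage_commitlogs(archive_dir: str, commitlog_dir: str) -> list[str]:
    """Copy archived commit log segments into the (stopped) node's commit log directory."""
    src, dst = Path(archive_dir), Path(commitlog_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for segment in sorted(src.glob("CommitLog-*.log")):
        shutil.copy2(segment, dst / segment.name)  # keep the original segment file name
        copied.append(segment.name)
    return copied


def replay_list_option(tables: list[tuple[str, str]]) -> str:
    """JVM system property restricting commit log replay to the given keyspace.table pairs."""
    return "-Dcassandra.replayList=" + ",".join(f"{ks}.{table}" for ks, table in tables)


# Demo with throwaway directories standing in for the archive and the commit log directory.
with tempfile.TemporaryDirectory() as archive, tempfile.TemporaryDirectory() as commitlog:
    Path(archive, "CommitLog-6-1574845234567.log").write_bytes(b"")
    print(stage_commitlogs(archive, commitlog))
print(replay_list_option([("keyspace1", "table1"), ("keyspace1", "table2")]))
```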

Snapshots do affect storage. We take a snapshot once a week, during off-peak business hours, and snapshot creation is throttled. You may want to look at this issue: https://issues.apache.org/jira/browse/CASSANDRA-13019
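One way to express a weekly off-peak snapshot schedule, assuming a hypothetical window of Sundays from 02:00 to 05:00 local time. The constants and the function name are illustrative; the snapshot itself would still be taken with nodetool, and throttling is configured separately.

```python
from __future__ import annotations

from datetime import datetime, time, timedelta

# Hypothetical off-peak window: Sundays from 02:00 to 05:00 local time.
SNAPSHOT_WEEKDAY = 6          # datetime.weekday(): Monday == 0 ... Sunday == 6
WINDOW_START = time(2, 0)
WINDOW_END = time(5, 0)


def should_snapshot_now(now: datetime, last_snapshot: datetime | None) -> bool:
    """True inside the weekly off-peak window, at most once per window."""
    in_window = now.weekday() == SNAPSHOT_WEEKDAY and WINDOW_START <= now.time() < WINDOW_END
    # A snapshot less than a day old means this week's window was already used.
    already_done = last_snapshot is not None and now - last_snapshot < timedelta(days=1)
    return in_window and not already_done


print(should_snapshot_now(datetime(2019, 12, 1, 3, 0), last_snapshot=datetime(2019, 11, 24, 2, 30)))
```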



On Thu, Nov 28, 2019 at 1:00 AM Adarsh Kumar <adarsh0007@gmail.com> wrote:
Thanks Guo and Eric for replying,

I am a bit confused about commit log backup:
  1. The commit log archival technique (https://support.datastax.com/hc/en-us/articles/115001593706-Manual-Backup-and-Restore-with-Point-in-time-and-table-level-restore-) is as good as an incremental backup, as it also captures commit logs after a memtable flush.
  2. If we go for "Snapshot + Incremental bk + Commit log", we have to take the commit logs from the commit log directory (is there any SOP for this?). As commit logs are not per table or keyspace, restoring selected tables will be a challenge.
  3. Snapshot-based backups are easy to manage and operate due to their simplicity, but they are heavy on storage. Any views on this?
  4. Please share any successful strategy that someone is using in production. We are still in the design phase and want to implement the best solution.
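For point 1, the commit log archiving in the linked article is driven by conf/commitlog_archiving.properties. The sketch below renders such a file; the /backup/commitlog directory is hypothetical, and /bin/ln creates a hard link, which only works when the archive directory is on the same filesystem as the commit log.

```python
from __future__ import annotations

from datetime import datetime


def commitlog_archiving_properties(archive_dir: str,
                                   restore_dirs: list[str] | None = None,
                                   restore_point: datetime | None = None) -> str:
    """Render a commitlog_archiving.properties file (simplified; values are examples)."""
    lines = [
        # Run when a segment is closed: %path = full segment path, %name = file name.
        f"archive_command=/bin/ln %path {archive_dir}/%name",
    ]
    if restore_dirs:
        lines += [
            # Run per archived segment on startup: %from = archived file,
            # %to = target path in the live commit log directory.
            "restore_command=/bin/cp -f %from %to",
            "restore_directories=" + ",".join(restore_dirs),
        ]
    if restore_point is not None:
        # Replay mutations up to this timestamp (GMT), format yyyy:MM:dd HH:mm:ss.
        lines.append("restore_point_in_time=" + restore_point.strftime("%Y:%m:%d %H:%M:%S"))
    return "\n".join(lines) + "\n"


print(commitlog_archiving_properties("/backup/commitlog", ["/backup/commitlog"],
                                     datetime(2019, 11, 28, 12, 0, 0)), end="")
```

Because archiving happens per closed segment and segments mix all tables, the file has no per-table setting; that is why selective restore relies on the replay filter instead.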
Thanks Eric for sharing the link for Medusa.

Regards,
Adarsh Kumar

On Wed, Nov 27, 2019 at 5:16 PM guo Maxwell <cclive1601@gmail.com> wrote:
For me, I think the last one,
 Snapshot + Incremental + commitlog,
is the most meaningful way to do backup and restore when you back the data up to somewhere else, like AWS S3.
  • Snapshot based backup // data written after the snapshot is not backed up, so data may be lost when restoring to a time later than the snapshot;
  • Incremental backups // better than snapshot backup, but with insufficient data accuracy: data remaining in the memtable will be lost;
  • Snapshot + incremental
  • Snapshot + commitlog archival // better data precision than incremental backup, but data in a commit log that has not been archived (a segment not yet closed) will not be restored and will be lost. Also, when there are many logs, log replay takes a very long time
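The comparison above can be reduced to a simplified model: each strategy restores up to a certain point, and the commit log strategies also have to replay some span of log. The timestamps below are hypothetical and the model ignores per-table flush times; "incremental" is modelled on top of a base snapshot, since incremental files alone have no starting point.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def compare_strategies(snapshot: datetime, last_flush: datetime,
                       last_archived_segment: datetime) -> dict[str, tuple[datetime, timedelta]]:
    """(latest restorable time, commit log span to replay) per backup strategy.

    Simplified model: incremental backups cover everything flushed up to last_flush,
    and archived commit log covers writes up to last_archived_segment. Writes still
    in memtables or in the active, unarchived segment are lost in every strategy.
    """
    zero = timedelta(0)
    flushed = max(snapshot, last_flush)
    return {
        "snapshot": (snapshot, zero),
        "snapshot+incremental": (flushed, zero),
        "snapshot+commitlog": (max(snapshot, last_archived_segment),
                               max(zero, last_archived_segment - snapshot)),
        "snapshot+incremental+commitlog": (max(flushed, last_archived_segment),
                                           max(zero, last_archived_segment - flushed)),
    }


results = compare_strategies(snapshot=datetime(2019, 11, 24, 2, 0),
                             last_flush=datetime(2019, 11, 27, 10, 0),
                             last_archived_segment=datetime(2019, 11, 27, 10, 30))
for name, (restore_point, replay_span) in results.items():
    print(f"{name:32} restore to {restore_point}, replay {replay_span}")
```

In this model both commit log strategies reach the same restore point, but the combined one replays 30 minutes of log instead of more than three days.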
For our part, we use snapshot + incremental + commitlog archive. We read the snapshot data and the incremental data, and the log is also backed up. But we do not back up
the log whose data has already been flushed to SSTables, because that data is already backed up by our incremental backups.

This way, the data exists as SSTables through the snapshot and incremental backups. The number of logs is very small, and log replay does not take much time.
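A sketch of the selection rule described above (archive only commit log that has not been flushed yet), assuming a single flush time for all tables and that each closed segment's newest write time is known. Both are simplifications: Cassandra tracks flushed positions per table, so real tooling would need more detail. The Segment type and names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Segment:
    name: str
    last_write: datetime  # newest mutation written to this closed segment


def segments_to_archive(segments: list[Segment], last_flush: datetime) -> list[str]:
    """Closed segments that still hold writes newer than the last flush.

    Older segments only contain data that is already in SSTables, and therefore
    already captured by the incremental backups, so they are skipped.
    """
    return [s.name for s in sorted(segments, key=lambda s: s.last_write)
            if s.last_write > last_flush]


segments = [
    Segment("CommitLog-6-3.log", datetime(2019, 11, 27, 10, 30)),
    Segment("CommitLog-6-1.log", datetime(2019, 11, 27, 9, 0)),
    Segment("CommitLog-6-2.log", datetime(2019, 11, 27, 9, 55)),
]
print(segments_to_archive(segments, last_flush=datetime(2019, 11, 27, 10, 0)))
```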


On Wed, Nov 27, 2019 at 4:13 PM Eric LELEU <eric@strapdata.com> wrote:

Hi,

TheLastPickle & Spotify have released Medusa, a Cassandra backup tool.

See: https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html

Hope this link will help you.

Eric


On 27/11/2019 at 08:10, Adarsh Kumar wrote:
Hi,

I was looking into backup strategies for Cassandra. After some study, I found that there are the following options:
  • Snapshot based backup
  • Incremental backups
  • Snapshot + incremental
  • Snapshot + commitlog archival
  • Snapshot + Incremental + commitlog
Which is the most suitable and feasible approach? Also, which of these is used most?
Please let me know if there is any other option or tool available.

Thanks in advance.

Regards,
Adarsh Kumar


--
you are the apple of my eye !

