hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: [DISCUSSION] Merge Backup / Restore - Branch HBASE-7912
Date Wed, 20 Jul 2016 14:08:15 GMT
bq. have a cleaner that won't delete those storefiles from archive.

The cleaner needs to work with backup merge so that the storefiles are
ultimately cleaned.

We can continue discussion on HBASE-14135.

On Tue, Jul 19, 2016 at 10:35 PM, rahul gidwani <rahul.gidwani@gmail.com>
wrote:

> Have we considered that if we wanted to do incremental backups for
> particular table(s), we can just keep track of all the memstore flushes
> for those table(s) [and add some logic for bulk load as well] and have a
> cleaner that won't delete those storefiles from archive. Of course we would
> have to flush memstores for the time boundaries (or do a WAL roll), but that
> is only for the incremental boundaries.
>
> That way the recovery process would be much faster and your incremental
> backup is truly only what you wrote for that day. And if you wanted to
> delete an incremental backup (say only keep the last n backups around) we
> would just compact those together.
>
> Maybe this has already been discussed; if it has, I'm sorry for bringing
> this up.
>
> On Tue, Jul 19, 2016 at 6:35 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > I have attached rebased mega patch to HBASE-14123 which is going through
> > QA run.
> >
> > I expect some findbugs / javadoc warnings which need to be addressed
> > before the merge.
> >
> > On Tue, Jul 19, 2016 at 6:21 PM, Enis Söztutar <enis@apache.org> wrote:
> >
> > > Thanks Matteo for chiming in.
> > >
> > > On Tue, Jul 19, 2016 at 5:02 PM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > > wrote:
> > >
> > > > I did some review in the early beginning, but then lost track of the
> > > > changes.
> > > > but I'd like to give a quick review to the full code once people here are
> > > > ok with getting this feature in master (2.0).
> > > > (let's say we put a deadline for reviews, like 1 week for reviewing the full
> > > > stuff after everyone agrees to get this in. just to avoid holding this for
> > > > too long, but still enough time to have people that are interested to look
> > > > at it. we did the same thing for MOB with a mega patch
> > > > https://reviews.apache.org/r/36391/)
> > > >
> > >
> > > This sounds good. Vladimir / Ted how do you guys want to handle the merge?
> > > As a giant patch or a rebase of code in the branch and through git merge?
> > >
> > > We need to run a vote when the to-be-merged branch is ready. We can set a
> > > vote timeout for at least 1 week.
> > >
> > >
> > > >
> > > > most of the code seemed isolated from the beginning, few changes here and
> > > > there in the core.
> > > > so, this side of things seems ok to me.
> > > >
> > > > maybe some work to add IT tests as mentioned above, but that should not
> > > > take long.
> > > >
> > > > I don't know if there are already docs, but that is another thing we may
> > > > want to get in with the merge.
> > > > a minimal coverage at least on how to use the feature, and maybe calling it
> > > > out as experimental?
> > > >
> > > > my main concern was around incremental backups.
> > > > I'm still not comfortable with the fact that, because the WALs contain
> > > > regions of multiple tables,
> > > > the incremental backup will keep around WALs with some data that we don't
> > > > really want in the backup (for space or maybe security reasons).
> > > >
> > > > then there was the question about how long I should keep taking incrementals
> > > > before deciding that a fresh full backup is less costly in terms of space.
> > > > but I think this incremental merge/compaction was a feature on the roadmap
> > > > as Phase 3.
> > > > which I think is ok to get later on,
> > > > maybe just call out a lifecycle example in the docs under "best practices".
> > > >
> > >
> > > I think this will depend on the use case, and other factors like the
> > > bandwidth available, how much data the user is willing to lose in case of
> > > catastrophic failure and how "expensive" a full backup is versus an
> > > incremental one.
> > >
> > > The full backup should also be useable by default, so maybe we can make an
> > > option to not even keep WAL files, and completely disable incremental
> > > backups?
> > >
> > > Enis
> > >
> > >
> > > >
> > > > has anyone interested in using backups looked at the doc in HBASE-7912?
> > > > is the current design of incremental backup acceptable for everyone wanting
> > > > to use this feature?
> > > > (maybe this should be a question for the @user list and not dev)
> > > >
> > > > is there anyone already using this feature or is it just devs testing it?
> > > > to me it would be interesting to have a use-case/workflow example,
> > > > to see if in the real world my concerns about incremental are not showing
> > > > up.
> > > >
> > > > On Tue, Jul 19, 2016 at 1:35 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > > Gentle ping on this subject.
> > > > >
> > > > > The changes are mostly non-intrusive.
> > > > >
> > > > > More comments are welcome.
> > > > >
> > > > > On Mon, Jul 11, 2016 at 9:29 PM, Vladimir Rodionov <vladrodionov@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Not that hard, Andrew. I will open JIRA.
> > > > > >
> > > > > > -Vlad
> > > > > >
> > > > > > On Mon, Jul 11, 2016 at 8:46 PM, Andrew Purtell <andrew.purtell@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > How hard would it be to convert what you've been using to test end to end
> > > > > > > during dev into an IT?
> > > > > > >
> > > > > > >
> > > > > > > On Jul 11, 2016, at 5:31 PM, Vladimir Rodionov <vladrodionov@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > >>> Is there an integration test in hbase-it yet? If not, any tips on a
> > > > > > > >>> semi-automateable way to take backups and restore them?
> > > > > > > >
> > > > > > > > We do not have one yet, but we have a lot of unit tests. We provide 2 APIs for
> > > > > > > > backup:
> > > > > > > >
> > > > > > > > 1. Admin.getBackupAdmin
> > > > > > > >
> > > > > > > > 2. Command line via the hbase command.
> > > > > > > >
> > > > > > > > Everything is straightforward.
> > > > > > > >
> > > > > > > > -Vlad
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >> On Mon, Jul 11, 2016 at 5:23 PM, Dima Spivak <dspivak@cloudera.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> Is there an integration test in hbase-it yet? If not, any tips on a
> > > > > > > >> semi-automateable way to take backups and restore them?
> > > > > > > >>
> > > > > > > >> -Dima
> > > > > > > >>
> > > > > > > >> On Mon, Jul 11, 2016 at 6:42 PM, Vladimir Rodionov <vladrodionov@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >>> Sorry, wrong links:
> > > > > > > >>> These are the phases:
> > > > > > > >>>
> > > > > > > >>> Phase 1:
> > > > > > > >>> https://issues.apache.org/jira/browse/HBASE-14030
> > > > > > > >>> Phase 2:
> > > > > > > >>> https://issues.apache.org/jira/browse/HBASE-14123
> > > > > > > >>> Phase 3:
> > > > > > > >>> https://issues.apache.org/jira/browse/HBASE-14414
> > > > > > > >>>
> > > > > > > >>> -Vlad
> > > > > > > >>>
> > > > > > > >>> On Mon, Jul 11, 2016 at 4:41 PM, Vladimir Rodionov <vladrodionov@gmail.com>
> > > > > > > >>> wrote:
> > > > > > > >>>
> > > > > > > >>>> These are the phases:
> > > > > > > >>>>
> > > > > > > >>>> Phase 1:
> > > > > > > >>>> https://issues.apache.org/jira/browse/HBASE-14030
> > > > > > > >>>> <https://issues.apache.org/jira/browse/HBASE-7912>
> > > > > > > >>>> Phase 2:
> > > > > > > >>>> https://issues.apache.org/jira/browse/HBASE-14123
> > > > > > > >>>> <https://issues.apache.org/jira/browse/HBASE-7912>
> > > > > > > >>>> Phase 3:
> > > > > > > >>>> https://issues.apache.org/jira/browse/HBASE-14414
> > > > > > > >>>> <https://issues.apache.org/jira/browse/HBASE-7912>
> > > > > > > >>>>
> > > > > > > >>>> -Vlad
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>> On Mon, Jul 11, 2016 at 12:21 PM, Enis Söztutar <enis@apache.org>
> > > > > > > >>>> wrote:
> > > > > > > >>>>
> > > > > > > >>>>> As you guys may already be aware, Vladimir, Ted, Jerry and others have
> > > > > > > >>>>> been developing the backup / restore functionality in a series of issues
> > > > > > > >>>>> committed in the separate branch HBASE-7912[1].
> > > > > > > >>>>>
> > > > > > > >>>>> Backup / Restore functionality is tracked as a 4-phase project, and the
> > > > > > > >>>>> first two phases are complete and useable. We are now working on Phase 3
> > > > > > > >>>>> items, which are mostly improvements. We think that the current code in
> > > > > > > >>>>> the branch, containing all Phase 1 and Phase 2 items and some Phase 3
> > > > > > > >>>>> items, is useable on its own, and we do not have to wait for all the
> > > > > > > >>>>> subtickets to be finished to make it completely useable (as follow up
> > > > > > > >>>>> tickets are mostly improvements or optimizations). The improvements in the
> > > > > > > >>>>> works are all backwards compatible with the existing stuff. Thus, we would
> > > > > > > >>>>> like to propose that the branch HBASE-7912 be merged into master. The
> > > > > > > >>>>> parent jira has a design doc that goes into details about the
> > > > > > > >>>>> implementation and design choices in case you are interested[2].
> > > > > > > >>>>>
> > > > > > > >>>>> Most of the changes are non-intrusive and confined to the backup
> > > > > > > >>>>> subsystem.
> > > > > > > >>>>> The unit tests have been passing on manual runs and we (Hortonworks) have
> > > > > > > >>>>> been running the integration tests as well as some other shell-based
> > > > > > > >>>>> system tests on a forked version of the code. Most of the work has been
> > > > > > > >>>>> reviewed by 1, 2 or 3 committers already (mostly Ted, myself and Jerry).
> > > > > > > >>>>>
> > > > > > > >>>>> What do you guys think? Is it time to call a vote? Any concerns or
> > > > > > > >>>>> feedback appreciated.
> > > > > > > >>>>>
> > > > > > > >>>>> [1] https://issues.apache.org/jira/browse/HBASE-7912
> > > > > > > >>>>> [2] https://issues.apache.org/jira/secure/attachment/12816339/HBaseBackupAndRestore%20-0.91.pdf
> > > > > > > >>>>>
> > > > > > > >>>>> Enis
