hbase-dev mailing list archives

From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: [DISCUSSION] Merge Backup / Restore - Branch HBASE-7912
Date Mon, 11 Sep 2017 18:07:50 GMT
Stack, Andrew

We have the doc blocker and (partially) HBASE-15227, where two sub-tasks
remain: one is a unit test (you can't call it a blocker), and the other is
FT support during incremental backup with bulk loading. The latter has
probably been addressed already in other HBASE-15227 sub-tasks; I have to
reassess this.

That is mostly it. Yes, we have not done real testing with real data on a
real cluster yet, except QA testing on a small OpenStack
cluster (10 nodes). That is probably our biggest minus right now. I
would like to inform the community that this week we are going to start
full-scale testing with reasonably sized data sets.

The recently committed improvements, such as the ability to run
backup/restore in a particular YARN pool (queue), allow precise control
of cluster utilization during the operation (so as not to interfere much
with regular cluster operations). Another one, converting WALs to HFiles
on the fly, significantly improves storage usage on the backup site.

My plan is to finish HBASE-17825 (further performance optimizations). This
will cut down the number of MR jobs during incremental backup
from 2*N to 2 (N is the number of tables). That will probably take 2-3 more
days.

Then:

1. Address remaining two sub-tasks in HBASE-15227
2. Update Release notes for all relevant B&R JIRAs
3. Work on doc

After that we can call it feature complete. Taking into account the
vast amount of effort
spent on this feature (including QA testing), I would say that we are
probably quite close to GA right now, but only
after real testing is done (I do not anticipate significant issues, except
probably with correct failure handling).

On the feature itself: we provide tools to fully automate backup and restore
tasks: create backup (full and incremental), restore
from image, delete backups, merge backups, history, history per table,
backup set management.

Hopefully, my write-up addresses at least some of your concerns.

-Vlad

On Sun, Sep 10, 2017 at 6:27 AM, Josh Elser <elserj@apache.org> wrote:

> On Sat, Sep 9, 2017 at 7:04 PM, stack <saint.ack@gmail.com> wrote:
> > In spite of repeated requests for eng summary of state of this feature --
> > summary of what is in 2.0, what is not, what the capabilities are, how well
> > it has been tested and at what scale -- all I get, when the requests are
> > not ignored, are pointers to lists of ill-describing jiras and some pending
> > user facing doc update.
>
> Yes, this is a problem. We, especially you as RM, shouldn't have
> outstanding questions as to the quality/state of B&R.
>
> > For other features, mob or region server groups, I know that they have been
> > running at scale in production for as much as a year and more. I have some
> > confidence these items basically work.  For backup/restore I have no such
> > sense even after spending time in review and trying to use the feature.
>
> I can attest to the feature being tested on small clusters. I'm not
> sure about larger than 10-node tests. If this is less a worry and more
> a veto, let's get some criteria on the kind of testing you're looking
> for to avoid having to rehash later.
>
> Do we have any kind of integration tests in the codebase now that can
> help increase Stack's confidence?
>
> > As release manager, I have say over what makes it into a release.  Unless
> > the work is done to convince me that backup/restore is more than a lump of
> > code and a few unit tests that can pass on some fellow's laptop, I am going
> > to kick it out of branch-2.  Let the feature harden more in master branch
> > before it ships in a release.
>
> While it was a few months ago now, I can also attest to this being
> more than some unit tests (I think I looked at it after I saw you last
> down in the weeds).
>
> I do worry about trying to remove it at this state.
>
> * Do you consider the B&R code in the repository implicitly harmful?
> Is there harm in shipping with docs capturing the concern?
> * Trying to revert all relevant pieces from branch-2 is non-trivial.
> * I would feel quite dejected if some feature I spent a year+ working
> on (*not* making assertions on my perception of quality) was removed
> from the release line it was expected to land in.
>
> > S
> >
> > On Sep 8, 2017 10:59 PM, "Vladimir Rodionov" <vladrodionov@gmail.com> wrote:
> >
> >> >> Have I grasped the state of things correctly, Vlad?
> >>
> >> Josh, the only thing which is still pending is the doc update. All other
> >> features are good to have but not blockers for the 2.0 release.
> >>
> >> -Vlad
> >>
> >> On Fri, Sep 8, 2017 at 10:42 PM, Vladimir Rodionov <vladrodionov@gmail.com>
> >> wrote:
> >>
> >> > >> What testing and at what
> >> > >> scale has testing been done?
> >> >
> >> > Do we have that for other features?
> >> >
> >> >
> >> > On Fri, Sep 8, 2017 at 10:41 PM, Vladimir Rodionov <vladrodionov@gmail.com>
> >> > wrote:
> >> >
> >> >> >> It asks: "How do I figure what of backup/restore feature is going to be in
> >> >> >> hbase-2.0.0?
> >> >>
> >> >> Hmm, wait for doc update.
> >> >>
> >> >>
> >> >> On Fri, Sep 8, 2017 at 2:39 PM, Stack <stack@duboce.net> wrote:
> >> >>
> >> >>> HBASE-14414 is a JIRA with a list of random seeming issues w/ non-descript
> >> >>> summaries: "Add nonce support to TableBackupProcedure, BackupID must
> >> >>> include backup set name, ...". The last comment in that issue is from July.
> >> >>> It asks: "How do I figure what of backup/restore feature is going to be in
> >> >>> hbase-2.0.0? Thanks Vladimir Rodionov
> >> >>> <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=vrodionov>."
> >> >>> to which there is no answer.  Doc update is TODO.
> >> >>>
> >> >>> Where is the summary of the capability in hbase-2? What testing and at what
> >> >>> scale has testing been done? Is this 'stable or experimental'? If I can't
> >> >>> get basic info on this feature though I ask repeatedly, what hope does the
> >> >>> poor old operator have?
> >> >>>
> >> >>> St.Ack
> >> >>>
> >> >>>
> >> >>> On Fri, Sep 8, 2017 at 1:59 PM, Vladimir Rodionov <vladrodionov@gmail.com>
> >> >>> wrote:
> >> >>>
> >> >>> > HBASE-14414
> >> >>> >
> >> >>> > On Fri, Sep 8, 2017 at 1:14 PM, Stack <stack@duboce.net> wrote:
> >> >>> >
> >> >>> > > Where do I go to get the current status of this feature? Looking in JIRA I
> >> >>> > > see loads of issues open against backup including some against hbase-2.0.0
> >> >>> > > and no progress being made that I can discern.
> >> >>> > >
> >> >>> > > Thanks,
> >> >>> > > S
> >> >>> > >
> >> >>> > >
> >> >>> > >
> >> >>> > > On Wed, Nov 23, 2016 at 8:52 AM, Stack <stack@duboce.net> wrote:
> >> >>> > >
> >> >>> > > > On Tue, Nov 22, 2016 at 6:48 PM, Stack <stack@duboce.net> wrote:
> >> >>> > > >
> >> >>> > > >> On Tue, Nov 22, 2016 at 3:17 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> >> >>> > > >>
> >> >>> > > >>> >> and/or he answered most of the review feedback
> >> >>> > > >>>
> >> >>> > > >>> No, questions are still open, but I do not see any blockers and we have
> >> >>> > > >>> HBASE-16940 to address these questions.
> >> >>> > > >>>
> >> >>> > > >>>
> >> >>> > > >> Agree. No blockers but stuff that should be dealt with (No one will pay
> >> >>> > > >> me any attention once merge goes in -- smile).
> >> >>> > > >>
> >> >>> > > >>
> >> >>> > > > Let me clarify the above. I want review addressed before merge happens.
> >> >>> > > > Sorry if any confusion.
> >> >>> > > > St.Ack
> >> >>> > > >
> >> >>> > > >
> >> >>> > > >
> >> >>> > > >
> >> >>> > > >
> >> >>> > > >
> >> >>> > > >> St.Ack
> >> >>> > > >>
> >> >>> > > >>
> >> >>> > > >>
> >> >>> > > >>> On Tue, Nov 22, 2016 at 3:04 PM, Devaraj Das <ddas@hortonworks.com> wrote:
> >> >>> > > >>>
> >> >>> > > >>> > Hi Stack, hats off to you for spending so much time on this! Thanks! From
> >> >>> > > >>> > my understanding, Vlad has raised follow-up jiras for the issues you
> >> >>> > > >>> > raised, and/or he answered most of the review feedback. So, do you think we
> >> >>> > > >>> > could do a merge vote now?
> >> >>> > > >>> > Devaraj.
> >> >>> > > >>> > ________________________________________
> >> >>> > > >>> > From: Vladimir Rodionov <vladrodionov@gmail.com>
> >> >>> > > >>> > Sent: Monday, November 21, 2016 8:34 PM
> >> >>> > > >>> > To: dev@hbase.apache.org
> >> >>> > > >>> > Subject: Re: [DISCUSSION] Merge Backup / Restore - Branch HBASE-7912
> >> >>> > > >>> >
> >> >>> > > >>> > >> I have spent a good bit of time reviewing and testing this feature. I would
> >> >>> > > >>> > >> like my review and concerns addressed and I'd like it to be clear how;
> >> >>> > > >>> > >> either explicit follow-on issues, pointers to where in the patch or doc my
> >> >>> > > >>> > >> remarks have been catered to, etc. Until then, I am against commit.
> >> >>> > > >>> >
> >> >>> > > >>> > Stack, mega patch review comments will be addressed in the dedicated JIRA:
> >> >>> > > >>> > HBASE-16940
> >> >>> > > >>> > I have open several other JIRAs to address your other comments (not on
> >> >>> > > >>> > review board).
> >> >>> > > >>> >
> >> >>> > > >>> > Details are here (end of the thread):
> >> >>> > > >>> > https://issues.apache.org/jira/browse/HBASE-14123
> >> >>> > > >>> >
> >> >>> > > >>> > Let me know what else should we do to move merge forward.
> >> >>> > > >>> >
> >> >>> > > >>> > -Vlad
> >> >>> > > >>> >
> >> >>> > > >>> >
> >> >>> > > >>> > On Fri, Nov 18, 2016 at 4:54 PM, Stack <stack@duboce.net> wrote:
> >> >>> > > >>> >
> >> >>> > > >>> > > On Fri, Nov 18, 2016 at 3:53 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> >>> > > >>> > >
> >> >>> > > >>> > > > Thanks, Matteo.
> >> >>> > > >>> > > >
> >> >>> > > >>> > > > bq. restore is not clear if given an incremental id it will do the full
> >> >>> > > >>> > > > restore from full up to that point or if i need to apply manually
> >> >>> > > >>> > > > everything
> >> >>> > > >>> > > >
> >> >>> > > >>> > > > The restore takes into consideration of the dependent backup(s).
> >> >>> > > >>> > > > So there is no need to apply preceding backup(s) manually.
> >> >>> > > >>> > > >
> >> >>> > > >>> > > >
> >> >>> > > >>> > > I ask this question on the issue. It is not clear from the usage or doc how
> >> >>> > > >>> > > to run a restore from incremental. Can you fix in doc and usage how so I
> >> >>> > > >>> > > can be clear and try it. Currently I am stuck verifying a round trip backup
> >> >>> > > >>> > > restore made of incrementals.
> >> >>> > > >>> > >
> >> >>> > > >>> > > Thanks,
> >> >>> > > >>> > > S
> >> >>> > > >>> > >
> >> >>> > > >>> > >
> >> >>> > > >>> > >
> >> >>> > > >>> > > > On Fri, Nov 18, 2016 at 3:48 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
> >> >>> > > >>> > > >
> >> >>> > > >>> > > > > I did one last pass to the mega patch. I don't see anything major that
> >> >>> > > >>> > > > > should block the merge.
> >> >>> > > >>> > > > >
> >> >>> > > >>> > > > > - most of the code is isolated in the backup package
> >> >>> > > >>> > > > > - all the backup code is client side
> >> >>> > > >>> > > > > - there are few changes to the server side, mainly for cleaners, wal
> >> >>> > > >>> > > > > rolling and similar (which is ok)
> >> >>> > > >>> > > > > - there is a good number of tests, and an integration test
> >> >>> > > >>> > > > >
> >> >>> > > >>> > > > > the code seems to have still some left overs from the old implementation,
> >> >>> > > >>> > > > > and some stuff needs a cleanup. but I don't think this should be used as an
> >> >>> > > >>> > > > > argument to block the merge. I think the guys will keep working on this and
> >> >>> > > >>> > > > > they may also get help of others once the patch is in master.
> >> >>> > > >>> > > > >
> >> >>> > > >>> > > > > I still have my concerns about the current limitations, but these are
> >> >>> > > >>> > > > > things already planned for phase 3, so some of this stuff may even be in
> >> >>> > > >>> > > > > the final 2.0.
> >> >>> > > >>> > > > > but as long as we have a "current limitations" section in the user guide
> >> >>> > > >>> > > > > mentioning important stuff like the ones below, I'm ok with it.
> >> >>> > > >>> > > > >  - if you write to the table with Durability.SKIP_WALS your data will not
> >> >>> > > >>> > > > > be in the incremental-backup
> >> >>> > > >>> > > > >  - if you bulkload files that data will not be in the incremental backup
> >> >>> > > >>> > > > > (HBASE-14417)
> >> >>> > > >>> > > > >  - the incremental backup will not only contains the data of the table you
> >> >>> > > >>> > > > > specified but also the regions from other tables that are on the same set
> >> >>> > > >>> > > > > of RSs (HBASE-14141) ...maybe a note about security around this topic
> >> >>> > > >>> > > > >  - the incremental backup will not contains just the "latest row" between
> >> >>> > > >>> > > > > backup A and B, but it will also contains all the updates occurred in
> >> >>> > > >>> > > > > between. but the restore does not allow you to restore up to a certain
> >> >>> > > >>> > > > > point in time, the restore will always be up to the "latest backup point".
> >> >>> > > >>> > > > >  - you should limit the number of "incremental" up to N (or maybe SIZE), to
> >> >>> > > >>> > > > > avoid replay time becoming the bottleneck. (HBASE-14135)
> >> >>> > > >>> > > > >
> >> >>> > > >>> > > > > I'll be ok even with the above not being in the final 2.0,
> >> >>> > > >>> > > > > but i'd like to see as blocker for the final 2.0 (not the merge)
> >> >>> > > >>> > > > >  - the backup code moved in an hbase-backup module
> >> >>> > > >>> > > > >  - and some more work around tools, especially to try to unify and make
> >> >>> > > >>> > > > > simple the backup experience (simple example: in some case there is a
> >> >>> > > >>> > > > > backup_id argument in others a backupId argument. or things like.. restore
> >> >>> > > >>> > > > > is not clear if given an incremental id it will do the full restore from
> >> >>> > > >>> > > > > full up to that point or if i need to apply manually everything).
> >> >>> > > >>> > > > >
> >> >>> > > >>> > > > > in conclusion, I think we can open a merge vote. I'll be +1 on it, and I
> >> >>> > > >>> > > > > think we should try to reject -1 with just a "code cleanup" motivation,
> >> >>> > > >>> > > > > since there will still be work going on on the code after the merge.
> >> >>> > > >>> > > > >
> >> >>> > > >>> > > > > Matteo
> >> >>> > > >>> > > > >
> >> >>> > > >>> > > > >
> >> >>> > > >>> > > > > On Sun, Nov 6, 2016 at 10:54 PM, Devaraj Das <ddas@hortonworks.com> wrote:
> >> >>> > > >>> > > > >
> >> >>> > > >>> > > > > > Stack and others, anything else on the patch? Merge to master now?
> >> >>> > > >>> > > > > >
> >> >>> > > >>> > > > >
> >> >>> > > >>> > > >
> >> >>> > > >>> > >
> >> >>> > > >>> >
> >> >>> > > >>>
> >> >>> > > >>
> >> >>> > > >>
> >> >>> > > >
> >> >>> > >
> >> >>> >
> >> >>>
> >> >>
> >> >>
> >> >
> >>
>
