hbase-dev mailing list archives

From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
Date Mon, 26 Sep 2016 18:48:23 GMT
OK, we had an internal discussion, and this is what we are suggesting now:

1. We will create a separate module (hbase-backup) and move the server-side
code there.
2. Master and RS will be free of MR and backup code.
3. The code from Master will be moved into a standalone service
(BackupService) that handles procedure orchestration, operation
resume/abort, and SECURITY. This means one additional process, similar to
the REST/Thrift server, will be required to operate backup.

I would like to note that a separate process running under the hbase super
user is required to implement security properly in a multi-tenant
environment; otherwise, only the hbase super user will be allowed to operate
backups.

Please let us know what you think, HBase people.

-Vlad
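
For illustration only, here is a minimal sketch of the idempotent export step
described in the quoted thread below: skip a file that already exists at the
destination and is complete (same length and checksum), otherwise copy it
again. It is not taken from the backup patch. The class and method names
(IdempotentExport, exportFile, isComplete) are hypothetical, and plain
java.nio on a local filesystem with a CRC32 checksum stands in for whatever
filesystem API and checksum a real export tool would use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32;

// Hypothetical helper, not part of HBase: shows why re-running an export is safe.
final class IdempotentExport {

    /** CRC32 of a file's contents, streamed so large files are not loaded whole. */
    static long checksum(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    /** True if dst already holds a complete copy of src (same length and checksum). */
    static boolean isComplete(Path src, Path dst) throws IOException {
        return Files.isRegularFile(dst)
            && Files.size(src) == Files.size(dst)
            && checksum(src) == checksum(dst);
    }

    /** Copies src to dst unless a complete copy exists; returns true if bytes were copied. */
    static boolean exportFile(Path src, Path dst) throws IOException {
        if (isComplete(src, dst)) {
            return false; // already exported, so re-running is a no-op
        }
        // Missing or partial destination: send the file over again.
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }
}
```

Because exportFile does nothing when the destination is already complete, a
driver could simply re-run it over the whole file list after a failure, which
is the reason given in the thread for export not needing procedure-style
state tracking.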



On Sat, Sep 24, 2016 at 2:49 PM, Stack <stack@duboce.net> wrote:

> On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purtell@gmail.com>
> wrote:
>
> > At branch merge voting time now more eyes are getting on the design issues
> > with dissenting opinion emerging. This is the branch merge process working
> > as our community has designed it. Because this is the first full project
> > review of the code and implementation I think we all have to be flexible. I
> > see the community as trying to narrow the technical objection at issue to
> > the smallest possible scope. It's simple: don't call out to an external
> > execution framework we don't own from core master (and by extension
> > regionserver) code. We had this objection before to a proposed external
> > compaction implementation for MOB so should not come as a surprise. Please
> > let me know if I have misstated this.
> >
> >
> The above is my understanding also.
>
>
> > This would seem to require a modest refactor of coordination to move
> > invocation of MR code out from any core code path. To restate what I think
> > is an emerging recommendation: Move cross HBase and MR coordination to a
> > separate tool. This tool can ask the master to invoke procedures on the
> > HBase side that do first mile export and last mile restore. (Internally the
> > tool can also use the procedure framework for state durability, perhaps,
> > just a thought.) Then the tool can further drive the things done with MR
> > like shipping data off cluster or moving remote data in place and preparing
> > it for import. These activities do not need procedure coordination and
> > involvement of the HBase master. Only the first and last mile of the
> > process needs atomicity within the HBase deploy. Please let me know if I
> > have misstated this.
> >
> >
> Above is my understanding of our recommendation.
>
> St.Ack
>
>
>
> > > On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > bq. procedure gives you a retry mechanism on failure
> > >
> > > We do need this mechanism. Take a look at the multi-step
> > > in FullTableBackupProcedure, etc.
> > >
> > > bq. let the user export it later when he wants
> > >
> > > This would make supporting security more complex (user A shouldn't be
> > > exporting user B's backup). And it is not user friendly - at the time
> > > backup request is issued, the following is specified:
> > >
> > > +          + " BACKUP_ROOT     The full root path to store the backup image,\n"
> > > +          + "                 the prefix can be hdfs, webhdfs or gpfs\n"
> > >
> > > Backup root is an integral part of backup manifest.
> > >
> > > Cheers
> > >
> > >
> > > On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <
> > theo.bertozzi@gmail.com>
> > > wrote:
> > >
> > >>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >>>
> > >>> Ideally the export should have one job running which does the retry (on
> > >>> failed partition) itself.
> > >>>
> > >>
> > >> procedure gives you a retry mechanism on failure. if you don't use that,
> > >> then you don't need procedure.
> > >> if you want you can start a procedure executor in a non master process (the
> > >> hbase-procedure is a separate package and does not depend on master). but
> > >> again, export seems a case where you don't need procedure.
> > >>
> > >> like snapshot, the logic may just be: ask the master to take a backup, and
> > >> let the user export it later when he wants. so you avoid having a MR job
> > >> started by the master since people do not seem to like it.
> > >>
> > >> for restore (I think that is where you use the MR splitter) you can
> > >> probably just have a backup ready (already split). there is already a
> > >> jira that should do that, HBASE-14135. instead of doing the operation of
> > >> split/merge on restore, you consolidate the backup "offline" (mr job
> > >> started by the user) and then ask to restore the backup.
> > >>
> > >>
> > >>>
> > >>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <
> > >> theo.bertozzi@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> as far as I understand the code, you don't need procedure for the export
> > >>>> itself.
> > >>>> the export operation is already idempotent, since you are just copying
> > >>>> files.
> > >>>> if the file exists and is complete (check length, checksum, ...) you can
> > >>>> skip it,
> > >>>> otherwise you'll send it over again.
> > >>>>
> > >>>> you need the proc for taking the backup and restoring,
> > >>>> because you want to complete the operation and end up with a consistent
> > >>>> state
> > >>>> across the multiple components you are updating (meta, fs, ...)
> > >>>> but again, for export you can just run the tool over and over until the
> > >>>> operation succeeds, and that should be ok.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Matteo
> > >>>>
> > >>>>
> > >>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhihong@gmail.com>
> wrote:
> > >>>>>
> > >>>>> Master is involved in this discussion because currently only Master
> > >>>>> instantiates ProcedureExecutor which runs the 3 Procedures for backup /
> > >>>>> restore.
> > >>>>>
> > >>>>> What if an optional standalone service which hosts ProcedureExecutor is
> > >>>>> used for this purpose ?
> > >>>>> Would that have better chance of giving us middle ground so that we can
> > >>>>> move this forward ?
> > >>>>>
> > >>>>> Cheers
> > >>>>>
> > >>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <stack@duboce.net> wrote:
> > >>>>>>
> > >>>>>> (Moved out of the Master doing MR DISCUSSION)
> > >>>>>>
> > >>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <
> > >>>>>> vladrodionov@gmail.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>>>> -1 on that backup be in core hbase
> > >>>>>>>
> > >>>>>>> Not sure I understand what it means.
> > >>>>>>>
> > >>>>>>> Sorry for the imprecision.
> > >>>>>>
> > >>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so
> > >>>>>> -1 on the Master running backup/restore MR jobs, even if optional.
> > >>>>>>
> > >>>>>> Master should not depend on MR. We've gone out of our way to avoid taking
> > >>>>>> MR on as dependency in the past. Seems late in the game for us to change
> > >>>>>> our opinion on this. If we didn't do it for distributed log splitting, or
> > >>>>>> MOB, why would we do it to support an optional backup/restore?
> > >>>>>>
> > >>>>>> I have opinions on the questions below -- i.e. that Master running
> > >>>>>> backup/restore is outside of the Master's charge -- but they are not worth
> > >>>>>> much since I've not done much by way of review or contrib to backup/restore
> > >>>>>> other than to try it as a 'user' so I'll keep them to myself until I do. I
> > >>>>>> only came out from under my shell to participate on the MR as dependency
> > >>>>>> chat.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> M
> > >>>>>>
> > >>>>>>
> > >>>>>> 1. We are not allowed to use Master to orchestrate the whole
> > >> process?
> > >>>>>>
> > >>>>>>
> > >>>>>> We
> > >>>>>>> have already brought up all advantages of using
> > >>>>>>>   Master and distributed procedures for backup and restore.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Downside of moving this to client tool is lack of fault
> > >> tolerance:
> > >>>>>>> 1.1 Client won't be allowed to do any operations, that can,
> > >>>>> potentially
> > >>>>>>> affect
> > >>>>>>> cluster, such as disabling splits/merges, balancer.
> > >>>>>>> 1.2 In case of client failure who will be doing the whole
> > >> rollback
> > >>>>>> stuff?
> > >>>>>>> We are trying to make it atomic.
> > >>>>>>>
> > >>>>>>> Security is not clear.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> 2. We are not allowed to modify code of existing HBase core
> classes
> > >>>> (what
> > >>>>>>> does core mean anyway)?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>> 3. We are not allowed to create backup system table
> > >> (hbase:backup)
> > >>>> in a
> > >>>>>>> system space? Only in user space? The table is global.
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>> 2. is critical. Despite the fact, that 95% of code is new, we
> > >> have
> > >>>>>> touched,
> > >>>>>>> of course some existing HBase code.
> > >>>>>>> 3. is not that critical, of course we can move backup system into
> > >>>> user
> > >>>>>>> space.
> > >>>>>>>
> > >>>>>>> And finally, will moving backup into external tool give us +1
> > >> from
> > >>>>> stack?
> > >>>>>>>
> > >>>>>>> -Vlad
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <stack@duboce.net>
> > >> wrote:
> > >>>>>>>
> > >>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <
> > >>>>>>>> vladrodionov@gmail.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>>>> + MR is dead
> > >>>>>>>>>
> > >>>>>>>>> Does MR know that? :)
> > >>>>>>>>>
> > >>>>>>>>> Again. With all due respect, stack - still no suggestions
> > >> what
> > >>>>> should
> > >>>>>>> we
> > >>>>>>>>> use for "bulk data move and transformation" instead of MR?
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Use whatever distributed engine suits your fancy -- MR, Spark,
> > >>>>>>> distributed
> > >>>>>>>> shell -- just don't have HBase core depend on it, even
> > >>> optionally.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> I suggest voting first on "do we need backup in HBase"? In my
> > >>>>>> opinion,
> > >>>>>>>> some
> > >>>>>>>>> group members still not sure about that and some will give -1
> > >>>>>>>>> in any case. Just because ...
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase
> > >> (+1
> > >>>> on
> > >>>>>>> adding
> > >>>>>>>> all the API any such external tool might need to run).
> > >>>>>>>>
> > >>>>>>>> St.Ack
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> -Vlad
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <stack@duboce.net>
> > >>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <
> > >>>>>>>>> theo.bertozzi@gmail.com>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> let me try to go back to my original topic.
> > >>>>>>>>>>> this question was meant to be generic, and provide some rule for future
> > >>>>>>>>>>> code.
> > >>>>>>>>>>>
> > >>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
> > >>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay)
> > >>>>>>>>>>> over MR, because some cluster may not want or may have an
> > >>>>>>>>>>> external/uncontrolled MR setup.
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> +1
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR
> > >>>>>>>>>>> jobs from hbase, because unless you use the feature, MR is not required.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not --
> > >>>>>>>>>> ever being able to launch MR jobs.
> > >>>>>>>>>>
> > >>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server
> > >>>>>>>>>> moving it out to be an optional module (Spark would be its peer).
> > >>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are busy
> > >>>>>>>>>> working hard on moving it up on to a new foundation. Let's not make that
> > >>>>>>>>>> task harder by piling on more moving parts.
> > >>>>>>>>>>
> > >>>>>>>>>> St.Ack
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> Matteo
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
> > >>> yuzhihong@gmail.com
> > >>>>>
> > >>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I suggest you look at Matteo's work for
> > >> AssignmentManager
> > >>>>> which
> > >>>>>>> is
> > >>>>>>>> to
> > >>>>>>>>>>> make
> > >>>>>>>>>>>> Master more stable.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
> > >>> palomino219@gmail.com
> > >>>>>
> > >>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> No, not your fault, at least, not this time:)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the sequence of calls
> > >>>>>>>>>>>>> when starting up the HMaster? HMaster is also a regionserver so it extends
> > >>>>>>>>>>>>> HRegionServer, and the initialization of HRegionServer sometimes needs to
> > >>>>>>>>>>>>> make rpc calls to HMaster. A simple change would cause probabilistic
> > >>>>>>>>>>>>> deadlock or some strange NPEs...
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or
> > >>>>>>>>>>>>> add external dependencies to HMaster, especially add more work for the
> > >>>>>>>>>>>>> startup processing...
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <
> > >> yuzhihong@gmail.com
> > >>>> :
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I read through HADOOP-13433
> > >>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race
> > >>>>>>>>>>>>>> condition is in jdk.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is it in the backup /
> > >>>>>>>>>>>>>> restore mega patch ?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <
> > >>>>>> palomino219@gmail.com>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR way and the
> > >>>>>>>>>>>>>>> patch is ready for landing on master, I'm a -0 on it as I do not want to
> > >>>>>>>>>>>>>>> block the development progress.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> But I strongly suggest later we need to revisit the design and see if we
> > >>>>>>>>>>>>>>> can separate the logic from HMaster as much as possible. HA is not a big
> > >>>>>>>>>>>>>>> problem if you do not store any metadata locally. But the ugly code in
> > >>>>>>>>>>>>>>> HMaster is really a problem...
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> And for security, I have an issue pending for a long time. Can someone
> > >>>>>>>>>>>>>>> help taking a simple look at it? This is what I mean, ugly code... logout
> > >>>>>>>>>>>>>>> and destroy the credentials in a subject when it is still being used, and
> > >>>>>>>>>>>>>>> declared as LimitPrivacy so I can not change the behavior and the only way
> > >>>>>>>>>>>>>>> to fix it is to write another piece of ugly code...
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13433
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <
> > >>>>>>>>>>> vladrodionov@gmail.com
> > >>>>>>>>>>>>> :
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> If in the future, we find better ways of doing this without using MR,
> > >>>>>>>>>>>>>>>>>> we can certainly consider that
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Our framework for distributed operations is abstract and allows
> > >>>>>>>>>>>>>>>> different implementations. MR is just one implementation we provide.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> -Vlad
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <
> > >>>>>>>>>>> ddas@hortonworks.com
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the topic of MR-based
> > >>>>>>>>>>>>>>>>> compactions.. But I was thinking more about the SpliceMachine approach of
> > >>>>>>>>>>>>>>>>> managing compactions in Spark where apparently they saw a lot of benefits.
> > >>>>>>>>>>>>>>>>> Apologies for giving you that sore throat Andrew; I really didn't mean to
> > >>>>>>>>>>>>>>>>> :-)
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
> > >>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that
> > >>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
> > >>>>>>>>>>>>>>>>> 2. Shell out from the master
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), and I don't think it's even
> > >>>>>>>>>>>>>>>>> worth the effort of trying to build something when MR is already there,
> > >>>>>>>>>>>>>>>>> and being used by HBase already for some operations.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of issues - HA of the server not
> > >>>>>>>>>>>>>>>>> being the least of them all. Security (kerberos authentication, another
> > >>>>>>>>>>>>>>>>> keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead
> > >>>>>>>>>>>>>>>>> let's substitute that (1) with the HBase Master. I haven't seen any good
> > >>>>>>>>>>>>>>>>> reason why the HBase master shouldn't launch MR jobs if needed. It's not
> > >>>>>>>>>>>>>>>>> ideal; agreed.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Now before going to (2), let's see what are the benefits of running the
> > >>>>>>>>>>>>>>>>> backup/restore jobs from the master. I think Ted has summarized some of
> > >>>>>>>>>>>>>>>>> the issues that we need to take care of - basically, the master can keep
> > >>>>>>>>>>>>>>>>> track of running jobs, and should it fail, the backup master can continue
> > >>>>>>>>>>>>>>>>> keeping track of it (since the jobId would have been recorded in the proc
> > >>>>>>>>>>>>>>>>> WAL). The master can also do cleanup, etc. of failed backup/restore
> > >>>>>>>>>>>>>>>>> processes. Security is another issue - the job needs to run as 'hbase'
> > >>>>>>>>>>>>>>>>> since it owns the data. Having the master launch the job makes it get
> > >>>>>>>>>>>>>>>>> that privilege. In the (2) approach, it's hard to do some of the above
> > >>>>>>>>>>>>>>>>> management.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is ready from the overall
> > >>>>>>>>>>>>>>>>> design/arch point of view (maybe code review is still pending from
> > >>>>>>>>>>>>>>>>> Matteo). If in the future, we find better ways of doing this without
> > >>>>>>>>>>>>>>>>> using MR, we can certainly consider that. But IMO don't think we should
> > >>>>>>>>>>>>>>>>> block this patch from getting merged.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> ________________________________________
> > >>>>>>>>>>>>>>>>> From: 张铎 <palomino219@gmail.com>
> > >>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
> > >>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> > >>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by
> > >>>> Master
> > >>>>>> or
> > >>>>>>> RS
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> So what about a standalone service other than
> > >>>>> master?
> > >>>>>>> You
> > >>>>>>>>> can
> > >>>>>>>>>>> use
> > >>>>>>>>>>>>>> your
> > >>>>>>>>>>>>>>>> own
> > >>>>>>>>>>>>>>>>> procedure store in that service?
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <
> > >>>>>> yuzhihong@gmail.com
> > >>>>>>>> :
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> An earlier implementation was client driven.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> But with that approach, it is hard to resume if there is error midway.
> > >>>>>>>>>>>>>>>>>> Using Procedure V2 makes the backup / restore more robust.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Another consideration is for security. It is hard to enforce security
> > >>>>>>>>>>>>>>>>>> (to be implemented) for client driven actions.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew
> > >>> Purtell <
> > >>>>>>>>>>>>>>>> andrew.purtell@gmail.com>
> > >>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, which is "shelling out" from the
> > >>>>>>>>>>>>>>>>>>> master directly to run MR is a first. Why not drive this with a utility
> > >>>>>>>>>>>>>>>>>>> derived from Tool?
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir
> > >>>> Rodionov
> > >>>>> <
> > >>>>>>>>>>>>>>>> vladrodionov@gmail.com
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case we just have HDFS and
> > >>>>>>>>>>>>>>>>>>>>>> HBase deployed.
> > >>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR framework (especially some features we
> > >>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduces another maintenance cost. I
> > >>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> So, you are not backup users in this case. Many of our customers have
> > >>>>>>>>>>>>>>>>>>>> the full stack deployed and want to see backup as a standard feature.
> > >>>>>>>>>>>>>>>>>>>> Besides this, nothing will happen in your cluster if you are not doing
> > >>>>>>>>>>>>>>>>>>>> backups.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> This discussion (we do not want to see an M/R dependency) is going
> > >>>>>>>>>>>>>>>>>>>> nowhere. We have asked already, at least twice, for suggestions of
> > >>>>>>>>>>>>>>>>>>>> another framework (other than M/R) for bulk data copy with
> > >>>>>>>>>>>>>>>>>>>> *conversion*. Still waiting for suggestions.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> -Vlad
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted
> > >> Yu <
> > >>>>>>>>>>>> yuzhihong@gmail.com
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the cluster, hbase still functions
> > >>>>>>>>>>>>>>>>>>>>> normally (post merge).
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we have long been depending on
> > >>>>>>>>>>>>>>>>>>>>> mapreduce. Take a look at ExportSnapshot.
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng
> > >>> Chen
> > >>>> <
> > >>>>>>>>>>>>>>>> heng.chen.1986@gmail.com
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case we just have HDFS and
> > >>>>>>>>>>>>>>>>>>>>>> HBase deployed.
> > >>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR framework (especially some features we
> > >>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduces another maintenance cost. I
> > >>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:28 GMT+08:00 张铎 <
> > >>>>>>>>> palomino219@gmail.com
> > >>>>>>>>>>> :
> > >>>>>>>>>>>>>>>>>>>>>>> To be specific, for example, our nice Backup/Restore feature, if we
> > >>>>>>>>>>>>>>>>>>>>>>> think this is not a core feature of HBase, then we could make it depend
> > >>>>>>>>>>>>>>>>>>>>>>> on MR, and start a standalone BackupManager instance that submits MR
> > >>>>>>>>>>>>>>>>>>>>>>> jobs to do periodical maintenance job. And if we think this is a core
> > >>>>>>>>>>>>>>>>>>>>>>> feature that everyone should use it, then we'd better implement it
> > >>>>>>>>>>>>>>>>>>>>>>> without MR dependency, like DLS.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Thanks.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 <
> > >>>>>>>>> palomino219@gmail.com
> > >>>>>>>>>>> :
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> I'm -1 on letting master or rs launch MR jobs. It is OK that some of
> > >>>>>>>>>>>>>>>>>>>>>>>> our features depend on MR but I think the bottom line is that we
> > >>>>>>>>>>>>>>>>>>>>>>>> should launch the jobs from outside manually or by other services.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew
> > >>>> Purtell <
> > >>>>>>>>>>>>>>>>> andrew.purtell@gmail.com
> > >>>>>>>>>>>>>>>>>>> :
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
> > >>>>>>>>>>>>>>>>>>>>>>>>> question.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Can this be driven by a utility derived from Tool like our other MR
> > >>>>>>>>>>>>>>>>>>>>>>>>> apps? The issue is needing the AccessController to decide if allowed?
> > >>>>>>>>>>>>>>>>>>>>>>>>> But nothing prevents the user from running the job
> > >>>>>>>>>>>>>>>>>>>>>>>>> manually/independently, right?
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 3:44 PM,
> > >> Matteo
> > >>>>>>> Bertozzi <
> > >>>>>>>>>>>>>>>>>>>>>> theo.bertozzi@gmail.com>
> > >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> just a remark. my query was not about tools using MR (everyone i
> > >>>>>>>>>>>>>>>>>>>>>>>>>> think is ok with those).
> > >>>>>>>>>>>>>>>>>>>>>>>>>> the topic was about: "are we ok with running MR jobs from Master and
> > >>>>>>>>>>>>>>>>>>>>>>>>>> RSs code?" since this will be the first time we do this
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> Matteo
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <ddas@hortonworks.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> it's fine to be dependent on MR. MR is the right framework for
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> such. We should also do compactions using MR (just saying :) )
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> ________________________________________
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> From: Ted Yu <yuzhihong@gmail.com>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> I agree - backup / restore is in the same category as import /
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> export.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Backup is extra tooling around core in my opinion. Like import or
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> export. Or the optional MOB tool. It's fine.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mbertozzi@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> What's the latest opinion around running MR jobs from hbase
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Master or RS)?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I remember in the past that there was discussion about not having
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> MR as a direct dependency of hbase.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think some of the discussion was around MOB, which had an MR job
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> to compact that was later transformed into a non-MR job to be
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> merged. I think we had a similar discussion for log split/replay.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that runs a MR
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> job from the master to copy data or restore data.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> (backup is also "not really core" as in.. if you don't use backup
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> you'll not end up running MR jobs, but this was probably true for
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> MOB as in "if you don't enable MOB you don't need MR")
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> any thoughts? do we have a rule that says "we don't want to have
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> hbase run MR jobs, only tools started manually by the user can do
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> that"? or can we start adding MR calls around without problems?
