hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
Date Wed, 05 Oct 2016 01:28:11 GMT
Refactoring work over in HBASE-16727 is ready for review.

Kindly provide your feedback.

Thanks

On Mon, Oct 3, 2016 at 3:05 PM, Andrew Purtell <apurtell@apache.org> wrote:

> This sounds good to me.
> I'd be at least +0 as to merging the branch as long as we are not 'shelling
> out' to MR from master.
>
> > All or most of the Backup/Restore operations (especially the MR job
> > spawns) should be moved to the client.
>
> We have a home grown backup solution at Salesforce that to a first order of
> approximation is this. I would like to see something like this merged.
>
> > In the future, if someone needs to support self-service operations (any
> > user can take a backup/restore his/her tables), we can discuss the "backup
> > service" or something else.
>
> I can't commit the time of the team here (smile), but we always strive to
> minimize the amount of local code we need to manage HBase. For example, we
> use VerifyReplication and other tools that ship with HBase, and we have
> contributed minor operational improvements as we've developed them (like
> the region mover and canary stuff). I suspect we will have some adoption of
> this tooling and further refinement insofar as it fits into a backup workflow
> that, at a 30kft view, uses snapshots, replication (or file shipping), and WAL
> replay.
>
>
> On Mon, Sep 26, 2016 at 9:57 PM, Devaraj Das <ddas@hortonworks.com> wrote:
>
> > Vlad, thinking about it a little more, since the master is not
> > orchestrating the backup, let's make it dead simple as a first pass. I
> > think we should do the following: all or most of the Backup/Restore
> > operations (especially the MR job spawns) should be moved to the client.
> > Ignore security for the moment; let's live with what we have as the
> > current "limitation" for tools that need HDFS access: they need to run as
> > hbase (or whatever user the hbase daemons run as). Consistency and cleanup
> > need to be handled as well as possible: if the client fails after
> > initiating the backup/restore, who restores consistency in the hbase:backup
> > table, or cleans up the half-copied data in the HDFS dirs, etc.?
> > In the future, if someone needs to support self-service operations (any
> > user can take a backup/restore his/her tables), we can discuss the "backup
> > service" or something else.
> > Folks - Stack / Andrew / Matteo / others, please speak up if you disagree
> > with the above. Would like to get over this merge-to-master hump, obviously.
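Devaraj's consistency question can be made concrete. Below is a minimal sketch that assumes a hypothetical client tool. The tool writes an "in progress" marker before it copies any data and clears the marker only on success. Here an in-memory map stands in for a row in hbase:backup, and an in-memory set stands in for HDFS. On its next run, the client treats any leftover marker as a crashed session and deletes that session's half-copied directory. `SessionStore`, `FakeFs` and `BackupClient` are illustrative names, not classes from the backup patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for session rows in a table such as hbase:backup:
// backupId -> target directory of a backup that has started but not finished.
final class SessionStore {
    private final Map<String, String> inProgress = new HashMap<>();

    void markInProgress(String backupId, String targetDir) { inProgress.put(backupId, targetDir); }

    void clear(String backupId) { inProgress.remove(backupId); }

    Map<String, String> snapshot() { return new HashMap<>(inProgress); }
}

// Hypothetical in-memory stand-in for the backup destination file system.
final class FakeFs {
    final Set<String> paths = new TreeSet<>();

    void write(String path) { paths.add(path); }

    // Removes everything under dir, like a recursive delete.
    void deleteRecursively(String dir) { paths.removeIf(p -> p.startsWith(dir + "/")); }
}

final class BackupClient {
    private final SessionStore store;
    private final FakeFs fs;

    BackupClient(SessionStore store, FakeFs fs) {
        this.store = store;
        this.fs = fs;
    }

    // Run before starting new work: a session still marked in progress belongs to a
    // client that died mid-backup, so its partial data is deleted and its marker cleared.
    int cleanupStaleSessions() {
        Map<String, String> stale = store.snapshot();
        for (Map.Entry<String, String> e : stale.entrySet()) {
            fs.deleteRecursively(e.getValue());
            store.clear(e.getKey());
        }
        return stale.size();
    }

    // failAfter simulates a client crash after that many files (-1 = never crash).
    void backup(String backupId, String targetDir, int files, int failAfter) {
        cleanupStaleSessions();
        store.markInProgress(backupId, targetDir); // marker is written before any data
        for (int i = 0; i < files; i++) {
            if (i == failAfter) {
                throw new IllegalStateException("simulated client crash");
            }
            fs.write(targetDir + "/file-" + i);
        }
        store.clear(backupId); // marker is cleared only after all data is in place
    }
}
```

A real tool would need the same marker discipline for any other cluster change it makes, such as disabling splits/merges or the balancer. That fault-tolerance concern is the one Vlad raises elsewhere in this thread.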
> >
> > ________________________________________
> > From: Vladimir Rodionov <vladrodionov@gmail.com>
> > Sent: Monday, September 26, 2016 11:48 AM
> > To: dev@hbase.apache.org
> > Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs
> > started by Master or RS)
> >
> > Ok, we had an internal discussion and this is what we are suggesting now:
> >
> > 1. We will create a separate module (hbase-backup) and move server-side
> > code there.
> > 2. Master and RS will be MR-free and backup-free.
> > 3. The code from Master will be moved into a standalone service
> > (BackupService) for procedure orchestration,
> >      operation resume/abort and SECURITY. This means one additional
> >      process, similar to the REST/Thrift server, will be required
> >      to operate backup.
> >
> > I would like to note that a separate process running as the hbase super
> > user is required to implement security properly in a multi-tenant
> > environment; otherwise, only the hbase super user will be allowed to
> > operate backups.
> >
> > Please let us know what you think, HBase people.
> >
> > -Vlad
> >
> >
> >
> > On Sat, Sep 24, 2016 at 2:49 PM, Stack <stack@duboce.net> wrote:
> >
> > > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purtell@gmail.com>
> > > wrote:
> > >
> > > > At branch merge voting time, more eyes are getting on the design issues
> > > > and dissenting opinions are emerging. This is the branch merge process
> > > > working as our community has designed it. Because this is the first full
> > > > project review of the code and implementation, I think we all have to be
> > > > flexible. I see the community as trying to narrow the technical objection
> > > > at issue to the smallest possible scope. It's simple: don't call out to an
> > > > external execution framework we don't own from core master (and by
> > > > extension regionserver) code. We had this objection before to a proposed
> > > > external compaction implementation for MOB, so it should not come as a
> > > > surprise. Please let me know if I have misstated this.
> > > >
> > > >
> > > The above is my understanding also.
> > >
> > >
> > > > This would seem to require a modest refactor of coordination to move
> > > > invocation of MR code out of any core code path. To restate what I think
> > > > is an emerging recommendation: move cross HBase and MR coordination to a
> > > > separate tool. This tool can ask the master to invoke procedures on the
> > > > HBase side that do the first mile export and last mile restore.
> > > > (Internally, the tool can also use the procedure framework for state
> > > > durability, perhaps; just a thought.) Then the tool can further drive the
> > > > things done with MR, like shipping data off cluster or moving remote data
> > > > in place and preparing it for import. These activities do not need
> > > > procedure coordination or involvement of the HBase master. Only the first
> > > > and last mile of the process need atomicity within the HBase deploy.
> > > > Please let me know if I have misstated this.
> > > >
> > > >
> > > Above is my understanding of our recommendation.
> > >
> > > St.Ack
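Andrew's restated recommendation keeps a short first mile and last mile inside HBase. Everything in between is driven by a separate tool. Below is a sketch of that split, assuming hypothetical `MasterProcedures` and `DataMover` interfaces; neither is an HBase API. The only point is that the tool, not the master, sequences the steps and chooses the execution engine.

```java
// Hypothetical master-side operations: each is a short, atomic procedure inside HBase.
interface MasterProcedures {
    String firstMileExport(String table);                  // e.g. snapshot; returns its name
    void lastMileRestore(String table, String stagedPath); // bring staged data online
}

// Hypothetical bulk data mover: could be backed by MR, Spark, distcp, ...
interface DataMover {
    String ship(String snapshot, String destination); // returns the staged location
}

// Client-side tool: owns the cross-system coordination, so the master never launches jobs.
final class BackupTool {
    private final MasterProcedures master;
    private final DataMover mover;

    BackupTool(MasterProcedures master, DataMover mover) {
        this.master = master;
        this.mover = mover;
    }

    String backup(String table, String destination) {
        String snapshot = master.firstMileExport(table); // first mile: inside HBase
        return mover.ship(snapshot, destination);        // bulk copy: outside HBase
    }

    void restore(String table, String stagedPath) {
        master.lastMileRestore(table, stagedPath);       // last mile: inside HBase
    }
}
```

In practice, such a driver would probably implement Hadoop's `org.apache.hadoop.util.Tool` and be launched through `ToolRunner`, as Andrew suggests elsewhere in this thread. The sketch leaves Hadoop out so it stays self-contained.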
> > >
> > >
> > >
> > > > > On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > bq. procedure gives you a retry mechanism on failure
> > > > >
> > > > > We do need this mechanism. Take a look at the multi-step logic
> > > > > in FullTableBackupProcedure, etc.
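Ted's point is that a backup consists of several dependent steps. A failure partway through should be retried or resumed, not restarted from scratch. Below is a generic sketch of that behavior, assuming a hypothetical `ProgressStore` that survives restarts. It illustrates the idea only. It is not HBase's Procedure V2 API, and it is not how FullTableBackupProcedure is actually written.

```java
import java.util.List;

// Hypothetical durable progress record; a real one would live in a WAL or a table.
final class ProgressStore {
    private int completed; // number of steps finished so far

    int completedSteps() { return completed; }

    void recordCompleted(int steps) { completed = steps; }
}

// One step of a multi-step operation; throws on a (possibly transient) failure.
interface Step {
    void run() throws Exception;
}

final class ResumableProcedure {
    private final List<Step> steps;
    private final ProgressStore store;
    private final int maxAttempts;

    ResumableProcedure(List<Step> steps, ProgressStore store, int maxAttempts) {
        this.steps = steps;
        this.store = store;
        this.maxAttempts = maxAttempts;
    }

    // Starts at the first unfinished step, so a restarted run skips completed work.
    // Each step is retried up to maxAttempts times before the whole run fails.
    void execute() {
        for (int i = store.completedSteps(); i < steps.size(); i++) {
            for (int attempt = 1; ; attempt++) {
                try {
                    steps.get(i).run();
                    break;
                } catch (Exception e) {
                    if (attempt >= maxAttempts) {
                        throw new IllegalStateException("step " + i + " failed " + attempt + " times", e);
                    }
                }
            }
            store.recordCompleted(i + 1); // persisted only after the step succeeded
        }
    }
}
```

Matteo notes further down the thread that hbase-procedure can run outside the master. In other words, a loop like this does not have to live in the HMaster process.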
> > > > >
> > > > > bq. let the user export it later when he wants
> > > > >
> > > > > This would make supporting security more complex (user A shouldn't be
> > > > > exporting user B's backup). And it is not user friendly: at the time the
> > > > > backup request is issued, the following is specified:
> > > > >
> > > > > +          + " BACKUP_ROOT     The full root path to store the backup image,\n"
> > > > > +          + "                 the prefix can be hdfs, webhdfs or gpfs\n"
> > > > >
> > > > > Backup root is an integral part of the backup manifest.
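The usage text Ted quotes implies that BACKUP_ROOT is fixed when the request is issued and is limited to a few URI schemes. Below is a minimal validation sketch. It assumes that only the three schemes named in that usage string are accepted and that "full root path" means an absolute path. `BackupRootValidator` is a hypothetical helper, not part of the backup patch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

final class BackupRootValidator {
    // Schemes listed in the BACKUP_ROOT usage text.
    private static final Set<String> ALLOWED = new TreeSet<>(Arrays.asList("hdfs", "webhdfs", "gpfs"));

    // Returns the normalized root URI, or throws IllegalArgumentException.
    static URI validate(String backupRoot) {
        URI uri;
        try {
            uri = new URI(backupRoot);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Malformed BACKUP_ROOT: " + backupRoot, e);
        }
        String scheme = uri.getScheme();
        if (scheme == null || !ALLOWED.contains(scheme.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException("Unsupported BACKUP_ROOT scheme: " + scheme);
        }
        // Opaque URIs (e.g. "hdfs:backup") have a null path and are rejected here too.
        if (uri.getPath() == null || !uri.getPath().startsWith("/")) {
            throw new IllegalArgumentException("BACKUP_ROOT must be a full path: " + backupRoot);
        }
        return uri.normalize(); // collapses "." and ".." segments
    }
}
```

Validating at request time matches Ted's point that the root is recorded in the backup manifest: a bad root fails before any data is written.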
> > > > >
> > > > > Cheers
> > > > >
> > > > >
> > > > > On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > > > > wrote:
> > > > >
> > > > >>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >>>
> > > > >>> Ideally the export should have one job running which does the retry
> > > > >>> (on failed partition) itself.
> > > > >>>
> > > > >>
> > > > >> Procedure gives you a retry mechanism on failure. If you don't use that,
> > > > >> then you don't need procedure.
> > > > >> If you want, you can start a procedure executor in a non-master process
> > > > >> (hbase-procedure is a separate package and does not depend on master),
> > > > >> but again, export seems a case where you don't need procedure.
> > > > >>
> > > > >> Like snapshot, the logic may just be: ask the master to take a backup,
> > > > >> and let the user export it later when he wants. That way you avoid
> > > > >> having an MR job started by the master, since people do not seem to
> > > > >> like it.
> > > > >>
> > > > >> For restore (I think that is where you use the MR splitter), you can
> > > > >> probably just have a backup ready (already split). There is already a
> > > > >> JIRA that should do that: HBASE-14135. Instead of doing the split/merge
> > > > >> operation on restore, you consolidate the backup "offline" (an MR job
> > > > >> started by the user) and then ask to restore the backup.
> > > > >>
> > > > >>
> > > > >>>
> > > > >>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> As far as I understand the code, you don't need procedure for the
> > > > >>>> export itself. The export operation is already idempotent, since you
> > > > >>>> are just copying files: if the file exists and is complete (check
> > > > >>>> length, checksum, ...) you can skip it; otherwise you send it over
> > > > >>>> again.
> > > > >>>>
> > > > >>>> You need the proc for taking the backup and restoring, because you
> > > > >>>> want to complete the operation and end up with a consistent state
> > > > >>>> across the multiple components you are updating (meta, fs, ...).
> > > > >>>> But again, for export you can just run the tool over and over until
> > > > >>>> the operation succeeds, and that should be ok.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Matteo
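Matteo argues that export needs no procedure because rerunning it is harmless. Below is a minimal sketch of that idempotent copy loop. It uses plain `java.nio.file` in place of the Hadoop FileSystem API and uses CRC32 as the "checksum"; both are assumptions for illustration, and `IdempotentExport` is a hypothetical name. Each file is skipped if the destination already has the same length and checksum. Otherwise the file is sent again.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.CRC32;

final class IdempotentExport {
    // CRC32 of the whole file; fine for a sketch, a real tool would stream large files.
    static long crc(Path p) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(Files.readAllBytes(p));
        return crc.getValue();
    }

    // A target counts as complete if it exists with the same length and checksum.
    static boolean isComplete(Path src, Path dst) throws IOException {
        return Files.exists(dst) && Files.size(dst) == Files.size(src) && crc(dst) == crc(src);
    }

    // Copies every regular file under srcDir to dstDir, skipping completed ones.
    // Safe to rerun after a failure; returns the number of files actually copied.
    static int export(Path srcDir, Path dstDir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(srcDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        int copied = 0;
        for (Path src : files) {
            Path dst = dstDir.resolve(srcDir.relativize(src));
            if (isComplete(src, dst)) {
                continue; // already shipped
            }
            Files.createDirectories(dst.getParent());
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING); // missing or partial: resend
            copied++;
        }
        return copied;
    }
}
```

The completion check compares file content instead of consulting recorded state, so a crashed run needs no rollback. The operator just runs the tool again until it finishes without error.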
> > > > >>>>
> > > > >>>>
> > > > >>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>> Master is involved in this discussion because currently only Master
> > > > >>>>> instantiates ProcedureExecutor, which runs the 3 Procedures for
> > > > >>>>> backup / restore.
> > > > >>>>>
> > > > >>>>> What if an optional standalone service which hosts ProcedureExecutor
> > > > >>>>> is used for this purpose?
> > > > >>>>> Would that have a better chance of giving us middle ground so that we
> > > > >>>>> can move this forward?
> > > > >>>>>
> > > > >>>>> Cheers
> > > > >>>>>
> > > > >>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <stack@duboce.net> wrote:
> > > > >>>>>>
> > > > >>>>>> (Moved out of the Master doing MR DISCUSSION)
> > > > >>>>>>
> > > > >>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <
> > > > >>>>>> vladrodionov@gmail.com>
> > > > >>>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>>>> -1 on that backup be in core hbase
> > > > >>>>>>>
> > > > >>>>>>> Not sure I understand what it means.
> > > > >>>>>>>
> > > > >>>>>>> Sorry for the imprecision.
> > > > >>>>>>
> > > > >>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency
> > > > >>>>>> and so -1 on the Master running backup/restore MR jobs, even if
> > > > >>>>>> optional.
> > > > >>>>>>
> > > > >>>>>> Master should not depend on MR. We've gone out of our way to avoid
> > > > >>>>>> taking MR on as a dependency in the past. Seems late in the game for
> > > > >>>>>> us to change our opinion on this. If we didn't do it for distributed
> > > > >>>>>> log splitting, or MOB, why would we do it to support an optional
> > > > >>>>>> backup/restore?
> > > > >>>>>>
> > > > >>>>>> I have opinions on the questions below (i.e. that Master running
> > > > >>>>>> backup/restore is outside of the Master's charge), but they are not
> > > > >>>>>> worth much since I've not done much by way of review or contrib to
> > > > >>>>>> backup/restore other than to try it as a 'user', so I'll keep them
> > > > >>>>>> to myself until I do. I only came out from under my shell to
> > > > >>>>>> participate in the MR-as-dependency chat.
> > > > >>>>>>
> > > > >>>>>> Thanks,
> > > > >>>>>> M
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>> 1. We are not allowed to use Master to orchestrate the whole process?
> > > > >>>>>>> We have already brought up all the advantages of using
> > > > >>>>>>>   Master and distributed procedures for backup and restore.
> > > > >>>>>>>
> > > > >>>>>>> The downside of moving this to a client tool is the lack of fault
> > > > >>>>>>> tolerance:
> > > > >>>>>>> 1.1 The client won't be allowed to do any operations that can
> > > > >>>>>>> potentially affect the cluster, such as disabling splits/merges or
> > > > >>>>>>> the balancer.
> > > > >>>>>>> 1.2 In case of client failure, who will be doing the whole rollback
> > > > >>>>>>> stuff? We are trying to make it atomic.
> > > > >>>>>>>
> > > > >>>>>>> Security is not clear.
> > > > >>>>>>>
> > > > >>>>>>> 2. We are not allowed to modify code of existing HBase core classes
> > > > >>>>>>> (what does core mean anyway)?
> > > > >>>>>>>
> > > > >>>>>>> 3. We are not allowed to create the backup system table
> > > > >>>>>>> (hbase:backup) in the system space? Only in user space? The table
> > > > >>>>>>> is global.
> > > > >>>>>>>
> > > > >>>>>>> 2. is critical. Despite the fact that 95% of the code is new, we
> > > > >>>>>>> have, of course, touched some existing HBase code.
> > > > >>>>>>> 3. is not that critical; of course we can move the backup system
> > > > >>>>>>> table into user space.
> > > > >>>>>>>
> > > > >>>>>>> And finally, will moving backup into an external tool give us a +1
> > > > >>>>>>> from stack?
> > > > >>>>>>>
> > > > >>>>>>> -Vlad
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <stack@duboce.net> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <
> > > > >>>>>>>> vladrodionov@gmail.com>
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>>>> + MR is dead
> > > > >>>>>>>>>
> > > > >>>>>>>>> Does MR know that? :)
> > > > >>>>>>>>>
> > > > >>>>>>>>> Again. With all due respect, stack - still no suggestions what
> > > > >>>>>>>>> should we use for "bulk data move and transformation" instead of MR?
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> Use whatever distributed engine suits your fancy -- MR, Spark,
> > > > >>>>>>>> distributed shell -- just don't have HBase core depend on it, even
> > > > >>>>>>>> optionally.
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>> I suggest voting first on "do we need backup in HBase?" In my
> > > > >>>>>>>>> opinion, some group members are still not sure about that and some
> > > > >>>>>>>>> will give -1 in any case. Just because ...
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase (+1 on
> > > > >>>>>>>> adding all the API any such external tool might need to run).
> > > > >>>>>>>>
> > > > >>>>>>>> St.Ack
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>> -Vlad
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <stack@duboce.net> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > > > >>>>>>>>>> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Let me try to go back to my original topic.
> > > > >>>>>>>>>>> This question was meant to be generic, and to provide some rule
> > > > >>>>>>>>>>> for future code.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> From what I can gather, a rule that may satisfy everyone can be:
> > > > >>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay)
> > > > >>>>>>>>>>> over MR, because some clusters may not want MR, or may have an
> > > > >>>>>>>>>>> external/uncontrolled MR setup.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> +1
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to
> > > > >>>>>>>>>>> run MR jobs from hbase, because unless you use the feature, MR is
> > > > >>>>>>>>>>> not required.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag
> > > > >>>>>>>>>> or not -- ever being able to launch MR jobs.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> + MR is dead. We should be busy working hard to undo it from
> > > > >>>>>>>>>> hbase-server, moving it out to be an optional module (Spark would
> > > > >>>>>>>>>> be its peer).
> > > > >>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are
> > > > >>>>>>>>>> busy working hard on moving it up on to a new foundation. Let's not
> > > > >>>>>>>>>> make the task harder by piling on more moving parts.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> St.Ack
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Matteo
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> I suggest you look at Matteo's work on AssignmentManager, which
> > > > >>>>>>>>>>>> aims to make Master more stable.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Cheers
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino219@gmail.com> wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> No, not your fault, at least, not this time :)
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Why do I call the code ugly? Can you simply tell me the sequence
> > > > >>>>>>>>>>>>> of calls when starting up the HMaster? HMaster is also a
> > > > >>>>>>>>>>>>> regionserver, so it extends HRegionServer, and the initialization
> > > > >>>>>>>>>>>>> of HRegionServer sometimes needs to make RPC calls to HMaster. A
> > > > >>>>>>>>>>>>> simple change could cause a probabilistic deadlock or some
> > > > >>>>>>>>>>>>> strange NPEs...
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new
> > > > >>>>>>>>>>>>> features or external dependencies to HMaster, especially adding
> > > > >>>>>>>>>>>>> more work to the startup process...
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Thanks.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I read through HADOOP-13433
> > > > >>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited
> > > > >>>>>>>>>>>>>> race condition is in the JDK.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Can you be specific as to which code is ugly? Is it in the
> > > > >>>>>>>>>>>>>> backup / restore mega patch?
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Cheers
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino219@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR way
> > > > >>>>>>>>>>>>>>> and the patch is ready for landing on master, I'm a -0 on it as
> > > > >>>>>>>>>>>>>>> I do not want to block the development progress.
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> But I strongly suggest that later we revisit the design and see
> > > > >>>>>>>>>>>>>>> if we can separate the logic from HMaster as much as possible.
> > > > >>>>>>>>>>>>>>> HA is not a big problem if you do not store any metadata
> > > > >>>>>>>>>>>>>>> locally. But the ugly code in HMaster is really a problem...
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> And for security, I have an issue that has been pending for a
> > > > >>>>>>>>>>>>>>> long time. Can someone help by taking a quick look at it? This
> > > > >>>>>>>>>>>>>>> is what I mean by ugly code... it logs out and destroys the
> > > > >>>>>>>>>>>>>>> credentials in a Subject while they are still being used, and
> > > > >>>>>>>>>>>>>>> it is declared as LimitedPrivate, so I cannot change the
> > > > >>>>>>>>>>>>>>> behavior, and the only way to fix it is to write another piece
> > > > >>>>>>>>>>>>>>> of ugly code...
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13433
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodionov@gmail.com>:
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> If in the future, we find better ways of doing this without
> > > > >>>>>>>>>>>>>>>>>> using MR, we can certainly consider that
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Our framework for distributed operations is abstract and allows
> > > > >>>>>>>>>>>>>>>> different implementations. MR is just one implementation we
> > > > >>>>>>>>>>>>>>>> provide.
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> -Vlad
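Vlad says the distributed-operations layer is abstract. Stack requires that core never depend on MR. Both can hold if the engine is resolved by class name at runtime. Below is a minimal sketch under that assumption. `CopyEngine`, `CopyEngines` and the `backup.copy.engine.class` key are hypothetical names, not taken from the backup patch or from HBase configuration. If no engine is configured, `load` returns empty and no MR-related class is ever loaded.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical engine-neutral contract for bulk copy with conversion.
interface CopyEngine {
    String name();

    long copy(String source, String destination); // returns bytes copied
}

// A trivial implementation used here only to exercise the loader.
final class LocalCopyEngine implements CopyEngine {
    public String name() { return "local"; }

    public long copy(String source, String destination) { return 0L; }
}

final class CopyEngines {
    // Hypothetical configuration key; not an actual HBase property.
    static final String ENGINE_KEY = "backup.copy.engine.class";

    // Loads the configured engine by class name, so the caller has no compile-time
    // dependency on MR, Spark, etc. An empty result means the feature is off.
    static Optional<CopyEngine> load(Map<String, String> conf) {
        String className = conf.get(ENGINE_KEY);
        if (className == null) {
            return Optional.empty();
        }
        try {
            CopyEngine engine = Class.forName(className)
                    .asSubclass(CopyEngine.class)
                    .getDeclaredConstructor()
                    .newInstance();
            return Optional.of(engine);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load copy engine " + className, e);
        }
    }
}
```

An MR-backed or Spark-backed `CopyEngine` could then live in its own module, such as the hbase-backup module Vlad proposes elsewhere in this thread. Only that module would carry the MapReduce dependency.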
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <ddas@hortonworks.com> wrote:
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Guys, first off, apologies for bringing in the topic of MR-based
> > > > >>>>>>>>>>>>>>>>> compactions. But I was thinking more about the SpliceMachine
> > > > >>>>>>>>>>>>>>>>> approach of managing compactions in Spark, where apparently they
> > > > >>>>>>>>>>>>>>>>> saw a lot of benefits. Apologies for giving you that sore throat,
> > > > >>>>>>>>>>>>>>>>> Andrew; I really didn't mean to :-)
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
> > > > >>>>>>>>>>>>>>>>> 0. Somehow not use MR, but something like it
> > > > >>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
> > > > >>>>>>>>>>>>>>>>> 2. Shell out from the master
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), and I don't think
> > > > >>>>>>>>>>>>>>>>> it's even worth the effort of trying to build something when MR
> > > > >>>>>>>>>>>>>>>>> is already there and already being used by HBase for some
> > > > >>>>>>>>>>>>>>>>> operations.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of issues, HA of the server
> > > > >>>>>>>>>>>>>>>>> not being the least of them. Security (kerberos authentication,
> > > > >>>>>>>>>>>>>>>>> another keytab to manage, etc. etc. etc.). IMO, that approach is
> > > > >>>>>>>>>>>>>>>>> DOA. Instead, let's replace that (1) with the HBase Master. I
> > > > >>>>>>>>>>>>>>>>> haven't seen any good reason why the HBase master shouldn't
> > > > >>>>>>>>>>>>>>>>> launch MR jobs if needed. It's not ideal; agreed.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Now before going to (2), let's see what the benefits are of
> > > > >>>>>>>>>>>>>>>>> running the backup/restore jobs from the master. I think Ted has
> > > > >>>>>>>>>>>>>>>>> summarized some of the issues that we need to take care of:
> > > > >>>>>>>>>>>>>>>>> basically, the master can keep track of running jobs, and should
> > > > >>>>>>>>>>>>>>>>> it fail, the backup master can continue keeping track of them
> > > > >>>>>>>>>>>>>>>>> (since the jobId would have been recorded in the proc WAL). The
> > > > >>>>>>>>>>>>>>>>> master can also do cleanup, etc. of failed backup/restore
> > > > >>>>>>>>>>>>>>>>> processes. Security is another issue: the job needs to run as
> > > > >>>>>>>>>>>>>>>>> 'hbase' since it owns the data. Having the master launch the job
> > > > >>>>>>>>>>>>>>>>> gives it that privilege. In the (2) approach, it's hard to do
> > > > >>>>>>>>>>>>>>>>> some of the above management.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is ready from the
> > > > >>>>>>>>>>>>>>>>> overall design/arch point of view (maybe code review is still
> > > > >>>>>>>>>>>>>>>>> pending from Matteo). If in the future, we find better ways of
> > > > >>>>>>>>>>>>>>>>> doing this without using MR, we can certainly consider that. But
> > > > >>>>>>>>>>>>>>>>> IMO we shouldn't block this patch from getting merged.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> ________________________________________
> > > > >>>>>>>>>>>>>>>>> From: 张铎 <palomino219@gmail.com>
> > > > >>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
> > > > >>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> > > > >>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> So what about a standalone service other than master? You can
> > > > >>>>>>>>>>>>>>>>> use your own procedure store in that service?
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> An earlier implementation was client driven.
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> But with that approach, it is hard to resume if there is an
> > > > >>>>>>>>>>>>>>>>>> error midway. Using Procedure V2 makes the backup / restore
> > > > >>>>>>>>>>>>>>>>>> more robust.
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Another consideration is security. It is hard to enforce
> > > > >>>>>>>>>>>>>>>>>> security (to be implemented) for client-driven actions.
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Cheers
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, which is that "shelling
> > > > >>>>>>>>>>>>>>>>>>> out" from the master directly to run MR is a first. Why not
> > > > >>>>>>>>>>>>>>>>>>> drive this with a utility derived from Tool?
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case that we just
> > > > >>>>>>>>>>>>>>>>>>>>>> have HDFS and HBase deployed. If our Master/RS depend on the
> > > > >>>>>>>>>>>>>>>>>>>>>> MR framework (especially for some features we have not used
> > > > >>>>>>>>>>>>>>>>>>>>>> at all), it introduces another maintenance cost. I don't
> > > > >>>>>>>>>>>>>>>>>>>>>> think it is a good idea.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> So, you are not backup users in this case. Many of our
> > > > >>>>>>>>>>>>>>>>>>>> customers have the full stack deployed and want to see backup
> > > > >>>>>>>>>>>>>>>>>>>> as a standard feature. Besides this, nothing will happen in
> > > > >>>>>>>>>>>>>>>>>>>> your cluster if you are not doing backups.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> This discussion (we do not want to see an M/R dependency) is
> > > > >>>>>>>>>>>>>>>>>>>> going nowhere. We have already asked, at least twice, for
> > > > >>>>>>>>>>>>>>>>>>>> someone to suggest another framework (other than M/R) for bulk
> > > > >>>>>>>>>>>>>>>>>>>> data copy with *conversion*. Still waiting for suggestions.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> -Vlad
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>> If the MR framework is not deployed in the cluster, hbase still
> > > > >>>>> functions normally (post merge).
> > > > >>>>>
> > > > >>>>> In terms of build time dependency, we have long been depending on
> > > > >>>>> mapreduce. Take a look at ExportSnapshot.
> > > > >>>>>
> > > > >>>>> Cheers
> > > > >>>>>
> > > > >>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1986@gmail.com> wrote:
> > > > >>>>>>
> > > > >>>>>> In our production cluster, it is a common case that we just have HDFS
> > > > >>>>>> and HBase deployed.
> > > > >>>>>> If our Master/RS depend on the MR framework (especially some features
> > > > >>>>>> we have not used at all), it introduces another maintenance cost.
> > > > >>>>>> I don't think it is a good idea.
> > > > >>>>>>
> > > > >>>>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino219@gmail.com>:
> > > > >>>>>>> To be specific, take our nice Backup/Restore feature as an example: if
> > > > >>>>>>> we think it is not a core feature of HBase, then we could make it
> > > > >>>>>>> depend on MR and start a standalone BackupManager instance that submits
> > > > >>>>>>> MR jobs to do periodic maintenance. And if we think it is a core
> > > > >>>>>>> feature that everyone should use, then we'd better implement it
> > > > >>>>>>> without an MR dependency, like DLS.
> > > > >>>>>>>
> > > > >>>>>>> Thanks.
> > > > >>>>>>>
> > > > >>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino219@gmail.com>:
> > > > >>>>>>>>
> > > > >>>>>>>> I'm -1 on letting the master or RS launch MR jobs. It is OK that some
> > > > >>>>>>>> of our features depend on MR, but I think the bottom line is that we
> > > > >>>>>>>> should launch the jobs from outside, manually or by other services.
> > > > >>>>>>>>
> > > > >>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purtell@gmail.com>:
> > > > >>>>>>>>>
> > > > >>>>>>>>> Ok, got it. Well, "shelling out" is on the line I think, so a fair
> > > > >>>>>>>>> question.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Can this be driven by a utility derived from Tool like our other MR
> > > > >>>>>>>>> apps? The issue is needing the AccessController to decide if it is
> > > > >>>>>>>>> allowed? But nothing prevents the user from running the job
> > > > >>>>>>>>> manually/independently, right?
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Just a remark: my query was not about tools using MR (everyone, I
> > > > >>>>>>>>>> think, is OK with those).
> > > > >>>>>>>>>> The topic was: "are we OK with running MR jobs from Master and RS
> > > > >>>>>>>>>> code?", since this will be the first time we do this.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Matteo
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <ddas@hortonworks.com> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore,
> > > > >>>>>>>>>>> it's fine to be dependent on MR. MR is the right framework for such
> > > > >>>>>>>>>>> tools. We should also do compactions using MR (just saying :) )
> > > > >>>>>>>>>>> ________________________________________
> > > > >>>>>>>>>>> From: Ted Yu <yuzhihong@gmail.com>
> > > > >>>>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> > > > >>>>>>>>>>> To: dev@hbase.apache.org
> > > > >>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> I agree - backup / restore is in the same category as import / export.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purtell@gmail.com>
> > > > >>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Backup is extra tooling around core in my opinion. Like import or
> > > > >>>>>>>>>>>> export. Or the optional MOB tool. It's fine.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mbertozzi@apache.org> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> What's the latest opinion around running MR jobs from hbase (Master
> > > > >>>>>>>>>>>>> or RS)?
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> I remember in the past that there was discussion about not having MR
> > > > >>>>>>>>>>>>> as a direct dependency of hbase.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> I think some of the discussion was around MOB, which had an MR job to
> > > > >>>>>>>>>>>>> compact that was later transformed into a non-MR job to be merged. I
> > > > >>>>>>>>>>>>> think we had a similar discussion for log split/replay.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> The latest is the new Backup feature (HBASE-7912), which runs an MR
> > > > >>>>>>>>>>>>> job from the master to copy or restore data.
> > > > >>>>>>>>>>>>> (Backup is also "not really core", as in: if you don't use backup you
> > > > >>>>>>>>>>>>> won't end up running MR jobs; but this was probably true for MOB as
> > > > >>>>>>>>>>>>> in "if you don't enable MOB you don't need MR".)
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Any thoughts? Do we have a rule that says "we don't want to have hbase
> > > > >>>>>>>>>>>>> run MR jobs, only tools started manually by the user can do that"? Or
> > > > >>>>>>>>>>>>> can we start adding MR calls around without problems?
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
