hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
Date Sat, 24 Sep 2016 18:20:00 GMT
bq. don't call out to an external framework we don't own from master (or
regionserver) code

So the standalone service would run out of proc - in the same vein as REST
or thrift server.

Cheers

On Sat, Sep 24, 2016 at 10:40 AM, Andrew Purtell <andrew.purtell@gmail.com>
wrote:

> I was attempting to summarize Ted.
>
> A new maven module sounds like a good idea to me. Or we could move all the
> tools that use MR out to one. Or...
>
> The key takeaway seems to be don't call out to an external framework we
> don't own from master (or regionserver) code.
>
> > On Sep 24, 2016, at 10:15 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > bq. Internally the tool can also use the procedure framework for state
> > durability
> >
> > Isn't this the standalone service I proposed this morning?
> >
> > bq. Move cross HBase and MR coordination to a separate tool
> >
> > Where should this tool live (hbase-backup module)?
> >
> > Thanks
> >
> >
> > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purtell@gmail.com>
> > wrote:
> >
> >> At branch merge voting time, more eyes are now on the design issues,
> >> with dissenting opinions emerging. This is the branch merge process working
> >> as our community has designed it. Because this is the first full project
> >> review of the code and implementation, I think we all have to be flexible. I
> >> see the community as trying to narrow the technical objection at issue to
> >> the smallest possible scope. It's simple: don't call out to an external
> >> execution framework we don't own from core master (and by extension
> >> regionserver) code. We raised this objection before to a proposed external
> >> compaction implementation for MOB, so it should not come as a surprise.
> >> Please let me know if I have misstated this.
> >>
> >> This would seem to require a modest refactor of coordination to move
> >> invocation of MR code out of any core code path. To restate what I think
> >> is an emerging recommendation: Move cross HBase and MR coordination to a
> >> separate tool. This tool can ask the master to invoke procedures on the
> >> HBase side that do the first-mile export and last-mile restore. (Internally
> >> the tool can also use the procedure framework for state durability, perhaps;
> >> just a thought.) Then the tool can further drive the things done with MR,
> >> like shipping data off cluster or moving remote data in place and preparing
> >> it for import. These activities do not need procedure coordination or
> >> involvement of the HBase master. Only the first and last mile of the
> >> process need atomicity within the HBase deployment. Please let me know if I
> >> have misstated this.
> >>
> >>
> >>> On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>
> >>> bq. procedure gives you a retry mechanism on failure
> >>>
> >>> We do need this mechanism. Take a look at the multiple steps
> >>> in FullTableBackupProcedure, etc.
> >>>
> >>> bq. let the user export it later when he wants
> >>>
> >>> This would make supporting security more complex (user A shouldn't be
> >>> exporting user B's backup). And it is not user-friendly: at the time the
> >>> backup request is issued, the following is specified:
> >>>
> >>> +          + " BACKUP_ROOT     The full root path to store the backup image,\n"
> >>> +          + "                 the prefix can be hdfs, webhdfs or gpfs\n"
> >>>
> >>> Backup root is an integral part of the backup manifest.
> >>>
> >>> Cheers
> >>>
> >>>
> >>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> >>> wrote:
> >>>
> >>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>
> >>>>> Ideally the export should have one job running which does the retry (on
> >>>>> failed partition) itself.
> >>>>>
> >>>>
> >>>> procedure gives you a retry mechanism on failure. if you don't use that,
> >>>> then you don't need procedure.
> >>>> if you want, you can start a procedure executor in a non-master process
> >>>> (the hbase-procedure is a separate package and does not depend on master).
> >>>> but again, export seems a case where you don't need procedure.
> >>>>
> >>>> like snapshot, the logic may just be: ask the master to take a backup, and
> >>>> let the user export it later when he wants. so you avoid having an MR job
> >>>> started by the master, since people do not seem to like it.
> >>>>
> >>>> for restore (I think that is where you use the MR splitter) you can
> >>>> probably just have a backup ready (already split). there is already a
> >>>> jira that should do that, HBASE-14135: instead of doing the split/merge
> >>>> operation on restore, you consolidate the backup "offline" (MR job
> >>>> started by the user) and then ask to restore the backup.
> >>>>
> >>>>
> >>>>>
> >>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> as far as I understand the code, you don't need a procedure for the export
> >>>>>> itself.
> >>>>>> the export operation is already idempotent, since you are just copying
> >>>>>> files.
> >>>>>> if the file exists and is complete (check length, checksum, ...) you can
> >>>>>> skip it, otherwise you'll send it over again.
> >>>>>>
> >>>>>> you need the proc for taking the backup and restoring,
> >>>>>> because you want to complete the operation and end up with a consistent
> >>>>>> state across the multiple components you are updating (meta, fs, ...).
> >>>>>> but again, for export you can just run the tool over and over until the
> >>>>>> operation succeeds, and that should be ok.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Matteo
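Matteo's idempotency argument can be illustrated with a minimal Java sketch. It is hypothetical and is not the backup or ExportSnapshot code: it assumes the local filesystem (java.nio) as a stand-in for HDFS, and a SHA-256 content digest as the "checksum". A file is skipped when the destination already has the same length and digest; otherwise it is copied again, so re-running the export after a partial failure only re-sends what is missing or incomplete.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical sketch of an idempotent export step; not HBase code.
// Uses the local filesystem (java.nio) as a stand-in for HDFS.
final class IdempotentExport {

    private IdempotentExport() {}

    // SHA-256 digest of a file's contents, read in fixed-size chunks.
    static byte[] checksum(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }

    // The destination counts as complete only if it exists and matches the
    // source in both length and checksum.
    static boolean isComplete(Path src, Path dst) throws IOException {
        return Files.exists(dst)
                && Files.size(dst) == Files.size(src)
                && MessageDigest.isEqual(checksum(src), checksum(dst));
    }

    // Copies each listed file that is missing or incomplete at the destination
    // and returns how many files were copied. Running it again after a partial
    // failure copies only what is still missing or incomplete.
    static int export(List<Path> names, Path srcDir, Path dstDir) throws IOException {
        int copied = 0;
        for (Path name : names) {
            Path src = srcDir.resolve(name);
            Path dst = dstDir.resolve(name);
            if (isComplete(src, dst)) {
                continue; // already exported; skip it
            }
            Path parent = dst.getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
            copied++;
        }
        return copied;
    }
}
```

The length comparison runs first because it is cheap; the digest only runs when the lengths agree, and catches same-length files with different contents.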
> >>>>>>
> >>>>>>
> >>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Master is involved in this discussion because currently only Master
> >>>>>>> instantiates ProcedureExecutor, which runs the 3 Procedures for backup /
> >>>>>>> restore.
> >>>>>>>
> >>>>>>> What if an optional standalone service which hosts ProcedureExecutor is
> >>>>>>> used for this purpose?
> >>>>>>> Would that have a better chance of giving us middle ground so that we can
> >>>>>>> move this forward?
> >>>>>>>
> >>>>>>> Cheers
> >>>>>>>
> >>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <stack@duboce.net> wrote:
> >>>>>>>>
> >>>>>>>> (Moved out of the Master doing MR DISCUSSION)
> >>>>>>>>
> >>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <vladrodionov@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>>>> -1 on that backup be in core hbase
> >>>>>>>>>
> >>>>>>>>> Not sure I understand what it means.
> >>>>>>>>>
> >>>>>>>>> Sorry for the imprecision.
> >>>>>>>>
> >>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so
> >>>>>>>> -1 on the Master running backup/restore MR jobs, even if optional.
> >>>>>>>>
> >>>>>>>> Master should not depend on MR. We've gone out of our way to avoid taking
> >>>>>>>> MR on as a dependency in the past. It seems late in the game for us to
> >>>>>>>> change our opinion on this. If we didn't do it for distributed log
> >>>>>>>> splitting, or MOB, why would we do it to support an optional backup/restore?
> >>>>>>>>
> >>>>>>>> I have opinions on the questions below -- i.e. that Master running
> >>>>>>>> backup/restore is outside of the Master's charge -- but they are not worth
> >>>>>>>> much since I've not done much by way of review or contrib to backup/restore
> >>>>>>>> other than to try it as a 'user', so I'll keep them to myself until I do. I
> >>>>>>>> only came out from under my shell to participate in the MR-as-dependency
> >>>>>>>> chat.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> M
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> 1. We are not allowed to use Master to orchestrate the whole process?
> >>>>>>>>>
> >>>>>>>>> We have already brought up all the advantages of using
> >>>>>>>>> Master and distributed procedures for backup and restore.
> >>>>>>>>>
> >>>>>>>>> The downside of moving this to a client tool is the lack of fault tolerance:
> >>>>>>>>> 1.1 The client won't be allowed to do any operations that can potentially
> >>>>>>>>> affect the cluster, such as disabling splits/merges or the balancer.
> >>>>>>>>> 1.2 In case of client failure, who will be doing the whole rollback?
> >>>>>>>>> We are trying to make it atomic.
> >>>>>>>>>
> >>>>>>>>> Security is not clear.
> >>>>>>>>>
> >>>>>>>>> 2. We are not allowed to modify code of existing HBase core classes (what
> >>>>>>>>> does core mean anyway)?
> >>>>>>>>>
> >>>>>>>>> 3. We are not allowed to create the backup system table (hbase:backup) in
> >>>>>>>>> system space? Only in user space? The table is global.
> >>>>>>>>>
> >>>>>>>>> 2. is critical. Even though 95% of the code is new, we have, of course,
> >>>>>>>>> touched some existing HBase code.
> >>>>>>>>> 3. is not that critical; of course we can move the backup system table
> >>>>>>>>> into user space.
> >>>>>>>>>
> >>>>>>>>> And finally, will moving backup into an external tool get us a +1 from
> >>>>>>>>> stack?
> >>>>>>>>>
> >>>>>>>>> -Vlad
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <stack@duboce.net> wrote:
> >>>>>>>>>
> >>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <vladrodionov@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>>>> + MR is dead
> >>>>>>>>>>>
> >>>>>>>>>>> Does MR know that? :)
> >>>>>>>>>>>
> >>>>>>>>>>> Again. With all due respect, stack - still no suggestions for what we
> >>>>>>>>>>> should use for "bulk data move and transformation" instead of MR?
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Use whatever distributed engine suits your fancy -- MR, Spark, distributed
> >>>>>>>>>> shell -- just don't have HBase core depend on it, even optionally.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> I suggest voting first on "do we need backup in HBase"? In my opinion, some
> >>>>>>>>>>> group members are still not sure about that and some will give -1
> >>>>>>>>>>> in any case. Just because ...
> >>>>>>>>>>>
> >>>>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase (+1 on adding
> >>>>>>>>>> all the API any such external tool might need to run).
> >>>>>>>>>>
> >>>>>>>>>> St.Ack
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> -Vlad
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <stack@duboce.net> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> let me try to go back to my original topic.
> >>>>>>>>>>>>> this question was meant to be generic, and provide some rule for future
> >>>>>>>>>>>>> code.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
> >>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay)
> >>>>>>>>>>>>> over MR, because some clusters may not want MR or may have an
> >>>>>>>>>>>>> external/uncontrolled MR setup.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> +1
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR
> >>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR is not required.
> >>>>>>>>>>>>>
> >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not --
> >>>>>>>>>>>> ever being able to launch MR jobs.
> >>>>>>>>>>>>
> >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server,
> >>>>>>>>>>>> moving it out to be an optional module (Spark would be its peer).
> >>>>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are busy
> >>>>>>>>>>>> working hard on moving it up on to a new foundation. Let's not make the
> >>>>>>>>>>>> task harder by piling on more moving parts.
> >>>>>>>>>>>>
> >>>>>>>>>>>> St.Ack
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Matteo
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I suggest you look at Matteo's work on AssignmentManager, which aims to
> >>>>>>>>>>>>>> make Master more stable.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino219@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Why do I call the code ugly? Can you simply tell me the sequence of calls
> >>>>>>>>>>>>>>> when starting up the HMaster? HMaster is also a regionserver, so it
> >>>>>>>>>>>>>>> extends HRegionServer, and the initialization of HRegionServer sometimes
> >>>>>>>>>>>>>>> needs to make RPC calls to HMaster. A simple change could cause an
> >>>>>>>>>>>>>>> intermittent deadlock or some strange NPEs...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or
> >>>>>>>>>>>>>>> external dependencies to HMaster, especially adding more work to the
> >>>>>>>>>>>>>>> startup process...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I read through HADOOP-13433
> >>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race
> >>>>>>>>>>>>>>>> condition is in the JDK.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Can you be specific as to which code is ugly? Is it in the backup /
> >>>>>>>>>>>>>>>> restore mega patch?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino219@gmail.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR way and the
> >>>>>>>>>>>>>>>>> patch is ready for landing on master, I'm a -0 on it as I do not want
> >>>>>>>>>>>>>>>>> to block the development progress.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> But I strongly suggest that we revisit the design later and see if we
> >>>>>>>>>>>>>>>>> can separate the logic from HMaster as much as possible. HA is not a big
> >>>>>>>>>>>>>>>>> problem if you do not store any metadata locally. But the ugly code in
> >>>>>>>>>>>>>>>>> HMaster is really a problem...
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> And for security, I have an issue that has been pending for a long time.
> >>>>>>>>>>>>>>>>> Can someone help by taking a quick look at it? This is what I mean by ugly
> >>>>>>>>>>>>>>>>> code... it logs out and destroys the credentials in a subject while the
> >>>>>>>>>>>>>>>>> subject is still being used, and it is declared as LimitedPrivate, so I
> >>>>>>>>>>>>>>>>> cannot change the behavior, and the only way to fix it is to write another
> >>>>>>>>>>>>>>>>> piece of ugly code...
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13433
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodionov@gmail.com>:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of doing this without using MR,
> >>>>>>>>>>>>>>>>>>>> we can certainly consider that
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Our framework for distributed operations is abstract and allows
> >>>>>>>>>>>>>>>>>> different implementations. MR is just one implementation we provide.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> -Vlad
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <ddas@hortonworks.com>
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Guys, first off, apologies for bringing in the topic of MR-based
> >>>>>>>>>>>>>>>>>>> compactions. But I was thinking more about the SpliceMachine approach of
> >>>>>>>>>>>>>>>>>>> managing compactions in Spark, where apparently they saw a lot of benefits.
> >>>>>>>>>>>>>>>>>>> Apologies for giving you that sore throat, Andrew; I really didn't mean to
> >>>>>>>>>>>>>>>>>>> :-)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
> >>>>>>>>>>>>>>>>>>> 0. Somehow not use MR, but something like it
> >>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
> >>>>>>>>>>>>>>>>>>> 2. Shell out from the master
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), and I don't think it's even
> >>>>>>>>>>>>>>>>>>> worth the effort of trying to build something when MR is already there,
> >>>>>>>>>>>>>>>>>>> and being used by HBase already for some operations.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of issues, HA of the server not
> >>>>>>>>>>>>>>>>>>> being the least of them. Security (kerberos authentication, another
> >>>>>>>>>>>>>>>>>>> keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead,
> >>>>>>>>>>>>>>>>>>> let's substitute the HBase Master for that (1). I haven't seen any good
> >>>>>>>>>>>>>>>>>>> reason why the HBase master shouldn't launch MR jobs if needed. It's not
> >>>>>>>>>>>>>>>>>>> ideal; agreed.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Now, before going to (2), let's see what the benefits are of running the
> >>>>>>>>>>>>>>>>>>> backup/restore jobs from the master. I think Ted has summarized some of
> >>>>>>>>>>>>>>>>>>> the issues that we need to take care of - basically, the master can keep
> >>>>>>>>>>>>>>>>>>> track of running jobs, and should it fail, the backup master can continue
> >>>>>>>>>>>>>>>>>>> keeping track of them (since the jobId would have been recorded in the
> >>>>>>>>>>>>>>>>>>> proc WAL). The master can also do cleanup, etc. of failed backup/restore
> >>>>>>>>>>>>>>>>>>> processes. Security is another issue - the job needs to run as 'hbase'
> >>>>>>>>>>>>>>>>>>> since it owns the data. Having the master launch the job gives it that
> >>>>>>>>>>>>>>>>>>> privilege. In the (2) approach, it's hard to do some of the above
> >>>>>>>>>>>>>>>>>>> management.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is ready from the overall
> >>>>>>>>>>>>>>>>>>> design/arch point of view (maybe code review is still pending from
> >>>>>>>>>>>>>>>>>>> Matteo). If in the future, we find better ways of doing this without using
> >>>>>>>>>>>>>>>>>>> MR, we can certainly consider that. But IMO we shouldn't block this patch
> >>>>>>>>>>>>>>>>>>> from getting merged.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> ________________________________________
> >>>>>>>>>>>>>>>>>>> From: 张铎 <palomino219@gmail.com>
> >>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
> >>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> >>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> So what about a standalone service other than master? You can use your own
> >>>>>>>>>>>>>>>>>>> procedure store in that service?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> An earlier implementation was client-driven.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to resume if there is an error midway.
> >>>>>>>>>>>>>>>>>>>> Using Procedure V2 makes the backup / restore more robust.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Another consideration is security. It is hard to enforce security
> >>>>>>>>>>>>>>>>>>>> (to be implemented) for client-driven actions.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Cheers
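Ted's point about resuming a multi-step backup after an error midway can be sketched as a small state machine that records the next step after each completed one. This is a hypothetical Java illustration, not the Procedure V2 API: the step names are invented, and the in-memory StateStore stands in for a durable store such as the procedure WAL. A rerun starts from the last recorded step instead of from the beginning.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a resumable multi-step operation; NOT the HBase
// Procedure V2 API. Step names are invented for illustration.
final class ResumableBackup {

    private ResumableBackup() {}

    enum Step { PREPARE, SNAPSHOT, EXPORT, FINALIZE, DONE }

    // Stand-in for a durable store (e.g. a procedure WAL); here just a map.
    static final class StateStore {
        private final Map<String, Step> saved = new HashMap<>();

        Step load(String id) {
            return saved.getOrDefault(id, Step.PREPARE);
        }

        void save(String id, Step step) {
            saved.put(id, step);
        }
    }

    // Runs the steps in order, starting from the last persisted one, and
    // records the next step after each step completes. failBefore simulates a
    // crash just before that step runs (pass null for no failure).
    // Returns the steps executed in this run.
    static List<Step> run(String id, StateStore store, Step failBefore) {
        List<Step> executed = new ArrayList<>();
        Step current = store.load(id);
        while (current != Step.DONE) {
            if (current == failBefore) {
                throw new IllegalStateException("simulated failure before " + current);
            }
            executed.add(current); // the real work for this step would go here
            Step next = Step.values()[current.ordinal() + 1];
            store.save(id, next);
            current = next;
        }
        return executed;
    }
}
```

This assumes each step is safe to repeat: a crash between finishing a step's work and saving the next step would rerun that step on resume.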
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, which is that "shelling out" from
> >>>>>>>>>>>>>>>>>>>>> the master directly to run MR is a first. Why not drive this with a
> >>>>>>>>>>>>>>>>>>>>> utility derived from Tool?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case that we just have HDFS and
> >>>>>>>>>>>>>>>>>>>>>>>> HBase deployed.
> >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on the MR framework (especially for some features
> >>>>>>>>>>>>>>>>>>>>>>>> we have not used at all), it introduces another maintenance cost. I
> >>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> So, you are not backup users in this case. Many of our customers have the
> >>>>>>>>>>>>>>>>>>>>>> full stack deployed and want to see backup as a standard feature. Besides
> >>>>>>>>>>>>>>>>>>>>>> this, nothing will happen in your cluster if you are not doing backups.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> This discussion (we do not want to see an M/R dependency) is going nowhere.
> >>>>>>>>>>>>>>>>>>>>>> We have already asked, at least twice, for suggestions of another
> >>>>>>>>>>>>>>>>>>>>>> framework (other than M/R) for bulk data copy with *conversion*. Still
> >>>>>>>>>>>>>>>>>>>>>> waiting for suggestions.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> -Vlad
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> If the MR framework is not deployed in the cluster, hbase still functions
> >>>>>>>>>>>>>>>>>>>>>>> normally (post merge).
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> In terms of build-time dependency, we have long been depending on
> >>>>>>>>>>>>>>>>>>>>>>> mapreduce. Take a look at ExportSnapshot.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1986@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case that we just have HDFS and
> >>>>>>>>>>>>>>>>>>>>>>>> HBase deployed.
> >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on the MR framework (especially for some features
> >>>>>>>>>>>>>>>>>>>>>>>> we have not used at all), it introduces another maintenance cost. I
> >>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino219@gmail.com>:
> >>>>>>>>>>>>>>>>>>>>>>>>> To be specific, take our nice Backup/Restore feature: if we think this is
> >>>>>>>>>>>>>>>>>>>>>>>>> not a core feature of HBase, then we could make it depend on MR, and start
> >>>>>>>>>>>>>>>>>>>>>>>>> a standalone BackupManager instance that submits MR jobs to do periodic
> >>>>>>>>>>>>>>>>>>>>>>>>> maintenance. And if we think this is a core feature that everyone should
> >>>>>>>>>>>>>>>>>>>>>>>>> use, then we'd better implement it without an MR dependency, like DLS
> >>>>>>>>>>>>>>>>>>>>>>>>> (distributed log splitting).
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino219@gmail.com>:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> I'm -1 on letting the master or RS launch MR jobs. It is OK that some of
> >>>>>>>>>>>>>>>>>>>>>>>>>> our features depend on MR, but I think the bottom line is that we should
> >>>>>>>>>>>>>>>>>>>>>>>>>> launch the jobs from outside, manually or by other services.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purtell@gmail.com>:
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, got it. Well, "shelling out" is on the line, I think, so it's a fair
> >>>>>>>>>>>>>>>>>>>>>>>>>>> question.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Can this be driven by a utility derived from Tool like our other MR apps?
> >>>>>>>>>>>>>>>>>>>>>>>>>>> The issue is needing the AccessController to decide if it's allowed? But
> >>>>>>>>>>>>>>>>>>>>>>>>>>> nothing prevents the user from running the job manually/independently,
> >>>>>>>>>>>>>>>>>>>>>>>>>>> right?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 3:44 PM,
> >>>> Matteo
> >>>>>>>>> Bertozzi <
> >>>>>>>>>>>>>>>>>>>>>>>> theo.bertozzi@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> just a remark. my query was not about tools using MR (everyone i think is ok with those).
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> the topic was about: "are we ok with running MR jobs from Master and RSs code?" since this will be the first time we do this
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Matteo
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <ddas@hortonworks.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's fine to be dependent on MR. MR is the right framework for such. We should also do compactions using MR (just saying :) )
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ________________________________________
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> From: Ted Yu <yuzhihong@gmail.com>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree - backup / restore is in the same category as import / export.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Backup is extra tooling around core in my opinion. Like import or export.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Or the optional MOB tool. It's fine.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mbertozzi@apache.org> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What's the latest opinion around running MR jobs from hbase (Master or RS)?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I remember in the past that there was discussion about not having MR as a direct dependency of hbase.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think some of the discussion was around MOB, which had a MR job to compact that was later transformed into a non-MR job to be merged. I think we had a similar discussion for log split/replay.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job from the master to copy data or restore data.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (backup is also "not really core" as in.. if you don't use backup you'll not end up running MR jobs, but this was probably true for MOB as in "if you don't enable MOB you don't need MR")
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> any thoughts? do we have a rule that says "we don't want to have hbase run MR jobs, only tool started manually by the user can do that". or can we start adding MR calls around without problems?
