hbase-dev mailing list archives

From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
Date Sat, 24 Sep 2016 18:26:23 GMT
>> So the standalone service would run out of proc - in the same vein as REST
>> or thrift server.

Ted, running a separate process/service to coordinate backups is not a good
idea. We already have a lot of them.

On Sat, Sep 24, 2016 at 11:20 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> bq. don't call out to an external framework we don't own from master (or
> regionserver) code
>
> So the standalone service would run out of proc - in the same vein as REST
> or thrift server.
>
> Cheers
>
> On Sat, Sep 24, 2016 at 10:40 AM, Andrew Purtell <andrew.purtell@gmail.com>
> wrote:
>
> > I was attempting to summarize Ted.
> >
> > A new maven module sounds like a good idea to me. Or we could move all the
> > tools that use MR out to one. Or...
> >
> > The key takeaway seems to be don't call out to an external framework we
> > don't own from master (or regionserver) code.
> >
> > > On Sep 24, 2016, at 10:15 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > bq. Internally the tool can also use the procedure framework for state
> > > durability
> > >
> > > Isn't this the standalone service I proposed this morning ?
> > >
> > > bq. Move cross HBase and MR coordination to a separate tool
> > >
> > > Where should this tool live (hbase-backup module) ?
> > >
> > > Thanks
> > >
> > >
> > > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purtell@gmail.com>
> > > wrote:
> > >
> > >> At branch merge voting time now more eyes are getting on the design issues
> > >> with dissenting opinion emerging. This is the branch merge process working
> > >> as our community has designed it. Because this is the first full project
> > >> review of the code and implementation I think we all have to be flexible. I
> > >> see the community as trying to narrow the technical objection at issue to
> > >> the smallest possible scope. It's simple: don't call out to an external
> > >> execution framework we don't own from core master (and by extension
> > >> regionserver) code. We had this objection before to a proposed external
> > >> compaction implementation for MOB so should not come as a surprise. Please
> > >> let me know if I have misstated this.
> > >>
> > >> This would seem to require a modest refactor of coordination to move
> > >> invocation of MR code out from any core code path. To restate what I think
> > >> is an emerging recommendation: Move cross HBase and MR coordination to a
> > >> separate tool. This tool can ask the master to invoke procedures on the
> > >> HBase side that do first mile export and last mile restore. (Internally the
> > >> tool can also use the procedure framework for state durability, perhaps,
> > >> just a thought.) Then the tool can further drive the things done with MR
> > >> like shipping data off cluster or moving remote data in place and preparing
> > >> it for import. These activities do not need procedure coordination and
> > >> involvement of the HBase master. Only the first and last mile of the
> > >> process needs atomicity within the HBase deploy. Please let me know if I
> > >> have misstated this.
> > >>
> > >>
> > >>> On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >>>
> > >>> bq. procedure gives you a retry mechanism on failure
> > >>>
> > >>> We do need this mechanism. Take a look at the multi-step
> > >>> in FullTableBackupProcedure, etc.
> > >>>
> > >>> bq. let the user export it later when he wants
> > >>>
> > >>> This would make supporting security more complex (user A shouldn't be
> > >>> exporting user B's backup). And it is not user friendly - at the time
> > >>> backup request is issued, the following is specified:
> > >>>
> > >>> +          + " BACKUP_ROOT     The full root path to store the backup image,\n"
> > >>> +          + "                 the prefix can be hdfs, webhdfs or gpfs\n"
> > >>>
> > >>> Backup root is an integral part of backup manifest.
> > >>>
> > >>> Cheers
> > >>>
> > >>>
> > >>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > >>> wrote:
> > >>>
> > >>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >>>>>
> > >>>>> Ideally the export should have one job running which does the retry (on
> > >>>>> failed partition) itself.
> > >>>>>
> > >>>>
> > >>>> procedure gives you a retry mechanism on failure. if you don't use that,
> > >>>> then you don't need procedure.
> > >>>> if you want you can start a procedure executor in a non master process (the
> > >>>> hbase-procedure is a separate package and does not depend on master). but
> > >>>> again, export seems a case where you don't need procedure.
> > >>>>
> > >>>> like snapshot, the logic may just be: ask the master to take a backup. and
> > >>>> let the user export it later when he wants. so you avoid having a MR job
> > >>>> started by the master since people do not seem to like it.
> > >>>>
> > >>>> for restore (I think that is where you use the MR splitter) you can
> > >>>> probably just have a backup ready (already split). there is already a
> > >>>> jira that should do that, HBASE-14135. instead of doing the operation of
> > >>>> split/merge on restore, you consolidate the backup "offline" (mr job
> > >>>> started by the user) and then ask to restore the backup.
> > >>>>
> > >>>>
> > >>>>>
> > >>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> as far as I understand the code, you don't need procedure for the export
> > >>>>>> itself.
> > >>>>>> the export operation is already idempotent, since you are just copying
> > >>>>>> files.
> > >>>>>> if the file exists and is complete (check length, checksum, ...) you can
> > >>>>>> skip it,
> > >>>>>> otherwise you'll send it over again.
> > >>>>>>
> > >>>>>> you need the proc for taking the backup and restoring,
> > >>>>>> because you want to complete the operation and end up with a consistent
> > >>>>>> state
> > >>>>>> across the multiple components you are updating (meta, fs, ...)
> > >>>>>> but again, for export you can just run the tool over and over until the
> > >>>>>> operation succeeds, and that should be ok.
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Matteo
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Master is involved in this discussion because currently only Master
> > >>>>>>> instantiates ProcedureExecutor which runs the 3 Procedures for backup /
> > >>>>>>> restore.
> > >>>>>>>
> > >>>>>>> What if an optional standalone service which hosts ProcedureExecutor is
> > >>>>>>> used for this purpose ?
> > >>>>>>> Would that have better chance of giving us middle ground so that we can
> > >>>>>>> move this forward ?
> > >>>>>>>
> > >>>>>>> Cheers
> > >>>>>>>
> > >>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <stack@duboce.net> wrote:
> > >>>>>>>>
> > >>>>>>>> (Moved out of the Master doing MR DISCUSSION)
> > >>>>>>>>
> > >>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <vladrodionov@gmail.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>>>> -1 on that backup be in core hbase
> > >>>>>>>>>
> > >>>>>>>>> Not sure I understand what it means.
> > >>>>>>>>>
> > >>>>>>>>> Sorry for the imprecision.
> > >>>>>>>>
> > >>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so
> > >>>>>>>> -1 on the Master running backup/restore MR jobs, even if optional.
> > >>>>>>>>
> > >>>>>>>> Master should not depend on MR. We've gone out of our way to avoid taking
> > >>>>>>>> MR on as dependency in the past. Seems late in the game for us to change
> > >>>>>>>> our opinion on this. If we didn't do it for distributed log splitting, or
> > >>>>>>>> MOB, why would we do it to support an optional backup/restore?
> > >>>>>>>>
> > >>>>>>>> I have opinions on the questions below -- i.e. that Master running
> > >>>>>>>> backup/restore is outside of the Master's charge -- but they are not worth
> > >>>>>>>> much since I've not done much by way of review or contrib to backup/restore
> > >>>>>>>> other than to try it as a 'user' so I'll keep them to myself until I do. I
> > >>>>>>>> only came out from under my shell to participate on the MR as dependency
> > >>>>>>>> chat.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> M
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>> 1. We are not allowed to use Master to orchestrate the whole process? We
> > >>>>>>>>> have already brought up all advantages of using Master and distributed
> > >>>>>>>>> procedures for backup and restore.
> > >>>>>>>>>
> > >>>>>>>>> Downside of moving this to client tool is lack of fault tolerance:
> > >>>>>>>>> 1.1 Client won't be allowed to do any operations that can potentially
> > >>>>>>>>> affect cluster, such as disabling splits/merges, balancer.
> > >>>>>>>>> 1.2 In case of client failure who will be doing the whole rollback stuff?
> > >>>>>>>>> We are trying to make it atomic.
> > >>>>>>>>>
> > >>>>>>>>> Security is not clear.
> > >>>>>>>>
> > >>>>>>>>> 2. We are not allowed to modify code of existing HBase core classes (what
> > >>>>>>>>> does core mean anyway)?
> > >>>>>>>>
> > >>>>>>>>> 3. We are not allowed to create backup system table (hbase:backup) in a
> > >>>>>>>>> system space? Only in user space? The table is global.
> > >>>>>>>>
> > >>>>>>>>> 2. is critical. Despite the fact that 95% of code is new, we have touched,
> > >>>>>>>>> of course, some existing HBase code.
> > >>>>>>>>> 3. is not that critical, of course we can move backup system into user
> > >>>>>>>>> space.
> > >>>>>>>>>
> > >>>>>>>>> And finally, will moving backup into external tool give us +1 from stack?
> > >>>>>>>>>
> > >>>>>>>>> -Vlad
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <stack@duboce.net> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <vladrodionov@gmail.com>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>>>> + MR is dead
> > >>>>>>>>>>>
> > >>>>>>>>>>> Does MR know that? :)
> > >>>>>>>>>>>
> > >>>>>>>>>>> Again. With all due respect, stack - still no suggestions what should we
> > >>>>>>>>>>> use for "bulk data move and transformation" instead of MR?
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Use whatever distributed engine suits your fancy -- MR, Spark, distributed
> > >>>>>>>>>> shell -- just don't have HBase core depend on it, even optionally.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> I suggest voting first on "do we need backup in HBase"? In my opinion, some
> > >>>>>>>>>>> group members still not sure about that and some will give -1
> > >>>>>>>>>>> in any case. Just because ...
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase (+1 on adding
> > >>>>>>>>>> all the API any such external tool might need to run).
> > >>>>>>>>>>
> > >>>>>>>>>> St.Ack
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> -Vlad
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <stack@duboce.net> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> let me try to go back to my original topic.
> > >>>>>>>>>>>>> this question was meant to be generic, and provide some rule for future
> > >>>>>>>>>>>>> code.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
> > >>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay)
> > >>>>>>>>>>>>> over MR, because some cluster may not want or may have an
> > >>>>>>>>>>>>> external/uncontrolled MR setup.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> +1
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR
> > >>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR is not required.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not --
> > >>>>>>>>>>>> ever being able to launch MR jobs.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server
> > >>>>>>>>>>>> moving it out to be an optional module (Spark would be its peer).
> > >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy are busy
> > >>>>>>>>>>>> working hard on moving it up on to a new foundation. Lets not clutter task
> > >>>>>>>>>>>> harder by piling on more moving parts.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> St.Ack
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Matteo
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager which is to make
> > >>>>>>>>>>>>>> Master more stable.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino219@gmail.com> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Why do I call the code ugly? Can you simply tell me the sequence of calls
> > >>>>>>>>>>>>>>> when starting up the HMaster? HMaster is also a regionserver so it extends
> > >>>>>>>>>>>>>>> HRegionServer, and the initialization of HRegionServer sometimes needs to
> > >>>>>>>>>>>>>>> make rpc calls to HMaster. A simple change would cause probabilistic
> > >>>>>>>>>>>>>>> deadlock or some strange NPEs...
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or add
> > >>>>>>>>>>>>>>> external dependencies to HMaster, especially add more work to the startup
> > >>>>>>>>>>>>>>> processing...
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I read through HADOOP-13433
> > >>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race
> > >>>>>>>>>>>>>>>> condition is in jdk.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is it in the backup /
> > >>>>>>>>>>>>>>>> restore mega patch ?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino219@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR way and the
> > >>>>>>>>>>>>>>>>> patch is ready for landing on master, I'm a -0 on it as I do not want to
> > >>>>>>>>>>>>>>>>> block the development progress.
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> But I strongly suggest later we need to revisit the design and see if we
> > >>>>>>>>>>>>>>>>> can separate the logic from HMaster as much as possible. HA is not a big
> > >>>>>>>>>>>>>>>>> problem if you do not store any metadata locally. But the ugly code in
> > >>>>>>>>>>>>>>>>> HMaster is really a problem...
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> And for security, I have an issue pending for a long time. Can someone help
> > >>>>>>>>>>>>>>>>> taking a simple look at it? This is what I mean, ugly code... logout and
> > >>>>>>>>>>>>>>>>> destroy the credentials in a subject when it is still being used, and
> > >>>>>>>>>>>>>>>>> declared as LimitPrivacy so I can not change the behavior and the only way
> > >>>>>>>>>>>>>>>>> to fix it is to write another piece of ugly code...
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13433
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodionov@gmail.com>:
> > >>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of doing this without using MR, we
> > >>>>>>>>>>>>>>>>>>>> can certainly consider that
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Our framework for distributed operations is abstract and allows
> > >>>>>>>>>>>>>>>>>> different implementations. MR is just one implementation we provide.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> -Vlad
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <ddas@hortonworks.com> wrote:
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the topic of MR-based
> > >>>>>>>>>>>>>>>>>>> compactions.. But I was thinking more about the SpliceMachine approach of
> > >>>>>>>>>>>>>>>>>>> managing compactions in Spark where apparently they saw a lot of benefits.
> > >>>>>>>>>>>>>>>>>>> Apologies for giving you that sore throat Andrew; I really didn't mean to
> > >>>>>>>>>>>>>>>>>>> :-)
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
> > >>>>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that
> > >>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
> > >>>>>>>>>>>>>>>>>>> 2. Shell out from the master
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), and I don't think it's even
> > >>>>>>>>>>>>>>>>>>> worth the effort of trying to build something when MR is already there,
> > >>>>>>>>>>>>>>>>>>> and being used by HBase already for some operations.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of issues - HA of the server not
> > >>>>>>>>>>>>>>>>>>> being the least of them all. Security (kerberos authentication, another
> > >>>>>>>>>>>>>>>>>>> keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead
> > >>>>>>>>>>>>>>>>>>> let's substitute that (1) with the HBase Master. I haven't seen any good
> > >>>>>>>>>>>>>>>>>>> reason why the HBase master shouldn't launch MR jobs if needed. It's not
> > >>>>>>>>>>>>>>>>>>> ideal; agreed.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Now before going to (2), let's see what are the benefits of running the
> > >>>>>>>>>>>>>>>>>>> backup/restore jobs from the master. I think Ted has summarized some of
> > >>>>>>>>>>>>>>>>>>> the issues that we need to take care of - basically, the master can keep
> > >>>>>>>>>>>>>>>>>>> track of running jobs, and should it fail, the backup master can continue
> > >>>>>>>>>>>>>>>>>>> keeping track of it (since the jobId would have been recorded in the proc
> > >>>>>>>>>>>>>>>>>>> WAL). The master can also do cleanup, etc. of failed backup/restore
> > >>>>>>>>>>>>>>>>>>> processes. Security is another issue - the job needs to run as 'hbase'
> > >>>>>>>>>>>>>>>>>>> since it owns the data. Having the master launch the job makes it get that
> > >>>>>>>>>>>>>>>>>>> privilege. In the (2) approach, it's hard to do some of the above
> > >>>>>>>>>>>>>>>>>>> management.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is ready from the overall
> > >>>>>>>>>>>>>>>>>>> design/arch point of view (maybe code review is still pending from
> > >>>>>>>>>>>>>>>>>>> Matteo). If in the future, we find better ways of doing this without using
> > >>>>>>>>>>>>>>>>>>> MR, we can certainly consider that. But IMO don't think we should block
> > >>>>>>>>>>>>>>>>>>> this patch from getting merged.
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> ________________________________________
> > >>>>>>>>>>>>>>>>>>> From: 张铎 <palomino219@gmail.com>
> > >>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
> > >>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> > >>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> So what about a standalone service other than master? You can use your own
> > >>>>>>>>>>>>>>>>>>> procedure store in that service?
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> > >>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> An earlier implementation was client driven.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to resume if there is error midway.
> > >>>>>>>>>>>>>>>>>>>> Using Procedure V2 makes the backup / restore more robust.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Another consideration is for security. It is hard to enforce security (to
> > >>>>>>>>>>>>>>>>>>>> be implemented) for client driven actions.
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purtell@gmail.com>
> > >>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, which is "shelling out" from the
> > >>>>>>>>>>>>>>>>>>>>> master directly to run MR is a first. Why not drive this with a utility
> > >>>>>>>>>>>>>>>>>>>>> derived from Tool?
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodionov@gmail.com>
> > >>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case we just have HDFS and
> > >>>>>>>>>>>>>>>>>>>>>>>> HBase deployed.
> > >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR framework (especially some features we
> > >>>>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduced another cost for maintain. I
> > >>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> So, you are not backup users in this case. Many of our customers have
> > >>>>>>>>>>>>>>>>>>>>>> the full stack deployed and want to see backup as a standard feature.
> > >>>>>>>>>>>>>>>>>>>>>> Besides this, nothing will happen in your cluster if you won't be doing
> > >>>>>>>>>>>>>>>>>>>>>> backups.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> This discussion (we do not want to see M/R dependency) goes nowhere. We
> > >>>>>>>>>>>>>>>>>>>>>> asked already, at least twice, to suggest another framework (other than
> > >>>>>>>>>>>>>>>>>>>>>> M/R) for bulk data copy with *conversion*. Still waiting for
> > >>>>>>>>>>>>>>>>>>>>>> suggestions.
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>> -Vlad
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the cluster, hbase still functions
> > >>>>>>>>>>>>>>>>>>>>>>> normally (post merge).
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we have long been depending on
> > >>>>>>>>>>>>>>>>>>>>>>> mapreduce. Take a look at ExportSnapshot.
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> Cheers
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1986@gmail.com>
> > >>>>>>>>>>>>>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case we just have HDFS and
> > >>>>>>>>>>>>>>>>>>>>>>>> HBase deployed.
> > >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR framework (especially some features we
> > >>>>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduced another cost for maintain. I
> > >>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea.
> > >>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino219@gmail.com>:
> > >>>>>>>>>>>>>>>>>>>>>>>>> To be specific, for example, our nice Backup/Restore feature, if we
> > >>>>>>>>>>>>>>>>>>>>>>>>> think this is not a core feature of HBase, then we could make it depend
> > >>>>>>>>>>>>>>>>>>>>>>>>> on MR, and start a standalone BackupManager instance that submits MR
> > >>>>>>>>>>>>>>>>>>>>>>>>> jobs to do periodical maintenance job. And if we think this is a core
> > >>>>>>>>>>>>>>>>>>>>>>>>> feature that everyone should use it, then we'd better implement it
> > >>>>>>>>>>>>>>>>>>>>>>>>> without MR dependency, like DLS.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> Thanks.
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino219@gmail.com>:
> > >>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> I'm -1 on letting master or rs launch MR jobs. It is OK that some of
> > >>>>>>>>>>>>>>>>>>>>>>>>>> our features depend on MR but I think the bottom line is that we
> > >>>>>>>>>>>>>>>>>>>>>>>>>> should launch the jobs from outside manually or by other services.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purtell@gmail.com>:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> question.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Can this be driven by a utility derived from Tool like our other MR
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> apps? The issue is needing the AccessController to decide if
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> allowed? But nothing prevents the user from running the job
> > >>>>>>>>>>>>>>>>>>>>>>>>>>> manually/independently, right?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> <theo.bertozzi@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> just a remark. my query was not about tools using MR (everyone i
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> think is ok with those).
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> the topic was about: "are we ok with running MR jobs from Master
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> and RSs code?" since this will be the first time we do this
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Matteo
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> <ddas@hortonworks.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore,
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> it's fine to be dependent on MR. MR is the right framework for
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> such. We should also do compactions using MR (just saying :) )
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ________________________________________
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> From: Ted Yu <yuzhihong@gmail.com>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree - backup / restore is in the same category as import /
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> export.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> <andrew.purtell@gmail.com> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Backup is extra tooling around core in my opinion. Like import or
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> export. Or the optional MOB tool. It's fine.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <mbertozzi@apache.org> wrote:
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What's the latest opinion around running MR jobs from hbase
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Master or RS)?
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I remember in the past that there was discussion about not having
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MR as a direct dependency of hbase.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think some of the discussion was around MOB, which had a MR job
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to compact that was later transformed into a non-MR job in order
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to be merged. I think we had a similar discussion for log
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> split/replay.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that runs a MR
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job from the master to copy data or restore data.
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (backup is also "not really core" as in.. if you don't use backup
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you'll not end up running MR jobs, but this was probably true for
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MOB as in "if you don't enable MOB you don't need MR")
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> any thoughts? do we have a rule that says "we don't want to have
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hbase run MR jobs, only tools started manually by the user can do
> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that"? or can we start adding MR calls around without problems?
