hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
Date Sat, 24 Sep 2016 19:11:15 GMT
So far, the standalone service seems to be the middle ground, with the
following advantages:

1. use of the existing proc V2 framework for fault tolerance
2. amenable to the security support to be implemented in the next phase -
security is hard to enforce from the client side
3. no MR calls introduced in master or region servers

Cheers


On Sat, Sep 24, 2016 at 11:26 AM, Vladimir Rodionov <vladrodionov@gmail.com>
wrote:

> >> So the standalone service would run out of proc - in the same vein as REST
> >> or thrift server.
>
> Ted, running a separate process/service to coordinate backups is not a good
> idea. We already have a lot of them.
>
> On Sat, Sep 24, 2016 at 11:20 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > bq. don't call out to an external framework we don't own from master (or
> > regionserver) code
> >
> > So the standalone service would run out of proc - in the same vein as
> REST
> > or thrift server.
> >
> > Cheers
> >
> > On Sat, Sep 24, 2016 at 10:40 AM, Andrew Purtell <
> andrew.purtell@gmail.com
> > >
> > wrote:
> >
> > > I was attempting to summarize Ted.
> > >
> > > A new maven module sounds like a good idea to me. Or we could move all
> > the
> > > tools that use MR out to one. Or...
> > >
> > > The key takeaway seems to be don't call out to an external framework we
> > > don't own from master (or regionserver) code.
> > >
> > > > On Sep 24, 2016, at 10:15 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > bq. Internally the tool can also use the procedure framework for
> state
> > > > durability
> > > >
> > > > Isn't this the standalone service I proposed this morning ?
> > > >
> > > > bq. Move cross HBase and MR coordination to a separate tool
> > > >
> > > > Where should this tool live (hbase-backup module) ?
> > > >
> > > > Thanks
> > > >
> > > >
> > > > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <
> > > andrew.purtell@gmail.com>
> > > > wrote:
> > > >
> > > >> At branch merge voting time now more eyes are getting on the design issues
> > > >> with dissenting opinion emerging. This is the branch merge process working
> > > >> as our community has designed it. Because this is the first full project
> > > >> review of the code and implementation I think we all have to be flexible. I
> > > >> see the community as trying to narrow the technical objection at issue to
> > > >> the smallest possible scope. It's simple: don't call out to an external
> > > >> execution framework we don't own from core master (and by extension
> > > >> regionserver) code. We had this objection before to a proposed external
> > > >> compaction implementation for MOB so it should not come as a surprise.
> > > >> Please let me know if I have misstated this.
> > > >>
> > > >> This would seem to require a modest refactor of coordination to move
> > > >> invocation of MR code out from any core code path. To restate what I think
> > > >> is an emerging recommendation: Move cross HBase and MR coordination to a
> > > >> separate tool. This tool can ask the master to invoke procedures on the
> > > >> HBase side that do first mile export and last mile restore. (Internally the
> > > >> tool can also use the procedure framework for state durability, perhaps,
> > > >> just a thought.) Then the tool can further drive the things done with MR
> > > >> like shipping data off cluster or moving remote data in place and preparing
> > > >> it for import. These activities do not need procedure coordination and
> > > >> involvement of the HBase master. Only the first and last mile of the
> > > >> process needs atomicity within the HBase deploy. Please let me know if I
> > > >> have misstated this.
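A minimal sketch of the separate-tool idea above, assuming a two-step backup flow (a first-mile export requested from the master, then an MR job that ships data off cluster). All names here (`BackupCoordinator`, `Step`, `StepAction`) are hypothetical and are not HBase APIs. A plain state file stands in for the procedure store, only to show how recorded progress lets a restarted tool resume rather than start over.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical standalone coordinator; none of these names are HBase APIs. */
final class BackupCoordinator {

    /** Steps in execution order. Only FIRST_MILE_EXPORT would involve the master. */
    enum Step { FIRST_MILE_EXPORT, SHIP_OFF_CLUSTER }

    /** Whatever actually performs a step (master RPC, MR job submission, ...). */
    interface StepAction { void run(Step step); }

    private final Path stateFile;
    private final StepAction action;

    BackupCoordinator(Path stateFile, StepAction action) {
        this.stateFile = stateFile;
        this.action = action;
    }

    /** Index of the next step to run; 0 if nothing has been recorded yet. */
    int nextStep() {
        try {
            if (!Files.exists(stateFile)) return 0;
            String text = new String(Files.readAllBytes(stateFile), StandardCharsets.UTF_8);
            return Integer.parseInt(text.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void record(int next) {
        try {
            Files.write(stateFile, Integer.toString(next).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs all remaining steps, recording progress after each one completes. */
    void run() {
        Step[] steps = Step.values();
        for (int i = nextStep(); i < steps.length; i++) {
            action.run(steps[i]); // if this throws, the state file still points at step i
            record(i + 1);
        }
    }

    /** Simulates a crash during the second step, then a restart of the tool. */
    static String demo() {
        try {
            Path state = Files.createTempDirectory("backup-state").resolve("state.txt");
            List<String> log = new ArrayList<>();
            boolean[] failOnce = { true };
            StepAction flaky = step -> {
                if (step == Step.SHIP_OFF_CLUSTER && failOnce[0]) {
                    failOnce[0] = false;
                    throw new IllegalStateException("simulated crash");
                }
                log.add(step.name());
            };
            try {
                new BackupCoordinator(state, flaky).run();
            } catch (IllegalStateException expected) {
                // the first run dies midway; progress so far is on disk
            }
            new BackupCoordinator(state, flaky).run(); // restart resumes at the failed step
            return String.join(",", log);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`BackupCoordinator.demo()` returns `FIRST_MILE_EXPORT,SHIP_OFF_CLUSTER`: the first-mile step runs once even though the tool failed during the second step. A crash between a step finishing and its progress being recorded would re-run that step, so real steps would need to be idempotent.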
> > > >>
> > > >>
> > > >>> On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >>>
> > > >>> bq. procedure gives you a retry mechanism on failure
> > > >>>
> > > >>> We do need this mechanism. Take a look at the multi-step
> > > >>> in FullTableBackupProcedure, etc.
> > > >>>
> > > >>> bq. let the user export it later when he wants
> > > >>>
> > > >>> This would make supporting security more complex (user A shouldn't be
> > > >>> exporting user B's backup). And it is not user friendly - at the time
> > > >>> backup request is issued, the following is specified:
> > > >>>
> > > >>> +          + " BACKUP_ROOT     The full root path to store the backup image,\n"
> > > >>> +          + "                 the prefix can be hdfs, webhdfs or gpfs\n"
> > > >>>
> > > >>> Backup root is an integral part of backup manifest.
> > > >>>
> > > >>> Cheers
> > > >>>
> > > >>>
> > > >>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <
> > > >> theo.bertozzi@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhihong@gmail.com>
> > wrote:
> > > >>>>>
> > > >>>>> Ideally the export should have one job running which does the
> retry
> > > (on
> > > >>>>> failed partition) itself.
> > > >>>>>
> > > >>>>
> > > >>>> procedure gives you a retry mechanism on failure. if you don't use that,
> > > >>>> then you don't need procedure.
> > > >>>> if you want you can start a procedure executor in a non master process (the
> > > >>>> hbase-procedure is a separate package and does not depend on master). but
> > > >>>> again, export seems a case where you don't need procedure.
> > > >>>>
> > > >>>> like snapshot, the logic may just be: ask the master to take a backup. and
> > > >>>> let the user export it later when he wants. so you avoid having a MR job
> > > >>>> started by the master since people do not seem to like it.
> > > >>>>
> > > >>>> for restore (I think that is where you use the MR splitter) you can
> > > >>>> probably just have a backup ready (already split). there is already a
> > > >>>> jira that should do that HBASE-14135. instead of doing the operation of
> > > >>>> split/merge on restore. you consolidate the backup "offline" (mr job
> > > >>>> started by the user) and then ask to restore the backup.
> > > >>>>
> > > >>>>
> > > >>>>>
> > > >>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <
> > > >>>> theo.bertozzi@gmail.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> as far as I understand the code, you don't need procedure for the export
> > > >>>>>> itself.
> > > >>>>>> the export operation is already idempotent, since you are just copying
> > > >>>>>> files.
> > > >>>>>> if the file exists and is complete (check length, checksum, ...) you can
> > > >>>>>> skip it,
> > > >>>>>> otherwise you'll send it over again.
> > > >>>>>>
> > > >>>>>> you need the proc for taking the backup and restoring,
> > > >>>>>> because you want to complete the operation and end up with a consistent
> > > >>>>>> state
> > > >>>>>> across the multiple components you are updating (meta, fs, ...)
> > > >>>>>> but again, for export you can just run the tool over and over until the
> > > >>>>>> operation succeeds, and that should be ok.
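A minimal sketch of the skip-if-complete logic described above, assuming plain local files stand in for the backup image and SHA-256 stands in for whatever checksum the real tool would use. The class and method names (`IdempotentExport`, `isComplete`, `exportAll`) are hypothetical and not part of HBase. It only illustrates why re-running the export until it succeeds is safe.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: a re-runnable export that skips files already copied intact. */
final class IdempotentExport {

    /** True if dst exists and matches src in both length and SHA-256 digest. */
    static boolean isComplete(Path src, Path dst) {
        try {
            if (!Files.isRegularFile(dst)) return false;
            if (Files.size(src) != Files.size(dst)) return false; // cheap check first
            return Arrays.equals(sha256(src), sha256(dst));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] sha256(Path p) throws IOException {
        try {
            return MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(p));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is available on every JVM
        }
    }

    /** Copies regular files directly under srcDir; returns how many had to be (re)sent. */
    static int exportAll(Path srcDir, Path dstDir) {
        try {
            Files.createDirectories(dstDir);
            List<Path> sources;
            try (Stream<Path> listing = Files.list(srcDir)) {
                sources = listing.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            int copied = 0;
            for (Path src : sources) {
                Path dst = dstDir.resolve(src.getFileName());
                if (isComplete(src, dst)) continue; // already there and intact: skip
                Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
            return copied;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** First run, clean re-run, then a re-run after one destination file is truncated. */
    static String demo() {
        try {
            Path src = Files.createTempDirectory("backup-src");
            Path dst = Files.createTempDirectory("backup-dst");
            Files.write(src.resolve("a.hfile"), "aaaa".getBytes(StandardCharsets.UTF_8));
            Files.write(src.resolve("b.hfile"), "bbbbbbbb".getBytes(StandardCharsets.UTF_8));
            int first = exportAll(src, dst);   // destination is empty
            int second = exportAll(src, dst);  // everything is already complete
            // Simulate a partial copy left behind by a failed run.
            Files.write(dst.resolve("b.hfile"), "bbb".getBytes(StandardCharsets.UTF_8));
            int third = exportAll(src, dst);   // only the damaged file is sent again
            return first + "," + second + "," + third;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`IdempotentExport.demo()` returns `2,0,1`: two files sent on the first run, none on a clean re-run, and only the truncated file on the third.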
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Matteo
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhihong@gmail.com>
> > > wrote:
> > > >>>>>>>
> > > >>>>>>> Master is involved in this discussion because currently only
> > Master
> > > >>>>>>> instantiates ProcedureExecutor which runs the 3 Procedures for
> > > >>>> backup /
> > > >>>>>>> restore.
> > > >>>>>>>
> > > >>>>>>> What if an optional standalone service which hosts
> > > ProcedureExecutor
> > > >>>> is
> > > >>>>>>> used for this purpose ?
> > > >>>>>>> Would that have better chance of giving us middle ground so
> that
> > we
> > > >>>> can
> > > >>>>>>> move this forward ?
> > > >>>>>>>
> > > >>>>>>> Cheers
> > > >>>>>>>
> > > >>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <stack@duboce.net>
> > wrote:
> > > >>>>>>>>
> > > >>>>>>>> (Moved out of the Master doing MR DISCUSSION)
> > > >>>>>>>>
> > > >>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <
> > > >>>>>>>> vladrodionov@gmail.com>
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>>>> -1 on that backup be in core hbase
> > > >>>>>>>>>
> > > >>>>>>>>> Not sure I understand what it means.
> > > >>>>>>>>>
> > > >>>>>>>>> Sorry for the imprecision.
> > > >>>>>>>>
> > > >>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so
> > > >>>>>>>> -1 on the Master running backup/restore MR jobs, even if optional.
> > > >>>>>>>>
> > > >>>>>>>> Master should not depend on MR. We've gone out of our way to avoid taking
> > > >>>>>>>> MR on as dependency in the past. Seems late in the game for us to change
> > > >>>>>>>> our opinion on this. If we didn't do it for distributed log splitting, or
> > > >>>>>>>> MOB, why would we do it to support an optional backup/restore?
> > > >>>>>>>>
> > > >>>>>>>> I have opinions on the questions below -- i.e. that Master running
> > > >>>>>>>> backup/restore is outside of the Master's charge -- but they are not worth
> > > >>>>>>>> much since I've not done much by way of review or contrib to backup/restore
> > > >>>>>>>> other than to try it as a 'user' so I'll keep them to myself until I do. I
> > > >>>>>>>> only came out from under my shell to participate on the MR as dependency
> > > >>>>>>>> chat.
> > > >>>>>>>>
> > > >>>>>>>> Thanks,
> > > >>>>>>>> M
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> 1. We are not allowed to use Master to orchestrate the whole
> > > >>>> process?
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> We
> > > >>>>>>>>> have already brought up all advantages of using
> > > >>>>>>>>>  Master and distributed procedures for backup and restore.
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Downside of moving this to client tool is lack of fault
> > > >>>> tolerance:
> > > >>>>>>>>> 1.1 Client won't be allowed to do any operations, that can,
> > > >>>>>>> potentially
> > > >>>>>>>>> affect
> > > >>>>>>>>> cluster, such as disabling splits/merges, balancer.
> > > >>>>>>>>> 1.2 In case of client failure who will be doing the whole
> > > >>>> rollback
> > > >>>>>>>> stuff?
> > > >>>>>>>>> We are trying to make it atomic.
> > > >>>>>>>>>
> > > >>>>>>>>> Security is not clear.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> 2. We are not allowed to modify code of existing HBase core
> > > classes
> > > >>>>>> (what
> > > >>>>>>>>> does core mean anyway)?
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>> 3. We are not allowed to create backup system table
> > > >>>> (hbase:backup)
> > > >>>>>> in a
> > > >>>>>>>>> system space? Only in user space? The table is global.
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>> 2. is critical. Despite the fact, that 95% of code is new, we
> > > >>>> have
> > > >>>>>>>> touched,
> > > >>>>>>>>> of course some existing HBase code.
> > > >>>>>>>>> 3. is not that critical, of course we can move backup system
> > into
> > > >>>>>> user
> > > >>>>>>>>> space.
> > > >>>>>>>>>
> > > >>>>>>>>> And finally, will moving backup into external tool give us +1
> > > >>>> from
> > > >>>>>>> stack?
> > > >>>>>>>>>
> > > >>>>>>>>> -Vlad
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <stack@duboce.net>
> > > >>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <
> > > >>>>>>>>>> vladrodionov@gmail.com>
> > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>>>> + MR is dead
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Does MR know that? :)
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Again. With all due respect, stack - still no suggestions
> > > >>>> what
> > > >>>>>>> should
> > > >>>>>>>>> we
> > > >>>>>>>>>>> use for "bulk data move and transformation" instead of MR?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> Use whatever distributed engine suits your fancy -- MR,
> Spark,
> > > >>>>>>>>> distributed
> > > >>>>>>>>>> shell -- just don't have HBase core depend on it, even
> > > >>>>> optionally.
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>> I suggest voting first on "do we need backup in HBase"? In
> my
> > > >>>>>>>> opinion,
> > > >>>>>>>>>> some
> > > >>>>>>>>>>> group members still not sure about that and some will give
> -1
> > > >>>>>>>>>>> in any case. Just because ...
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>> We could run a vote, sure. -1 on that backup be in core
> hbase
> > > >>>> (+1
> > > >>>>>> on
> > > >>>>>>>>> adding
> > > >>>>>>>>>> all the API any such external tool might need to run).
> > > >>>>>>>>>>
> > > >>>>>>>>>> St.Ack
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>> -Vlad
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <stack@duboce.net>
> > > >>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <
> > > >>>>>>>>>>> theo.bertozzi@gmail.com>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> let me try to go back to my original topic.
> > > >>>>>>>>>>>>> this question was meant to be generic, and provide some
> > > >>>>> rule
> > > >>>>>>> for
> > > >>>>>>>>>> future
> > > >>>>>>>>>>>>> code.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone
> > > >>>>> can
> > > >>>>>>> be:
> > > >>>>>>>>>>>>> - we don't want any core feature (e.g.
> > > >>>>>>> compaction/log-split/log-
> > > >>>>>>>>>>> reply)
> > > >>>>>>>>>>>>> over MR, because some cluster may not want or may have an
> > > >>>>>>>>>>>>> external/uncontrolled MR setup.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> +1
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a
> > > >>>>>> flag)
> > > >>>>>>>> to
> > > >>>>>>>>>> run
> > > >>>>>>>>>>> MR
> > > >>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR
> > > >>>> is
> > > >>>>>> not
> > > >>>>>>>>>>> required.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind
> > > >>>> a
> > > >>>>>> flag
> > > >>>>>>>> or
> > > >>>>>>>>>> not
> > > >>>>>>>>>>> --
> > > >>>>>>>>>>>> ever being able to launch MR jobs.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it
> > > >>>> from
> > > >>>>>>>>>> hbase-server
> > > >>>>>>>>>>>> moving it out to be an optional module (Spark would be its
> > > >>>>>> peer).
> > > >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and
> Appy
> > > >>>>> are
> > > >>>>>>>> busy
> > > >>>>>>>>>>>> working hard on moving it up on to a new foundation. Lets
> > > >>>> not
> > > >>>>>>>> clutter
> > > >>>>>>>>>>> task
> > > >>>>>>>>>>>> harder by piling on more moving parts.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> St.Ack
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Matteo
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
> > > >>>>> yuzhihong@gmail.com
> > > >>>>>>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I suggest you look at Matteo's work for
> > > >>>> AssignmentManager
> > > >>>>>>> which
> > > >>>>>>>>> is
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>>> make
> > > >>>>>>>>>>>>>> Master more stable.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Cheers
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
> > > >>>>> palomino219@gmail.com
> > > >>>>>>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the
> > > >>>>>>> sequence
> > > >>>>>>>>> of
> > > >>>>>>>>>>>> calls
> > > >>>>>>>>>>>>>> when
> > > >>>>>>>>>>>>>>> starting up the HMaster? HMaster is also a
> > > >>>> regionserver
> > > >>>>>> so
> > > >>>>>>> it
> > > >>>>>>>>>>> extends
> > > >>>>>>>>>>>>>>> HRegionServer, and the initialization of
> > > >>>> HRegionServer
> > > >>>>>>>>> sometimes
> > > >>>>>>>>>>>> needs
> > > >>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>> make rpc calls to HMaster. A simple change would
> > > >>>> cause
> > > >>>>>>>>>>> probabilistic
> > > >>>>>>>>>>>>> dead
> > > >>>>>>>>>>>>>>> lock or some strange NPEs...
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to
> > > >>>> add
> > > >>>>>> new
> > > >>>>>>>>>> features
> > > >>>>>>>>>>>> or
> > > >>>>>>>>>>>>>> add
> > > >>>>>>>>>>>>>>> external dependencies to HMaster, especially add more
> > > >>>>>> works
> > > >>>>>>>> for
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>>> start
> > > >>>>>>>>>>>>>>> up processing...
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Thanks.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <
> > > >>>> yuzhihong@gmail.com
> > > >>>>>> :
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I read through HADOOP-13433
> > > >>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race
> > > >>>>>>>>>>>>>>>> condition is in jdk.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it
> > > >>>>> moving.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is
> > > >>>> it
> > > >>>>> in
> > > >>>>>>> the
> > > >>>>>>>>>>> backup
> > > >>>>>>>>>>>> /
> > > >>>>>>>>>>>>>>>> restore mega patch ?
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Cheers
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <
> > > >>>>>>>> palomino219@gmail.com>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> If you guys have already implemented the feature
> > > >>>> in
> > > >>>>>> the
> > > >>>>>>>> MR
> > > >>>>>>>>>> way
> > > >>>>>>>>>>>> and
> > > >>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>> patch is ready for landing on master, I'm a -0 on
> > > >>>>> it
> > > >>>>>>> as I
> > > >>>>>>>>> do
> > > >>>>>>>>>>> not
> > > >>>>>>>>>>>>> want
> > > >>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>> block the development progress.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> But I strongly suggest later we need to revisit the design and see if we
> > > >>>>>>>>>>>>>>>>> can separate the logic from HMaster as much as possible. HA is not a big
> > > >>>>>>>>>>>>>>>>> problem if you do not store any metadata locally. But the ugly code in
> > > >>>>>>>>>>>>>>>>> HMaster is really a problem...
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> And for security, I have an issue pending for a long time. Can someone
> > > >>>>>>>>>>>>>>>>> help by taking a quick look at it? This is what I mean, ugly code... logout
> > > >>>>>>>>>>>>>>>>> and destroy the credentials in a subject when it is still being used, and
> > > >>>>>>>>>>>>>>>>> declared as LimitPrivacy so I can not change the behavior and the only way
> > > >>>>>>>>>>>>>>>>> to fix it is to write another piece of ugly code...
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13433
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <
> > > >>>>>>>>>>>>> vladrodionov@gmail.com
> > > >>>>>>>>>>>>>>> :
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of
> > > >>>> doing
> > > >>>>>>> this
> > > >>>>>>>>>>> without
> > > >>>>>>>>>>>>>> using
> > > >>>>>>>>>>>>>>>> MR,
> > > >>>>>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>>>> can certainly consider that
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Our framework for distributed operations is
> > > >>>>>> abstract
> > > >>>>>>>> and
> > > >>>>>>>>>>> allows
> > > >>>>>>>>>>>>>>>>>> different implementations. MR is just one
> > > >>>>>>>> implementation
> > > >>>>>>>>> we
> > > >>>>>>>>>>>>>> provide.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> -Vlad
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <
> > > >>>>>>>>>>>>> ddas@hortonworks.com
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the
> > > >>>>>> topic
> > > >>>>>>>> of
> > > >>>>>>>>>>>> MR-based
> > > >>>>>>>>>>>>>>>>>>> compactions.. But I was thinking more about
> > > >>>> the
> > > >>>>>>>>>>> SpliceMachine
> > > >>>>>>>>>>>>>>>> approach
> > > >>>>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>>> managing compactions in Spark where
> > > >>>> apparently
> > > >>>>>> they
> > > >>>>>>>>> saw a
> > > >>>>>>>>>>> lot
> > > >>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>> benefits.
> > > >>>>>>>>>>>>>>>>>>> Apologies for giving you that sore throat
> > > >>>>>> Andrew; I
> > > >>>>>>>>>> really
> > > >>>>>>>>>>>>> didn't
> > > >>>>>>>>>>>>>>>> mean
> > > >>>>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>> :-)
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
> > > >>>>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that
> > > >>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
> > > >>>>>>>>>>>>>>>>>>> 2. Shell out from the master
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0),
> > > >>>>> and I
> > > >>>>>>>> don't
> > > >>>>>>>>>>> think
> > > >>>>>>>>>>>>>> it's
> > > >>>>>>>>>>>>>>>> even
> > > >>>>>>>>>>>>>>>>>>> worth the effort of trying to build something
> > > >>>>>> when
> > > >>>>>>> MR
> > > >>>>>>>>> is
> > > >>>>>>>>>>>>> already
> > > >>>>>>>>>>>>>>>> there,
> > > >>>>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>> being used by HBase already for some
> > > >>>>> operations.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of
> > > >>>>> issues -
> > > >>>>>>> HA
> > > >>>>>>>> of
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>>>> server
> > > >>>>>>>>>>>>>>>> not
> > > >>>>>>>>>>>>>>>>>>> being the least of them all. Security
> > > >>>> (kerberos
> > > >>>>>>>>>>>> authentication,
> > > >>>>>>>>>>>>>>>> another
> > > >>>>>>>>>>>>>>>>>>> keytab to manage, etc. etc. etc.). IMO, that
> > > >>>>>>> approach
> > > >>>>>>>>> is
> > > >>>>>>>>>>> DOA.
> > > >>>>>>>>>>>>>>> Instead
> > > >>>>>>>>>>>>>>>>>> let's
> > > >>>>>>>>>>>>>>>>>>> substitute that (1) with the HBase Master. I
> > > >>>>>>> haven't
> > > >>>>>>>>> seen
> > > >>>>>>>>>>> any
> > > >>>>>>>>>>>>>> good
> > > >>>>>>>>>>>>>>>>> reason
> > > >>>>>>>>>>>>>>>>>>> why the HBase master shouldn't launch MR jobs
> > > >>>>> if
> > > >>>>>>>>> needed.
> > > >>>>>>>>>>> It's
> > > >>>>>>>>>>>>> not
> > > >>>>>>>>>>>>>>>>> ideal;
> > > >>>>>>>>>>>>>>>>>>> agreed.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Now before going to (2), let's see what are
> > > >>>> the
> > > >>>>>>>>> benefits
> > > >>>>>>>>>> of
> > > >>>>>>>>>>>>>> running
> > > >>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>> backup/restore jobs from the master. I think
> > > >>>>> Ted
> > > >>>>>>> has
> > > >>>>>>>>>>>> summarized
> > > >>>>>>>>>>>>>>> some
> > > >>>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>> issues that we need to take care of -
> > > >>>>> basically,
> > > >>>>>>> the
> > > >>>>>>>>>> master
> > > >>>>>>>>>>>> can
> > > >>>>>>>>>>>>>>> keep
> > > >>>>>>>>>>>>>>>>>> track
> > > >>>>>>>>>>>>>>>>>>> of running jobs, and should it fail, the
> > > >>>> backup
> > > >>>>>>>> master
> > > >>>>>>>>>> can
> > > >>>>>>>>>>>>>> continue
> > > >>>>>>>>>>>>>>>>>> keeping
> > > >>>>>>>>>>>>>>>>>>> track of it (since the jobId would have been
> > > >>>>>>> recorded
> > > >>>>>>>>> in
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>> proc
> > > >>>>>>>>>>>>>>>> WAL).
> > > >>>>>>>>>>>>>>>>>> The
> > > >>>>>>>>>>>>>>>>>>> master can also do cleanup, etc. of failed
> > > >>>>>>>>> backup/restore
> > > >>>>>>>>>>>>>>> processes.
> > > >>>>>>>>>>>>>>>>>>> Security is another issue - the job needs to
> > > >>>>> run
> > > >>>>>> as
> > > >>>>>>>>>> 'hbase'
> > > >>>>>>>>>>>>> since
> > > >>>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>> owns
> > > >>>>>>>>>>>>>>>>>>> the data. Having the master launch the job
> > > >>>>> makes
> > > >>>>>> it
> > > >>>>>>>> get
> > > >>>>>>>>>>> that
> > > >>>>>>>>>>>>>>>> privilege.
> > > >>>>>>>>>>>>>>>>>> In
> > > >>>>>>>>>>>>>>>>>>> the (2) approach, it's hard to do some of the
> > > >>>>>> above
> > > >>>>>>>>>>>> management.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is
> > > >>>>>> ready
> > > >>>>>>>>> from
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> overall
> > > >>>>>>>>>>>>>>>>>>> design/arch point of view (maybe code review
> > > >>>> is
> > > >>>>>>> still
> > > >>>>>>>>>>> pending
> > > >>>>>>>>>>>>>> from
> > > >>>>>>>>>>>>>>>>>> Matteo).
> > > >>>>>>>>>>>>>>>>>>> If in the future, we find better ways of
> > > >>>> doing
> > > >>>>>> this
> > > >>>>>>>>>> without
> > > >>>>>>>>>>>>> using
> > > >>>>>>>>>>>>>>> MR,
> > > >>>>>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>>>>> can certainly consider that. But IMO don't
> > > >>>>> think
> > > >>>>>> we
> > > >>>>>>>>>> should
> > > >>>>>>>>>>>>> block
> > > >>>>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>>>> patch
> > > >>>>>>>>>>>>>>>>>>> from getting merged.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> ________________________________________
> > > >>>>>>>>>>>>>>>>>>> From: 张铎 <palomino219@gmail.com>
> > > >>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
> > > >>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> > > >>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by
> > > >>>>>> Master
> > > >>>>>>>> or
> > > >>>>>>>>> RS
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> So what about a standalone service other than
> > > >>>>>>> master?
> > > >>>>>>>>> You
> > > >>>>>>>>>>> can
> > > >>>>>>>>>>>>> use
> > > >>>>>>>>>>>>>>>> your
> > > >>>>>>>>>>>>>>>>>> own
> > > >>>>>>>>>>>>>>>>>>> procedure store in that service?
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <
> > > >>>>>>>> yuzhihong@gmail.com
> > > >>>>>>>>>> :
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> An earlier implementation was client
> > > >>>> driven.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to
> > > >>>> resume
> > > >>>>> if
> > > >>>>>>>> there
> > > >>>>>>>>>> is
> > > >>>>>>>>>>>>> error
> > > >>>>>>>>>>>>>>>>> midway.
> > > >>>>>>>>>>>>>>>>>>>> Using Procedure V2 makes the backup /
> > > >>>> restore
> > > >>>>>>> more
> > > >>>>>>>>>>> robust.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Another consideration is for security. It
> > > >>>> is
> > > >>>>>> hard
> > > >>>>>>>> to
> > > >>>>>>>>>>>> enforce
> > > >>>>>>>>>>>>>>>> security
> > > >>>>>>>>>>>>>>>>>> (to
> > > >>>>>>>>>>>>>>>>>>>> be implemented) for client driven actions.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Cheers
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew
> > > >>>>> Purtell <
> > > >>>>>>>>>>>>>>>>>> andrew.purtell@gmail.com>
> > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point,
> > > >>>> which
> > > >>>>>> is
> > > >>>>>>>>>>> "shelling
> > > >>>>>>>>>>>>> out"
> > > >>>>>>>>>>>>>>>> from
> > > >>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>> master directly to run MR is a first. Why
> > > >>>> not
> > > >>>>>>> drive
> > > >>>>>>>>>> this
> > > >>>>>>>>>>>>> with a
> > > >>>>>>>>>>>>>>>>> utility
> > > >>>>>>>>>>>>>>>>>>>> derived from Tool?
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir
> > > >>>>>> Rodionov
> > > >>>>>>> <
> > > >>>>>>>>>>>>>>>>>> vladrodionov@gmail.com
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> In our production cluster,  it is a
> > > >>>>> common
> > > >>>>>>>> case
> > > >>>>>>>>> we
> > > >>>>>>>>>>>> just
> > > >>>>>>>>>>>>>> have
> > > >>>>>>>>>>>>>>>>> HDFS
> > > >>>>>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>>>>>>> HBase deployed.
> > > >>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR
> > > >>>> framework
> > > >>>>>>>>>> (especially
> > > >>>>>>>>>>>> some
> > > >>>>>>>>>>>>>>>>> features
> > > >>>>>>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>>>>>>>>>> have not used at all),  it introduced
> > > >>>>>>> another
> > > >>>>>>>>> cost
> > > >>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>> maintain.
> > > >>>>>>>>>>>>>>>>>> I
> > > >>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> So , you are not backup users in this
> > > >>>>> case.
> > > >>>>>>> Many
> > > >>>>>>>>> our
> > > >>>>>>>>>>>>>> customers
> > > >>>>>>>>>>>>>>>>> have
> > > >>>>>>>>>>>>>>>>>>> full
> > > >>>>>>>>>>>>>>>>>>>>>> stack deployed and
> > > >>>>>>>>>>>>>>>>>>>>>> want see backup to be a standard
> > > >>>> feature.
> > > >>>>>>>> Besides
> > > >>>>>>>>>>> this,
> > > >>>>>>>>>>>>>>> nothing
> > > >>>>>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>>>>>> happen
> > > >>>>>>>>>>>>>>>>>>>>>> in your cluster
> > > >>>>>>>>>>>>>>>>>>>>>> if you won't be doing backups.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> This discussion (we do not want see M/R
> > > >>>>>>>>> dependency)
> > > >>>>>>>>>>> goes
> > > >>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>> nowhere.
> > > >>>>>>>>>>>>>>>>>>> We
> > > >>>>>>>>>>>>>>>>>>>>>> asked already, at least twice, to
> > > >>>> suggest
> > > >>>>>>>> another
> > > >>>>>>>>>>>>> framework
> > > >>>>>>>>>>>>>>>> (other
> > > >>>>>>>>>>>>>>>>>>> than
> > > >>>>>>>>>>>>>>>>>>>> M/R)
> > > >>>>>>>>>>>>>>>>>>>>>> for bulk data copy with *conversion*.
> > > >>>>> Still
> > > >>>>>>>>> waiting
> > > >>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>>> suggestions.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> -Vlad
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the cluster, hbase still
> > > >>>>>>>>>>>>>>>>>>>>>>> functions normally (post merge).
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we have long been depending on
> > > >>>>>>>>>>>>>>>>>>>>>>> mapreduce. Take a look at ExportSnapshot.
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Cheers
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1986@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> In our production clusters, it is common that we have just HDFS
> > > >>>>>>>>>>>>>>>>>>>>>>>> and HBase deployed. If our Master/RS depend on the MR framework
> > > >>>>>>>>>>>>>>>>>>>>>>>> (especially for some features we have not used at all), it
> > > >>>>>>>>>>>>>>>>>>>>>>>> introduces another maintenance cost. I don't think it is a good
> > > >>>>>>>>>>>>>>>>>>>>>>>> idea.
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino219@gmail.com>:
> > > >>>>>>>>>>>>>>>>>>>>>>>>> To be specific, take our nice Backup/Restore feature as an
> > > >>>>>>>>>>>>>>>>>>>>>>>>> example: if we think this is not a core feature of HBase, then
> > > >>>>>>>>>>>>>>>>>>>>>>>>> we could make it depend on MR, and start a standalone
> > > >>>>>>>>>>>>>>>>>>>>>>>>> BackupManager instance that submits MR jobs to do periodic
> > > >>>>>>>>>>>>>>>>>>>>>>>>> maintenance work. And if we think this is a core feature that
> > > >>>>>>>>>>>>>>>>>>>>>>>>> everyone should use, then we'd better implement it without an
> > > >>>>>>>>>>>>>>>>>>>>>>>>> MR dependency, like DLS.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Thanks.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino219@gmail.com>:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> I'm -1 on letting master or RS launch MR jobs. It is OK that
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> some of our features depend on MR, but I think the bottom line
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> is that we should launch the jobs from outside, manually or by
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> other services.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purtell@gmail.com>:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> fair question.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> Can this be driven by a utility derived from Tool like our
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> other MR apps? The issue is needing the AccessController to
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> decide if allowed? But nothing prevents the user from running
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> the job manually/independently, right?
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> just a remark. my query was not about tools using MR (everyone
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> i think is ok with those).
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> the topic was about: "are we ok with running MR jobs from
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Master and RSs code?" since this will be the first time we do
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Matteo
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <ddas@hortonworks.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup /
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Restore, it's fine to be dependent on MR. MR is the right
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> framework for such. We should also do compactions using MR
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> (just saying :) )
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> ________________________________________
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> From: Ted Yu <yuzhihong@gmail.com>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree - backup / restore is in the same category as import /
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> export.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Backup is extra tooling around core in my opinion. Like import
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or export. Or the optional MOB tool. It's fine.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mbertozzi@apache.org> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What's the latest opinion around running MR jobs from hbase
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Master or RS)?
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I remember in the past that there was discussion about not
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> having MR as a direct dependency of hbase.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think some of the discussion was around MOB, which had an MR
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job to compact that was later transformed into a non-MR job in
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> order to be merged. I think we had a similar discussion for
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> log split/replay.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that runs a
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MR job from the master to copy data or restore data.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (backup is also "not really core" as in.. if you don't use
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backup you'll not end up running MR jobs, but this was probably
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> true for MOB as in "if you don't enable MOB you don't need MR")
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> any thoughts? do we have a rule that says "we don't want to
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> have hbase run MR jobs, only tools started manually by the user
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> can do that"? or can we start adding MR calls around without
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> problems?
