hbase-dev mailing list archives

From Vladimir Rodionov <vladrodio...@gmail.com>
Subject Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
Date Sat, 24 Sep 2016 19:21:28 GMT
>> The standalone service so far

1, 2, and 3 can be done on the client side as well. Are you going to implement HA
for the service? If not, the service can fail and will require cleanup/repair
on restart. The same can be done with a client-side tool (in repair mode).

-1 for the separate service. KISS rules. If the community wants us to remove
MR/Backup from the core, we will move it into a separate sub-project and
implement this as a client-driven tool set.

-Vlad
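
The repair-mode idea relies on the point made further down the thread: export is idempotent because it only copies files, so a file that already exists and is complete (length and checksum match) can be skipped, and anything else is sent again. A minimal sketch of that rule follows, using local files rather than HDFS. The names `is_complete` and `export_files` are hypothetical and are not part of the HBase backup code.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def is_complete(src: Path, dst: Path) -> bool:
    """Hypothetical check: dst is complete if length and checksum match src."""
    if not dst.is_file():
        return False
    if src.stat().st_size != dst.stat().st_size:
        return False
    return file_digest(src) == file_digest(dst)


def export_files(src_root: Path, dst_root: Path) -> list[str]:
    """Copy every file under src_root to dst_root, skipping complete ones.

    Returns the relative paths copied in this run, so a re-run after a
    failure only redoes the missing or partial files.
    """
    copied = []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if is_complete(src, dst):
            continue  # already exported by an earlier run
        dst.parent.mkdir(parents=True, exist_ok=True)
        tmp = dst.with_name(dst.name + ".tmp")
        shutil.copyfile(src, tmp)
        tmp.replace(dst)  # rename last, so a crash mid-copy leaves no file under the final name
        copied.append(rel.as_posix())
    return copied
```

Running `export_files` repeatedly until it returns an empty list is the "run the tool over and over until the operation succeeds" loop. This covers only the copy step, not the backup or restore steps that need atomicity on the HBase side.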

On Sat, Sep 24, 2016 at 12:11 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> The standalone service so far seems to be middle ground having the
> following advantages:
>
> 1. utilization of existing proc V2 framework for fault tolerance
> 2. friendliness to security support to be implemented in the next phase -
> security is hard to enforce from client side
> 3. not introducing MR calls in master or region servers
>
> Cheers
>
>
> On Sat, Sep 24, 2016 at 11:26 AM, Vladimir Rodionov <vladrodionov@gmail.com>
> wrote:
>
> > >> So the standalone service would run out of proc - in the same vein as
> > >> REST or thrift server.
> >
> > Ted, running a separate process/service to coordinate backups is not a good
> > idea. We already have a lot of them.
> >
> > On Sat, Sep 24, 2016 at 11:20 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > bq. don't call out to an external framework we don't own from master
> > > (or regionserver) code
> > >
> > > So the standalone service would run out of proc - in the same vein as
> > > REST or thrift server.
> > >
> > > Cheers
> > >
> > > On Sat, Sep 24, 2016 at 10:40 AM, Andrew Purtell <andrew.purtell@gmail.com>
> > > wrote:
> > >
> > > > I was attempting to summarize Ted.
> > > >
> > > > A new maven module sounds like a good idea to me. Or we could move all
> > > > the tools that use MR out to one. Or...
> > > >
> > > > The key takeaway seems to be don't call out to an external framework we
> > > > don't own from master (or regionserver) code.
> > > >
> > > > > On Sep 24, 2016, at 10:15 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > bq. Internally the tool can also use the procedure framework for state
> > > > > durability
> > > > >
> > > > > Isn't this the standalone service I proposed this morning ?
> > > > >
> > > > > bq. Move cross HBase and MR coordination to a separate tool
> > > > >
> > > > > Where should this tool live (hbase-backup module) ?
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purtell@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> At branch merge voting time now more eyes are getting on the design
> > > > >> issues with dissenting opinion emerging. This is the branch merge
> > > > >> process working as our community has designed it. Because this is the
> > > > >> first full project review of the code and implementation I think we all
> > > > >> have to be flexible. I see the community as trying to narrow the
> > > > >> technical objection at issue to the smallest possible scope. It's
> > > > >> simple: don't call out to an external execution framework we don't own
> > > > >> from core master (and by extension regionserver) code. We had this
> > > > >> objection before to a proposed external compaction implementation for
> > > > >> MOB so should not come as a surprise. Please let me know if I have
> > > > >> misstated this.
> > > > >>
> > > > >> This would seem to require a modest refactor of coordination to move
> > > > >> invocation of MR code out from any core code path. To restate what I
> > > > >> think is an emerging recommendation: Move cross HBase and MR
> > > > >> coordination to a separate tool. This tool can ask the master to invoke
> > > > >> procedures on the HBase side that do first mile export and last mile
> > > > >> restore. (Internally the tool can also use the procedure framework for
> > > > >> state durability, perhaps, just a thought.) Then the tool can further
> > > > >> drive the things done with MR like shipping data off cluster or moving
> > > > >> remote data in place and preparing it for import. These activities do
> > > > >> not need procedure coordination and involvement of the HBase master.
> > > > >> Only the first and last mile of the process needs atomicity within the
> > > > >> HBase deploy. Please let me know if I have misstated this.
> > > > >>
> > > > >>
> > > > >>> On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >>>
> > > > >>> bq. procedure gives you a retry mechanism on failure
> > > > >>>
> > > > >>> We do need this mechanism. Take a look at the multi-step
> > > > >>> in FullTableBackupProcedure, etc.
> > > > >>>
> > > > >>> bq. let the user export it later when he wants
> > > > >>>
> > > > >>> This would make supporting security more complex (user A shouldn't be
> > > > >>> exporting user B's backup). And it is not user friendly - at the time
> > > > >>> backup request is issued, the following is specified:
> > > > >>>
> > > > >>> +          + " BACKUP_ROOT     The full root path to store the backup image,\n"
> > > > >>> +          + "                 the prefix can be hdfs, webhdfs or gpfs\n"
> > > > >>>
> > > > >>> Backup root is an integral part of backup manifest.
> > > > >>>
> > > > >>> Cheers
> > > > >>>
> > > > >>>
> > > > >>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>> Ideally the export should have one job running which does the retry (on
> > > > >>>>> failed partition) itself.
> > > > >>>>>
> > > > >>>>
> > > > >>>> procedure gives you a retry mechanism on failure. if you don't use that,
> > > > >>>> then you don't need procedure.
> > > > >>>> if you want you can start a procedure executor in a non master process (the
> > > > >>>> hbase-procedure is a separate package and does not depend on master). but
> > > > >>>> again, export seems a case where you don't need procedure.
> > > > >>>>
> > > > >>>> like snapshot, the logic may just be: ask the master to take a backup. and
> > > > >>>> let the user export it later when he wants. so you avoid having a MR job
> > > > >>>> started by the master since people do not seem to like it.
> > > > >>>>
> > > > >>>> for restore (I think that is where you use the MR splitter) you can
> > > > >>>> probably just have a backup ready (already split). there is already a
> > > > >>>> jira that should do that (HBASE-14135). instead of doing the split/merge
> > > > >>>> operation on restore, you consolidate the backup "offline" (mr job
> > > > >>>> started by the user) and then ask to restore the backup.
> > > > >>>>
> > > > >>>>
> > > > >>>>>
> > > > >>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> as far as I understand the code, you don't need procedure for the
> > > > >>>>>> export itself.
> > > > >>>>>> the export operation is already idempotent, since you are just copying
> > > > >>>>>> files.
> > > > >>>>>> if the file exists and is complete (check length, checksum, ...) you can
> > > > >>>>>> skip it,
> > > > >>>>>> otherwise you'll send it over again.
> > > > >>>>>>
> > > > >>>>>> you need the proc for taking the backup and restoring,
> > > > >>>>>> because you want to complete the operation and end up with a consistent
> > > > >>>>>> state
> > > > >>>>>> across the multiple components you are updating (meta, fs, ...)
> > > > >>>>>> but again, for export you can just run the tool over and over until the
> > > > >>>>>> operation succeeds, and that should be ok.
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> Matteo
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >>>>>>>
> > > > >>>>>>> Master is involved in this discussion because currently only Master
> > > > >>>>>>> instantiates ProcedureExecutor which runs the 3 Procedures for backup /
> > > > >>>>>>> restore.
> > > > >>>>>>>
> > > > >>>>>>> What if an optional standalone service which hosts ProcedureExecutor is
> > > > >>>>>>> used for this purpose ?
> > > > >>>>>>> Would that have better chance of giving us middle ground so that we can
> > > > >>>>>>> move this forward ?
> > > > >>>>>>>
> > > > >>>>>>> Cheers
> > > > >>>>>>>
> > > > >>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <stack@duboce.net> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>> (Moved out of the Master doing MR DISCUSSION)
> > > > >>>>>>>>
> > > > >>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <
> > > > >>>>>>>> vladrodionov@gmail.com>
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>>>> -1 on that backup be in core hbase
> > > > >>>>>>>>>
> > > > >>>>>>>>> Not sure I understand what it means.
> > > > >>>>>>>>>
> > > > >>>>>>>> Sorry for the imprecision.
> > > > >>>>>>>>
> > > > >>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so
> > > > >>>>>>>> -1 on the Master running backup/restore MR jobs, even if optional.
> > > > >>>>>>>>
> > > > >>>>>>>> Master should not depend on MR. We've gone out of our way to avoid taking
> > > > >>>>>>>> MR on as dependency in the past. Seems late in the game for us to change
> > > > >>>>>>>> our opinion on this. If we didn't do it for distributed log splitting, or
> > > > >>>>>>>> MOB, why would we do it to support an optional backup/restore?
> > > > >>>>>>>>
> > > > >>>>>>>> I have opinions on the questions below -- i.e. that Master running
> > > > >>>>>>>> backup/restore is outside of the Master's charge -- but they are not worth
> > > > >>>>>>>> much since I've not done much by way of review or contrib to backup/restore
> > > > >>>>>>>> other than to try it as a 'user' so I'll keep them to myself until I do. I
> > > > >>>>>>>> only came out from under my shell to participate on the MR as dependency
> > > > >>>>>>>> chat.
> > > > >>>>>>>>
> > > > >>>>>>>> Thanks,
> > > > >>>>>>>> M
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> 1. We are not allowed to use Master to orchestrate the whole
> > > > >>>> process?
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> We
> > > > >>>>>>>>> have already brought up all advantages of using
> > > > >>>>>>>>>  Master and distributed procedures for backup and restore.
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> Downside of moving this to client tool is lack of fault
> > > > >>>> tolerance:
> > > > >>>>>>>>> 1.1 Client won't be allowed to do any operations, that can,
> > > > >>>>>>> potentially
> > > > >>>>>>>>> affect
> > > > >>>>>>>>> cluster, such as disabling splits/merges, balancer.
> > > > >>>>>>>>> 1.2 In case of client failure who will be doing the whole
> > > > >>>> rollback
> > > > >>>>>>>> stuff?
> > > > >>>>>>>>> We are trying to make it atomic.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Security is not clear.
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> 2. We are not allowed to modify code of existing HBase core
> > > > classes
> > > > >>>>>> (what
> > > > >>>>>>>>> does core mean anyway)?
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>> 3. We are not allowed to create backup system table
> > > > >>>> (hbase:backup)
> > > > >>>>>> in a
> > > > >>>>>>>>> system space? Only in user space? The table is global.
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>> 2. is critical. Despite the fact, that 95% of code is new,
> we
> > > > >>>> have
> > > > >>>>>>>> touched,
> > > > >>>>>>>>> of course some existing HBase code.
> > > > >>>>>>>>> 3. is not that critical, of course we can move backup
> system
> > > into
> > > > >>>>>> user
> > > > >>>>>>>>> space.
> > > > >>>>>>>>>
> > > > >>>>>>>>> And finally, will moving backup into external tool give us
> +1
> > > > >>>> from
> > > > >>>>>>> stack?
> > > > >>>>>>>>>
> > > > >>>>>>>>> -Vlad
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <stack@duboce.net>
> > > > >>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <
> > > > >>>>>>>>>> vladrodionov@gmail.com>
> > > > >>>>>>>>>> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>>>> + MR is dead
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Does MR know that? :)
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Again. With all due respect, stack - still no suggestions
> > > > >>>> what
> > > > >>>>>>> should
> > > > >>>>>>>>> we
> > > > >>>>>>>>>>> use for "bulk data move and transformation" instead of
> MR?
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Use whatever distributed engine suits your fancy -- MR,
> > Spark,
> > > > >>>>>>>>> distributed
> > > > >>>>>>>>>> shell -- just don't have HBase core depend on it, even
> > > > >>>>> optionally.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> I suggest voting first on "do we need backup in HBase"?
> In
> > my
> > > > >>>>>>>> opinion,
> > > > >>>>>>>>>> some
> > > > >>>>>>>>>>> group members still not sure about that and some will
> give
> > -1
> > > > >>>>>>>>>>> in any case. Just because ...
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>> We could run a vote, sure. -1 on that backup be in core
> > hbase
> > > > >>>> (+1
> > > > >>>>>> on
> > > > >>>>>>>>> adding
> > > > >>>>>>>>>> all the API any such external tool might need to run).
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> St.Ack
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> -Vlad
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <
> stack@duboce.net>
> > > > >>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <
> > > > >>>>>>>>>>> theo.bertozzi@gmail.com>
> > > > >>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> let me try to go back to my original topic.
> > > > >>>>>>>>>>>>> this question was meant to be generic, and provide some
> > > > >>>>> rule
> > > > >>>>>>> for
> > > > >>>>>>>>>> future
> > > > >>>>>>>>>>>>> code.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
> > > > >>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay)
> > > > >>>>>>>>>>>>> over MR, because some cluster may not want or may have an
> > > > >>>>>>>>>>>>> external/uncontrolled MR setup.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> +1
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by
> a
> > > > >>>>>> flag)
> > > > >>>>>>>> to
> > > > >>>>>>>>>> run
> > > > >>>>>>>>>>> MR
> > > > >>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR
> > > > >>>> is
> > > > >>>>>> not
> > > > >>>>>>>>>>> required.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether
> behind
> > > > >>>> a
> > > > >>>>>> flag
> > > > >>>>>>>> or
> > > > >>>>>>>>>> not
> > > > >>>>>>>>>>> --
> > > > >>>>>>>>>>>> ever being able to launch MR jobs.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it
> > > > >>>> from
> > > > >>>>>>>>>> hbase-server
> > > > >>>>>>>>>>>> moving it out to be an optional module (Spark would be
> its
> > > > >>>>>> peer).
> > > > >>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and
> > Appy
> > > > >>>>> are
> > > > >>>>>>>> busy
> > > > >>>>>>>>>>>> working hard on moving it up on to a new foundation.
> Lets
> > > > >>>> not
> > > > >>>>>>>> clutter
> > > > >>>>>>>>>>> task
> > > > >>>>>>>>>>>> harder by piling on more moving parts.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> St.Ack
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Matteo
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <
> > > > >>>>> yuzhihong@gmail.com
> > > > >>>>>>>
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I suggest you look at Matteo's work for
> > > > >>>> AssignmentManager
> > > > >>>>>>> which
> > > > >>>>>>>>> is
> > > > >>>>>>>>>> to
> > > > >>>>>>>>>>>>> make
> > > > >>>>>>>>>>>>>> Master more stable.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Cheers
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <
> > > > >>>>> palomino219@gmail.com
> > > > >>>>>>>
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> No, not your fault, at least, not this time:)
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Why I call the code ugly? Can you simply tell me the
> > > > >>>>>>> sequence
> > > > >>>>>>>>> of
> > > > >>>>>>>>>>>> calls
> > > > >>>>>>>>>>>>>> when
> > > > >>>>>>>>>>>>>>> starting up the HMaster? HMaster is also a
> > > > >>>> regionserver
> > > > >>>>>> so
> > > > >>>>>>> it
> > > > >>>>>>>>>>> extends
> > > > >>>>>>>>>>>>>>> HRegionServer, and the initialization of
> > > > >>>> HRegionServer
> > > > >>>>>>>>> sometimes
> > > > >>>>>>>>>>>> needs
> > > > >>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>> make rpc calls to HMaster. A simple change would
> > > > >>>> cause
> > > > >>>>>>>>>>> probabilistic
> > > > >>>>>>>>>>>>> dead
> > > > >>>>>>>>>>>>>>> lock or some strange NPEs...
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to
> > > > >>>> add
> > > > >>>>>> new
> > > > >>>>>>>>>> features
> > > > >>>>>>>>>>>> or
> > > > >>>>>>>>>>>>>> add
> > > > >>>>>>>>>>>>>>> external dependencies to HMaster, especially add more
> > > > >>>>>> works
> > > > >>>>>>>> for
> > > > >>>>>>>>>> the
> > > > >>>>>>>>>>>>> start
> > > > >>>>>>>>>>>>>>> up processing...
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Thanks.
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <
> > > > >>>> yuzhihong@gmail.com
> > > > >>>>>> :
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> I read through HADOOP-13433
> > > > >>>>>>>>>>>>>>>> <https://issues.apache.org/
> > > > >>>> jira/browse/HADOOP-13433>
> > > > >>>>> -
> > > > >>>>>>> the
> > > > >>>>>>>>>> cited
> > > > >>>>>>>>>>>>> race
> > > > >>>>>>>>>>>>>>>> condition is in jdk.
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it
> > > > >>>>> moving.
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a
> > > > >>>>>> problem...
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is
> > > > >>>> it
> > > > >>>>> in
> > > > >>>>>>> the
> > > > >>>>>>>>>>> backup
> > > > >>>>>>>>>>>> /
> > > > >>>>>>>>>>>>>>>> restore mega patch ?
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Cheers
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <
> > > > >>>>>>>> palomino219@gmail.com>
> > > > >>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> If you guys have already implemented the feature
> > > > >>>> in
> > > > >>>>>> the
> > > > >>>>>>>> MR
> > > > >>>>>>>>>> way
> > > > >>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>> patch is ready for landing on master, I'm a -0 on
> > > > >>>>> it
> > > > >>>>>>> as I
> > > > >>>>>>>>> do
> > > > >>>>>>>>>>> not
> > > > >>>>>>>>>>>>> want
> > > > >>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>> block the development progress.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> But I strongly suggest later we need to revisit
> > > > >>>> the
> > > > >>>>>>>> design
> > > > >>>>>>>>>> and
> > > > >>>>>>>>>>>> see
> > > > >>>>>>>>>>>>> if
> > > > >>>>>>>>>>>>>>> we
> > > > >>>>>>>>>>>>>>>>> can separate the logic from HMaster as much as
> > > > >>>>>>> possible.
> > > > >>>>>>>>> HA
> > > > >>>>>>>>>> is
> > > > >>>>>>>>>>>>> not a
> > > > >>>>>>>>>>>>>>> big
> > > > >>>>>>>>>>>>>>>>> problem if you do not store any metadata locally.
> > > > >>>> But
> > > > >>>>>> the
> > > > >>>>>>>>> ugly
> > > > >>>>>>>>>>> code
> > > > >>>>>>>>>>>>> in
> > > > >>>>>>>>>>>>>>>>> HMaster is really a problem...
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> And for security, I have a issue pending for a
> > > > >>>> long
> > > > >>>>>>> time.
> > > > >>>>>>>>> Can
> > > > >>>>>>>>>>>>> someone
> > > > >>>>>>>>>>>>>>>> help
> > > > >>>>>>>>>>>>>>>>> taking a simple look at it? This is what I mean,
> > > > >>>>> ugly
> > > > >>>>>>>>> code...
> > > > >>>>>>>>>>>>> logout
> > > > >>>>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>>> destroy the credentials in a subject when it is
> > > > >>>>> still
> > > > >>>>>>>> being
> > > > >>>>>>>>>>> used,
> > > > >>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>>> declared as LimitPrivacy so I can not change the
> > > > >>>>>>> behavior
> > > > >>>>>>>>> and
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>> only
> > > > >>>>>>>>>>>>>>> way
> > > > >>>>>>>>>>>>>>>>> to fix it is to write another piece of ugly
> > > > >>>> code...
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> https://issues.apache.org/
> > > > >>>> jira/browse/HADOOP-13433
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <
> > > > >>>>>>>>>>>>> vladrodionov@gmail.com
> > > > >>>>>>>>>>>>>>> :
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of
> > > > >>>> doing
> > > > >>>>>>> this
> > > > >>>>>>>>>>> without
> > > > >>>>>>>>>>>>>> using
> > > > >>>>>>>>>>>>>>>> MR,
> > > > >>>>>>>>>>>>>>>>> we
> > > > >>>>>>>>>>>>>>>>>> can certainly consider that
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Our framework for distributed operations is
> > > > >>>>>> abstract
> > > > >>>>>>>> and
> > > > >>>>>>>>>>> allows
> > > > >>>>>>>>>>>>>>>>>> different implementations. MR is just one
> > > > >>>>>>>> implementation
> > > > >>>>>>>>> we
> > > > >>>>>>>>>>>>>> provide.
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> -Vlad
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <
> > > > >>>>>>>>>>>>> ddas@hortonworks.com
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the
> > > > >>>>>> topic
> > > > >>>>>>>> of
> > > > >>>>>>>>>>>> MR-based
> > > > >>>>>>>>>>>>>>>>>>> compactions.. But I was thinking more about
> > > > >>>> the
> > > > >>>>>>>>>>> SpliceMachine
> > > > >>>>>>>>>>>>>>>> approach
> > > > >>>>>>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>>>> managing compactions in Spark where
> > > > >>>> apparently
> > > > >>>>>> they
> > > > >>>>>>>>> saw a
> > > > >>>>>>>>>>> lot
> > > > >>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>>> benefits.
> > > > >>>>>>>>>>>>>>>>>>> Apologies for giving you that sore throat
> > > > >>>>>> Andrew; I
> > > > >>>>>>>>>> really
> > > > >>>>>>>>>>>>> didn't
> > > > >>>>>>>>>>>>>>>> mean
> > > > >>>>>>>>>>>>>>>>> to
> > > > >>>>>>>>>>>>>>>>>>> :-)
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
> > > > >>>>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that
> > > > >>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
> > > > >>>>>>>>>>>>>>>>>>> 2. Shell out from the master
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0),
> > > > >>>>> and I
> > > > >>>>>>>> don't
> > > > >>>>>>>>>>> think
> > > > >>>>>>>>>>>>>> it's
> > > > >>>>>>>>>>>>>>>> even
> > > > >>>>>>>>>>>>>>>>>>> worth the effort of trying to build something
> > > > >>>>>> when
> > > > >>>>>>> MR
> > > > >>>>>>>>> is
> > > > >>>>>>>>>>>>> already
> > > > >>>>>>>>>>>>>>>> there,
> > > > >>>>>>>>>>>>>>>>>> and
> > > > >>>>>>>>>>>>>>>>>>> being used by HBase already for some
> > > > >>>>> operations.
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of
> > > > >>>>> issues -
> > > > >>>>>>> HA
> > > > >>>>>>>> of
> > > > >>>>>>>>>> the
> > > > >>>>>>>>>>>>>> server
> > > > >>>>>>>>>>>>>>>> not
> > > > >>>>>>>>>>>>>>>>>>> being the least of them all. Security
> > > > >>>> (kerberos
> > > > >>>>>>>>>>>> authentication,
> > > > >>>>>>>>>>>>>>>> another
> > > > >>>>>>>>>>>>>>>>>>> keytab to manage, etc. etc. etc.). IMO, that
> > > > >>>>>>> approach
> > > > >>>>>>>>> is
> > > > >>>>>>>>>>> DOA.
> > > > >>>>>>>>>>>>>>> Instead
> > > > >>>>>>>>>>>>>>>>>> let's
> > > > >>>>>>>>>>>>>>>>>>> substitute that (1) with the HBase Master. I
> > > > >>>>>>> haven't
> > > > >>>>>>>>> seen
> > > > >>>>>>>>>>> any
> > > > >>>>>>>>>>>>>> good
> > > > >>>>>>>>>>>>>>>>> reason
> > > > >>>>>>>>>>>>>>>>>>> why the HBase master shouldn't launch MR jobs
> > > > >>>>> if
> > > > >>>>>>>>> needed.
> > > > >>>>>>>>>>> It's
> > > > >>>>>>>>>>>>> not
> > > > >>>>>>>>>>>>>>>>> ideal;
> > > > >>>>>>>>>>>>>>>>>>> agreed.
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Now before going to (2), let's see what are
> > > > >>>> the
> > > > >>>>>>>>> benefits
> > > > >>>>>>>>>> of
> > > > >>>>>>>>>>>>>> running
> > > > >>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>> backup/restore jobs from the master. I think
> > > > >>>>> Ted
> > > > >>>>>>> has
> > > > >>>>>>>>>>>> summarized
> > > > >>>>>>>>>>>>>>> some
> > > > >>>>>>>>>>>>>>>> of
> > > > >>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>> issues that we need to take care of -
> > > > >>>>> basically,
> > > > >>>>>>> the
> > > > >>>>>>>>>> master
> > > > >>>>>>>>>>>> can
> > > > >>>>>>>>>>>>>>> keep
> > > > >>>>>>>>>>>>>>>>>> track
> > > > >>>>>>>>>>>>>>>>>>> of running jobs, and should it fail, the
> > > > >>>> backup
> > > > >>>>>>>> master
> > > > >>>>>>>>>> can
> > > > >>>>>>>>>>>>>> continue
> > > > >>>>>>>>>>>>>>>>>> keeping
> > > > >>>>>>>>>>>>>>>>>>> track of it (since the jobId would have been
> > > > >>>>>>> recorded
> > > > >>>>>>>>> in
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>> proc
> > > > >>>>>>>>>>>>>>>> WAL).
> > > > >>>>>>>>>>>>>>>>>> The
> > > > >>>>>>>>>>>>>>>>>>> master can also do cleanup, etc. of failed
> > > > >>>>>>>>> backup/restore
> > > > >>>>>>>>>>>>>>> processes.
> > > > >>>>>>>>>>>>>>>>>>> Security is another issue - the job needs to
> > > > >>>>> run
> > > > >>>>>> as
> > > > >>>>>>>>>> 'hbase'
> > > > >>>>>>>>>>>>> since
> > > > >>>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>>> owns
> > > > >>>>>>>>>>>>>>>>>>> the data. Having the master launch the job
> > > > >>>>> makes
> > > > >>>>>> it
> > > > >>>>>>>> get
> > > > >>>>>>>>>>> that
> > > > >>>>>>>>>>>>>>>> privilege.
> > > > >>>>>>>>>>>>>>>>>> In
> > > > >>>>>>>>>>>>>>>>>>> the (2) approach, it's hard to do some of the
> > > > >>>>>> above
> > > > >>>>>>>>>>>> management.
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is
> > > > >>>>>> ready
> > > > >>>>>>>>> from
> > > > >>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>> overall
> > > > >>>>>>>>>>>>>>>>>>> design/arch point of view (maybe code review
> > > > >>>> is
> > > > >>>>>>> still
> > > > >>>>>>>>>>> pending
> > > > >>>>>>>>>>>>>> from
> > > > >>>>>>>>>>>>>>>>>> Matteo).
> > > > >>>>>>>>>>>>>>>>>>> If in the future, we find better ways of
> > > > >>>> doing
> > > > >>>>>> this
> > > > >>>>>>>>>> without
> > > > >>>>>>>>>>>>> using
> > > > >>>>>>>>>>>>>>> MR,
> > > > >>>>>>>>>>>>>>>>> we
> > > > >>>>>>>>>>>>>>>>>>> can certainly consider that. But IMO don't
> > > > >>>>> think
> > > > >>>>>> we
> > > > >>>>>>>>>> should
> > > > >>>>>>>>>>>>> block
> > > > >>>>>>>>>>>>>>> this
> > > > >>>>>>>>>>>>>>>>>> patch
> > > > >>>>>>>>>>>>>>>>>>> from getting merged.
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> ________________________________________
> > > > >>>>>>>>>>>>>>>>>>> From: 张铎 <palomino219@gmail.com>
> > > > >>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
> > > > >>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> > > > >>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by
> > > > >>>>>> Master
> > > > >>>>>>>> or
> > > > >>>>>>>>> RS
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> So what about a standalone service other than
> > > > >>>>>>> master?
> > > > >>>>>>>>> You
> > > > >>>>>>>>>>> can
> > > > >>>>>>>>>>>>> use
> > > > >>>>>>>>>>>>>>>> your
> > > > >>>>>>>>>>>>>>>>>> own
> > > > >>>>>>>>>>>>>>>>>>> procedure store in that service?
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <
> > > > >>>>>>>> yuzhihong@gmail.com
> > > > >>>>>>>>>> :
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> An earlier implementation was client
> > > > >>>> driven.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to
> > > > >>>> resume
> > > > >>>>> if
> > > > >>>>>>>> there
> > > > >>>>>>>>>> is
> > > > >>>>>>>>>>>>> error
> > > > >>>>>>>>>>>>>>>>> midway.
> > > > >>>>>>>>>>>>>>>>>>>> Using Procedure V2 makes the backup /
> > > > >>>> restore
> > > > >>>>>>> more
> > > > >>>>>>>>>>> robust.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> Another consideration is for security. It
> > > > >>>> is
> > > > >>>>>> hard
> > > > >>>>>>>> to
> > > > >>>>>>>>>>>> enforce
> > > > >>>>>>>>>>>>>>>> security
> > > > >>>>>>>>>>>>>>>>>> (to
> > > > >>>>>>>>>>>>>>>>>>>> be implemented) for client driven actions.
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> Cheers
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew
> > > > >>>>> Purtell <
> > > > >>>>>>>>>>>>>>>>>> andrew.purtell@gmail.com>
> > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point,
> > > > >>>> which
> > > > >>>>>> is
> > > > >>>>>>>>>>> "shelling
> > > > >>>>>>>>>>>>> out"
> > > > >>>>>>>>>>>>>>>> from
> > > > >>>>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>> master directly to run MR is a first. Why
> > > > >>>> not
> > > > >>>>>>> drive
> > > > >>>>>>>>>> this
> > > > >>>>>>>>>>>>> with a
> > > > >>>>>>>>>>>>>>>>> utility
> > > > >>>>>>>>>>>>>>>>>>>> derived from Tool?
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir
> > > > >>>>>> Rodionov
> > > > >>>>>>> <
> > > > >>>>>>>>>>>>>>>>>> vladrodionov@gmail.com
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> we just have HDFS and HBase deployed.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR framework
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> (especially some features we have not used at
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> all), it introduced another cost for maintain. I
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea.
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> So, you are not backup users in this case. Many
> > > > >>>>>>>>>>>>>>>>>>>>>>> of our customers have the full stack deployed
> > > > >>>>>>>>>>>>>>>>>>>>>>> and want to see backup as a standard feature.
> > > > >>>>>>>>>>>>>>>>>>>>>>> Besides this, nothing will happen in your
> > > > >>>>>>>>>>>>>>>>>>>>>>> cluster if you won't be doing backups.
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> This discussion (we do not want to see an M/R
> > > > >>>>>>>>>>>>>>>>>>>>>>> dependency) is going nowhere. We have already
> > > > >>>>>>>>>>>>>>>>>>>>>>> asked, at least twice, for suggestions of
> > > > >>>>>>>>>>>>>>>>>>>>>>> another framework (other than M/R) for bulk data
> > > > >>>>>>>>>>>>>>>>>>>>>>> copy with *conversion*. Still waiting for
> > > > >>>>>>>>>>>>>>>>>>>>>>> suggestions.
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> -Vlad
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu
> > > > >>>>>>>>>>>>>>>>>>>>>>>> <yuzhihong@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the cluster,
> > > > >>>>>>>>>>>>>>>>>>>>>>>> hbase still functions normally (post merge).
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we have long
> > > > >>>>>>>>>>>>>>>>>>>>>>>> been depending on mapreduce. Take a look at
> > > > >>>>>>>>>>>>>>>>>>>>>>>> ExportSnapshot.
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Cheers
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen
> > > > >>>>>>>>>>>>>>>>>>>>>>>> <heng.chen.1986@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> we just have HDFS and HBase deployed.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on MR framework
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> (especially some features we have not used at
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> all), it introduced another cost for maintain. I
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:28 GMT+08:00 张铎
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> <palomino219@gmail.com>:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> To be specific, for example, our nice
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Backup/Restore feature: if we think this is not
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> a core feature of HBase, then we could make it
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> depend on MR, and start a standalone
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> BackupManager instance that submits MR jobs to
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> do periodic maintenance. And if we think this
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> is a core feature that everyone should use,
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> then we'd better implement it without an MR
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> dependency, like DLS.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> Thanks.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:11 GMT+08:00 张铎
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> <palomino219@gmail.com>:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> I'm -1 on letting master or rs launch MR jobs.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> It is OK that some of our features depend on
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> MR, but I think the bottom line is that we
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> should launch the jobs from outside, manually
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> or by other services.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> <andrew.purtell@gmail.com>:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, got it. Well "shelling out" is on the line
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> I think, so a fair question.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Can this be driven by a utility derived from
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Tool like our other MR apps?
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> The issue is needing the AccessController to
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> decide if allowed? But nothing prevents the
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> user from running the job
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> manually/independently, right?
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> <theo.bertozzi@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> just a remark. my query was not about tools
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> using MR (everyone i think is ok with those).
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> the topic was about: "are we ok with running
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> MR jobs from Master and RSs code?" since this
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> will be the first time we do this
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Matteo
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <ddas@hortonworks.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Very much agree; for tools like
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ExportSnapshot / Backup / Restore, it's fine
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to be dependent on MR. MR is the right
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> framework for such. We should also do
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compactions using MR (just saying :) )
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ________________________________________
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> From: Ted Yu <yuzhihong@gmail.com>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Master or RS
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree - backup / restore is in the same
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> category as import / export.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Purtell <andrew.purtell@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Backup is extra tooling around core in my
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> opinion. Like import or export.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Or the optional MOB tool. It's fine.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <mbertozzi@apache.org> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What's the latest opinion around running MR
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> jobs from hbase (Master or RS)?
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I remember in the past that there was
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussion about not having MR as a direct
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dependency of hbase.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think some of the discussion was around
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> MOB, which had an MR job to compact that was
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> later transformed into a non-MR job to be
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> merged. I think we had a similar discussion
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for log split/replay.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the latest is the new Backup feature
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (HBASE-7912), that runs a MR job from the
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> master to copy data or restore data.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (backup is also "not really core" as in.. if
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you don't use backup you'll not end up
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> running MR jobs, but this was probably true
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for MOB as in "if you don't enable MOB you
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't need MR")
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> any thoughts? do we have a rule that says "we
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> don't want to have hbase run MR jobs, only
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tools started manually by the user can do
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that". or can we start adding MR calls around
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> without problems?
