hbase-dev mailing list archives

From Andrew Purtell <andrew.purt...@gmail.com>
Subject Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
Date Sat, 24 Sep 2016 18:37:36 GMT
I don't see you prevailing with this line of argument but you are welcome to try. Don't shoot the messenger please. 

On Sep 24, 2016, at 11:08 AM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:

>>> The key takeaway seems to be don't call out to an external framework we
>>> don't own from master (or regionserver) code.
> Should we ban HDFS as well?
> 
> HBase is a founding partner of a Hadoop stack: HDFS, MapReduce, HBase
> 
> -Vlad
> 
> On Sat, Sep 24, 2016 at 10:40 AM, Andrew Purtell <andrew.purtell@gmail.com>
> wrote:
> 
>> I was attempting to summarize Ted.
>> 
>> A new maven module sounds like a good idea to me. Or we could move all the
>> tools that use MR out to one. Or...
>> 
>> The key takeaway seems to be don't call out to an external framework we
>> don't own from master (or regionserver) code.
>> 
>>> On Sep 24, 2016, at 10:15 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>> 
>>> bq. Internally the tool can also use the procedure framework for state
>>> durability
>>> 
>>> Isn't this the standalone service I proposed this morning ?
>>> 
>>> bq. Move cross HBase and MR coordination to a separate tool
>>> 
>>> Where should this tool live (hbase-backup module) ?
>>> 
>>> Thanks
>>> 
>>> 
>>> On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purtell@gmail.com>
>>> wrote:
>>> 
>>>> At branch merge voting time, more eyes are now getting on the design issues,
>>>> with dissenting opinion emerging. This is the branch merge process working
>>>> as our community has designed it. Because this is the first full project
>>>> review of the code and implementation, I think we all have to be flexible. I
>>>> see the community as trying to narrow the technical objection at issue to
>>>> the smallest possible scope. It's simple: don't call out to an external
>>>> execution framework we don't own from core master (and by extension
>>>> regionserver) code. We had this objection before to a proposed external
>>>> compaction implementation for MOB, so it should not come as a surprise.
>>>> Please let me know if I have misstated this.
>>>> 
>>>> This would seem to require a modest refactor of coordination to move
>>>> invocation of MR code out from any core code path. To restate what I think
>>>> is an emerging recommendation: Move cross HBase and MR coordination to a
>>>> separate tool. This tool can ask the master to invoke procedures on the
>>>> HBase side that do first mile export and last mile restore. (Internally the
>>>> tool can also use the procedure framework for state durability, perhaps,
>>>> just a thought.) Then the tool can further drive the things done with MR
>>>> like shipping data off cluster or moving remote data in place and preparing
>>>> it for import. These activities do not need procedure coordination and
>>>> involvement of the HBase master. Only the first and last mile of the
>>>> process needs atomicity within the HBase deploy. Please let me know if I
>>>> have misstated this.
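
A minimal Java sketch of the split proposed in the quoted paragraph above: a standalone tool asks the master to run the first-mile and last-mile steps as procedures, and drives the bulk steps on an external engine itself. Every name here (BackupToolSketch, MasterProcedures, BulkEngine and their methods) is hypothetical and exists only for illustration; none of it is HBase API.

```java
/** Illustrative only: all types and methods below are hypothetical, not HBase APIs. */
final class BackupToolSketch {

  /** Steps that need atomicity inside the HBase deploy; the master would run them as procedures. */
  interface MasterProcedures {
    /** First mile: produce a consistent local backup image and return its location. */
    String firstMileExport(String table);

    /** Last mile: restore a table from an image that has already been prepared. */
    void lastMileRestore(String table, String preparedImage);
  }

  /** Bulk steps that run on an external engine (MR, Spark, ...), launched by the tool itself. */
  interface BulkEngine {
    /** Ship a local image off cluster and return its remote location. */
    String shipOffCluster(String localImage, String backupRoot);

    /** Move remote data in place and prepare it for import; returns the prepared location. */
    String prepareForImport(String remoteImage);
  }

  private final MasterProcedures master;
  private final BulkEngine engine;

  BackupToolSketch(MasterProcedures master, BulkEngine engine) {
    this.master = master;
    this.engine = engine;
  }

  /** Backup: the master does the first mile, then the tool drives the bulk copy. */
  String backup(String table, String backupRoot) {
    String localImage = master.firstMileExport(table);
    return engine.shipOffCluster(localImage, backupRoot);
  }

  /** Restore: the tool prepares the data with the bulk engine, then the master does the last mile. */
  void restore(String table, String remoteImage) {
    String prepared = engine.prepareForImport(remoteImage);
    master.lastMileRestore(table, prepared);
  }
}
```

In this arrangement the master never references the bulk engine; only the tool depends on it, which is the property the objection asks for.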
>>>> 
>>>> 
>>>>> On Sep 24, 2016, at 8:17 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>> 
>>>>> bq. procedure gives you a retry mechanism on failure
>>>>> 
>>>>> We do need this mechanism. Take a look at the multi-step
>>>>> in FullTableBackupProcedure, etc.
>>>>> 
>>>>> bq. let the user export it later when he wants
>>>>> 
>>>>> This would make supporting security more complex (user A shouldn't be
>>>>> exporting user B's backup). And it is not user friendly - at the time
>>>>> backup request is issued, the following is specified:
>>>>> 
>>>>> +          + " BACKUP_ROOT     The full root path to store the backup image,\n"
>>>>> +          + "                 the prefix can be hdfs, webhdfs or gpfs\n"
>>>>> 
>>>>> Backup root is an integral part of backup manifest.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> 
>>>>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>> 
>>>>>>> Ideally the export should have one job running which does the retry (on
>>>>>>> failed partition) itself.
>>>>>>> 
>>>>>> 
>>>>>> procedure gives you a retry mechanism on failure. if you don't use that,
>>>>>> then you don't need procedure.
>>>>>> if you want you can start a procedure executor in a non master process (the
>>>>>> hbase-procedure is a separate package and does not depend on master). but
>>>>>> again, export seems a case where you don't need procedure.
>>>>>> 
>>>>>> like snapshot, the logic may just be: ask the master to take a backup, and
>>>>>> let the user export it later when he wants. so you avoid having a MR job
>>>>>> started by the master since people do not seem to like it.
>>>>>> 
>>>>>> for restore (I think that is where you use the MR splitter) you can
>>>>>> probably just have a backup ready (already split). there is already a
>>>>>> jira that should do that, HBASE-14135. instead of doing the operation of
>>>>>> split/merge on restore, you consolidate the backup "offline" (mr job
>>>>>> started by the user) and then ask to restore the backup.
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> as far as I understand the code, you don't need procedure for the export
>>>>>>>> itself.
>>>>>>>> the export operation is already idempotent, since you are just copying
>>>>>>>> files.
>>>>>>>> if the file exists and is complete (check length, checksum, ...) you can
>>>>>>>> skip it,
>>>>>>>> otherwise you'll send it over again.
>>>>>>>> 
>>>>>>>> you need the proc for taking the backup and restoring,
>>>>>>>> because you want to complete the operation and end up with a consistent
>>>>>>>> state
>>>>>>>> across the multiple components you are updating (meta, fs, ...)
>>>>>>>> but again, for export you can just run the tool over and over until the
>>>>>>>> operation succeeds, and that should be ok.
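
A minimal sketch of the skip-if-complete export described in the quoted message above, assuming plain local files and a CRC32 checksum for illustration. The class IdempotentExport and its methods are hypothetical, not HBase code; a real export would go through the Hadoop FileSystem API and whatever checksum the file system provides.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.CRC32;

/** Hypothetical helper, for illustration only; not part of HBase. */
final class IdempotentExport {

  /** CRC32 of a file's contents, streamed so large files are not loaded into memory. */
  static long checksum(Path file) throws IOException {
    CRC32 crc = new CRC32();
    byte[] buf = new byte[8192];
    try (InputStream in = Files.newInputStream(file)) {
      int n;
      while ((n = in.read(buf)) != -1) {
        crc.update(buf, 0, n);
      }
    }
    return crc.getValue();
  }

  /** True when the destination already holds a complete copy (same length, same checksum). */
  static boolean isComplete(Path src, Path dst) throws IOException {
    return Files.isRegularFile(dst)
        && Files.size(src) == Files.size(dst)
        && checksum(src) == checksum(dst);
  }

  /** Copies every regular file under srcDir into dstDir; returns how many files were actually sent. */
  static int export(Path srcDir, Path dstDir) throws IOException {
    List<Path> files;
    try (Stream<Path> walk = Files.walk(srcDir)) {
      files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
    }
    int copied = 0;
    for (Path src : files) {
      Path dst = dstDir.resolve(srcDir.relativize(src).toString());
      if (isComplete(src, dst)) {
        continue; // already exported by an earlier run, skip it
      }
      Files.createDirectories(dst.getParent());
      // Missing or partial file: send it over again, replacing whatever is there.
      Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
      copied++;
    }
    return copied;
  }
}
```

Because a completed file is never touched again, the whole export can simply be re-run after a failure until it reports nothing left to copy, with no procedure state needed.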
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Matteo
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Master is involved in this discussion because currently only Master
>>>>>>>>> instantiates ProcedureExecutor which runs the 3 Procedures for backup /
>>>>>>>>> restore.
>>>>>>>>> 
>>>>>>>>> What if an optional standalone service which hosts ProcedureExecutor is
>>>>>>>>> used for this purpose ?
>>>>>>>>> Would that have better chance of giving us middle ground so that we can
>>>>>>>>> move this forward ?
>>>>>>>>> 
>>>>>>>>> Cheers
>>>>>>>>> 
>>>>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack <stack@duboce.net> wrote:
>>>>>>>>>> 
>>>>>>>>>> (Moved out of the Master doing MR DISCUSSION)
>>>>>>>>>> 
>>>>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <
>>>>>>>>>> vladrodionov@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>>>> -1 on that backup be in core hbase
>>>>>>>>>>> 
>>>>>>>>>>> Not sure I understand what it means.
>>>>>>>>>>> 
>>>>>>>>>>> Sorry for the imprecision.
>>>>>>>>>> 
>>>>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so
>>>>>>>>>> -1 on the Master running backup/restore MR jobs, even if optional.
>>>>>>>>>> 
>>>>>>>>>> Master should not depend on MR. We've gone out of our way to avoid taking
>>>>>>>>>> MR on as dependency in the past. Seems late in the game for us to change
>>>>>>>>>> our opinion on this. If we didn't do it for distributed log splitting, or
>>>>>>>>>> MOB, why would we do it to support an optional backup/restore?
>>>>>>>>>> 
>>>>>>>>>> I have opinions on the questions below -- i.e. that Master running
>>>>>>>>>> backup/restore is outside of the Master's charge -- but they are not worth
>>>>>>>>>> much since I've not done much by way of review or contrib to backup/restore
>>>>>>>>>> other than to try it as a 'user' so I'll keep them to myself until I do. I
>>>>>>>>>> only came out from under my shell to participate on the MR as dependency
>>>>>>>>>> chat.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> M
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 1. We are not allowed to use Master to orchestrate the whole process?
>>>>>>>>>>> 
>>>>>>>>>>> We have already brought up all advantages of using
>>>>>>>>>>> Master and distributed procedures for backup and restore.
>>>>>>>>>>> 
>>>>>>>>>>> Downside of moving this to client tool is lack of fault tolerance:
>>>>>>>>>>> 1.1 Client won't be allowed to do any operations that can potentially
>>>>>>>>>>> affect cluster, such as disabling splits/merges, balancer.
>>>>>>>>>>> 1.2 In case of client failure who will be doing the whole rollback stuff?
>>>>>>>>>>> We are trying to make it atomic.
>>>>>>>>>>> 
>>>>>>>>>>> Security is not clear.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 2. We are not allowed to modify code of existing HBase core classes (what
>>>>>>>>>>> does core mean anyway)?
>>>>>>>>>>> 
>>>>>>>>>>> 3. We are not allowed to create backup system table (hbase:backup) in a
>>>>>>>>>>> system space? Only in user space? The table is global.
>>>>>>>>>>> 
>>>>>>>>>>> 2. is critical. Despite the fact that 95% of code is new, we have touched,
>>>>>>>>>>> of course, some existing HBase code.
>>>>>>>>>>> 3. is not that critical, of course we can move backup system into user
>>>>>>>>>>> space.
>>>>>>>>>>> 
>>>>>>>>>>> And finally, will moving backup into external tool give us +1 from stack?
>>>>>>>>>>> 
>>>>>>>>>>> -Vlad
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack <stack@duboce.net> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <
>>>>>>>>>>>> vladrodionov@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>>>> + MR is dead
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Does MR know that? :)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Again. With all due respect, stack - still no suggestions what should we
>>>>>>>>>>>>> use for "bulk data move and transformation" instead of MR?
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Use whatever distributed engine suits your fancy -- MR, Spark, distributed
>>>>>>>>>>>> shell -- just don't have HBase core depend on it, even optionally.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> I suggest voting first on "do we need backup in HBase"? In my opinion, some
>>>>>>>>>>>>> group members still not sure about that and some will give -1
>>>>>>>>>>>>> in any case. Just because ...
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase (+1 on adding
>>>>>>>>>>>> all the API any such external tool might need to run).
>>>>>>>>>>>> 
>>>>>>>>>>>> St.Ack
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> -Vlad
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack <stack@duboce.net>
>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <
>>>>>>>>>>>>> theo.bertozzi@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> let me try to go back to my original topic.
>>>>>>>>>>>>>>> this question was meant to be generic, and provide some rule for future
>>>>>>>>>>>>>>> code.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
>>>>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay)
>>>>>>>>>>>>>>> over MR, because some cluster may not want or may have an
>>>>>>>>>>>>>>> external/uncontrolled MR setup.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR
>>>>>>>>>>>>>>> jobs from hbase, because unless you use the feature, MR is not required.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not --
>>>>>>>>>>>>>> ever being able to launch MR jobs.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server
>>>>>>>>>>>>>> moving it out to be an optional module (Spark would be its peer).
>>>>>>>>>>>>>> + Master is a rats nest of state. Matteo, Stephen, and Appy are busy
>>>>>>>>>>>>>> working hard on moving it up on to a new foundation. Let's not make the task
>>>>>>>>>>>>>> harder by piling on more moving parts.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> St.Ack
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Matteo
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager which is to make
>>>>>>>>>>>>>>>> Master more stable.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino219@gmail.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> No, not your fault, at least, not this time :)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Why do I call the code ugly? Can you simply tell me the sequence of calls
>>>>>>>>>>>>>>>>> when starting up the HMaster? HMaster is also a regionserver so it extends
>>>>>>>>>>>>>>>>> HRegionServer, and the initialization of HRegionServer sometimes needs to
>>>>>>>>>>>>>>>>> make rpc calls to HMaster. A simple change would cause probabilistic
>>>>>>>>>>>>>>>>> deadlocks or some strange NPEs...
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or add
>>>>>>>>>>>>>>>>> external dependencies to HMaster, especially when it adds more work to the
>>>>>>>>>>>>>>>>> startup processing...
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> I read through HADOOP-13433
>>>>>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited race
>>>>>>>>>>>>>>>>>> condition is in jdk.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ? Is it in the backup /
>>>>>>>>>>>>>>>>>> restore mega patch ?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino219@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR way and the
>>>>>>>>>>>>>>>>>>> patch is ready for landing on master, I'm a -0 on it as I do not want to
>>>>>>>>>>>>>>>>>>> block the development progress.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> But I strongly suggest later we need to revisit the design and see if we
>>>>>>>>>>>>>>>>>>> can separate the logic from HMaster as much as possible. HA is not a big
>>>>>>>>>>>>>>>>>>> problem if you do not store any metadata locally. But the ugly code in
>>>>>>>>>>>>>>>>>>> HMaster is really a problem...
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> And for security, I have an issue pending for a long time. Can someone help
>>>>>>>>>>>>>>>>>>> by taking a quick look at it? This is what I mean, ugly code... logout and
>>>>>>>>>>>>>>>>>>> destroy the credentials in a subject when it is still being used, and
>>>>>>>>>>>>>>>>>>> declared as LimitedPrivate so I can not change the behavior and the only way
>>>>>>>>>>>>>>>>>>> to fix it is to write another piece of ugly code...
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13433
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodionov@gmail.com>:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of doing this without using MR, we
>>>>>>>>>>>>>>>>>>>>>> can certainly consider that
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Our framework for distributed operations is abstract and allows
>>>>>>>>>>>>>>>>>>>> different implementations. MR is just one implementation we provide.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> -Vlad
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <ddas@hortonworks.com>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the topic of MR-based
>>>>>>>>>>>>>>>>>>>>> compactions.. But I was thinking more about the SpliceMachine approach of
>>>>>>>>>>>>>>>>>>>>> managing compactions in Spark where apparently they saw a lot of benefits.
>>>>>>>>>>>>>>>>>>>>> Apologies for giving you that sore throat Andrew; I really didn't mean to
>>>>>>>>>>>>>>>>>>>>> :-)
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
>>>>>>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that
>>>>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
>>>>>>>>>>>>>>>>>>>>> 2. Shell out from the master
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), and I don't think it's even
>>>>>>>>>>>>>>>>>>>>> worth the effort of trying to build something when MR is already there, and
>>>>>>>>>>>>>>>>>>>>> being used by HBase already for some operations.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of issues - HA of the server not
>>>>>>>>>>>>>>>>>>>>> being the least of them all. Security (kerberos authentication, another
>>>>>>>>>>>>>>>>>>>>> keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's
>>>>>>>>>>>>>>>>>>>>> substitute that (1) with the HBase Master. I haven't seen any good reason
>>>>>>>>>>>>>>>>>>>>> why the HBase master shouldn't launch MR jobs if needed. It's not ideal;
>>>>>>>>>>>>>>>>>>>>> agreed.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Now before going to (2), let's see what are the benefits of running the
>>>>>>>>>>>>>>>>>>>>> backup/restore jobs from the master. I think Ted has summarized some of the
>>>>>>>>>>>>>>>>>>>>> issues that we need to take care of - basically, the master can keep track
>>>>>>>>>>>>>>>>>>>>> of running jobs, and should it fail, the backup master can continue keeping
>>>>>>>>>>>>>>>>>>>>> track of it (since the jobId would have been recorded in the proc WAL). The
>>>>>>>>>>>>>>>>>>>>> master can also do cleanup, etc. of failed backup/restore processes.
>>>>>>>>>>>>>>>>>>>>> Security is another issue - the job needs to run as 'hbase' since it owns
>>>>>>>>>>>>>>>>>>>>> the data. Having the master launch the job makes it get that privilege. In
>>>>>>>>>>>>>>>>>>>>> the (2) approach, it's hard to do some of the above management.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is ready from the overall
>>>>>>>>>>>>>>>>>>>>> design/arch point of view (maybe code review is still pending from Matteo).
>>>>>>>>>>>>>>>>>>>>> If in the future, we find better ways of doing this without using MR, we
>>>>>>>>>>>>>>>>>>>>> can certainly consider that. But IMO don't think we should block this patch
>>>>>>>>>>>>>>>>>>>>> from getting merged.
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>>>>>>>> From: 张铎 <palomino219@gmail.com>
>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
>>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> So what about a standalone service other than master? You can use your own
>>>>>>>>>>>>>>>>>>>>> procedure store in that service?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> An earlier implementation was client driven.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to resume if there is error midway.
>>>>>>>>>>>>>>>>>>>>>> Using Procedure V2 makes the backup / restore more robust.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Another consideration is for security. It is hard to enforce security (to
>>>>>>>>>>>>>>>>>>>>>> be implemented) for client driven actions.
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purtell@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, which is "shelling out" from the
>>>>>>>>>>>>>>>>>>>>>>> master directly to run MR is a first. Why not drive this with a utility
>>>>>>>>>>>>>>>>>>>>>>> derived from Tool?
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodionov@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case that we just have HDFS and
>>>>>>>>>>>>>>>>>>>>>>>>>> HBase deployed.
>>>>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on the MR framework (especially for some features we
>>>>>>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduces another maintenance cost. I
>>>>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> So, you are not backup users in this case. Many of our customers have the
>>>>>>>>>>>>>>>>>>>>>>>> full stack deployed and want to see backup as a standard feature. Besides
>>>>>>>>>>>>>>>>>>>>>>>> this, nothing will happen in your cluster if you won't be doing backups.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> This discussion (we do not want to see an M/R dependency) is going nowhere.
>>>>>>>>>>>>>>>>>>>>>>>> We have asked already, at least twice, for suggestions of another framework
>>>>>>>>>>>>>>>>>>>>>>>> (other than M/R) for bulk data copy with *conversion*. Still waiting for
>>>>>>>>>>>>>>>>>>>>>>>> suggestions.
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> -Vlad
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the cluster, hbase still functions
>>>>>>>>>>>>>>>>>>>>>>>>> normally (post merge).
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we have long been depending on
>>>>>>>>>>>>>>>>>>>>>>>>> mapreduce. Take a look at ExportSnapshot.
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1986@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case that we just have HDFS and
>>>>>>>>>>>>>>>>>>>>>>>>>> HBase deployed.
>>>>>>>>>>>>>>>>>>>>>>>>>> If our Master/RS depend on the MR framework (especially for some features we
>>>>>>>>>>>>>>>>>>>>>>>>>> have not used at all), it introduces another maintenance cost. I
>>>>>>>>>>>>>>>>>>>>>>>>>> don't think it is a good idea.
>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino219@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>>>>> To be specific, for example, our nice Backup/Restore feature: if we think
>>>>>>>>>>>>>>>>>>>>>>>>>>> this is not a core feature of HBase, then we could make it depend on MR,
>>>>>>>>>>>>>>>>>>>>>>>>>>> and start a standalone BackupManager instance that submits MR jobs to do
>>>>>>>>>>>>>>>>>>>>>>>>>>> periodic maintenance jobs. And if we think this is a core feature that
>>>>>>>>>>>>>>>>>>>>>>>>>>> everyone should use, then we'd better implement it without MR
>>>>>>>>>>>>>>>>>>>>>>>>>>> dependency, like DLS.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino219@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm -1 on letting master or rs launch MR jobs. It is OK that some of our
>>>>>>>>>>>>>>>>>>>>>>>>>>>> features depend on MR but I think the bottom line is that we should launch
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the jobs from outside manually or by other services.
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purtell@gmail.com>:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> question.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Can this be driven by a utility derived from Tool like our other MR apps?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The issue is needing the AccessController to decide if allowed? But nothing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> prevents the user from running the job manually/independently, right?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> just a remark. my query was not about tools using MR (everyone i
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> think is ok with those).
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the topic was about: "are we ok with running MR jobs from Master
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> and RSs code?" since this will be the first time we do this
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Matteo
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <ddas@hortonworks.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> it's fine to be dependent on MR. MR is the right framework for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> such. We should also do compactions using MR (just saying :) )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> From: Ted Yu <yuzhihong@gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree - backup / restore is in the same category as import /
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> export.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Backup is extra tooling around core in my opinion. Like import or
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> export. Or the optional MOB tool. It's fine.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mbertozzi@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> What's the latest opinion around running MR jobs from hbase
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Master or RS)?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I remember in the past that there was discussion about not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> having MR as a direct dependency of hbase.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I think some of the discussion was around MOB, which had a MR job
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to compact that was later transformed into a non-MR job to be
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> merged. I think we had a similar discussion for log
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> split/replay.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that runs a MR
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> job from the master to copy data or restore data.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (backup is also "not really core" as in.. if you don't use
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> backup you'll not end up running MR jobs, but this was probably
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> true for MOB as in "if you don't enable MOB you don't need MR")
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> any thoughts? do we have a rule that says "we don't want to have
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> hbase run MR jobs, only tools started manually by the user can
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do that". or can we start adding MR calls around without
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> problems?
