hbase-dev mailing list archives

From Matteo Bertozzi <theo.berto...@gmail.com>
Subject Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
Date Sat, 24 Sep 2016 14:59:46 GMT
On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> Ideally the export should have one job running which does the retry (on
> failed partition) itself.
>

procedure gives you a retry mechanism on failure. if you don't use that,
then you don't need procedure.
if you want, you can start a procedure executor in a non-master process (the
hbase-procedure module is a separate package and does not depend on master).
but again, export seems to be a case where you don't need procedure.
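
To illustrate why a rerunnable export needs no procedure, here is a minimal
sketch in Python. It is not HBase code: it works on local directories rather
than HDFS, and the names file_digest, is_complete and export_once are
hypothetical. It only shows the skip-if-complete check (length, then
checksum) that makes repeated runs safe.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def is_complete(src: Path, dst: Path) -> bool:
    """True if dst exists and matches src in length and checksum."""
    if not dst.is_file():
        return False
    if dst.stat().st_size != src.stat().st_size:
        return False  # cheap length check first, e.g. a truncated copy
    return file_digest(dst) == file_digest(src)


def export_once(src_dir: Path, dst_dir: Path) -> List[str]:
    """Copy every file under src_dir that is missing or incomplete in dst_dir.

    Returns the relative paths that were actually copied in this pass.
    """
    copied = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(src_dir)
        dst = dst_dir / rel
        if is_complete(src, dst):
            continue  # already exported by an earlier run, skip it
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)  # overwrites a partial destination file
        copied.append(rel.as_posix())
    return copied
```

Running export_once again after a partial failure copies only the files that
are missing or incomplete, so a failed run can simply be repeated until it
finishes without error.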

like snapshot, the logic may just be: ask the master to take a backup, and
let the user export it later whenever they want. that way you avoid having
an MR job started by the master, since people do not seem to like that.

for restore (I think that is where you use the MR splitter) you can
probably just have a backup ready (already split). there is already a
jira that should do that: HBASE-14135. instead of doing the split/merge
operation on restore, you consolidate the backup "offline" (MR job
started by the user) and then ask to restore the backup.


>
> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> wrote:
>
> > as far as I understand the code, you don't need procedure for the export
> > itself.
> > the export operation is already idempotent, since you are just copying
> > files.
> > if the file exists and is complete (check length, checksum, ...) you can
> > skip it, otherwise you'll send it over again.
> >
> > you need the proc for taking the backup and restoring,
> > because you want to complete the operation and end up with a consistent
> > state across the multiple components you are updating (meta, fs, ...)
> > but again, for export you can just run the tool over and over until the
> > operation succeeds, and that should be ok.
> >
> >
> >
> > Matteo
> >
> >
> > On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Master is involved in this discussion because currently only Master
> > > instantiates ProcedureExecutor which runs the 3 Procedures for backup /
> > > restore.
> > >
> > > What if an optional standalone service which hosts ProcedureExecutor is
> > > used for this purpose ?
> > > Would that have better chance of giving us middle ground so that we can
> > > move this forward ?
> > >
> > > Cheers
> > >
> > > On Fri, Sep 23, 2016 at 5:15 PM, Stack <stack@duboce.net> wrote:
> > >
> > > > (Moved out of the Master doing MR DISCUSSION)
> > > >
> > > > On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <
> > > > vladrodionov@gmail.com>
> > > > wrote:
> > > >
> > > > > >>  -1 on that backup be in core hbase
> > > > >
> > > > > Not sure I understand what it means.
> > > > >
> > > > > Sorry for the imprecision.
> > > >
> > > > The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so
> > > > -1 on the Master running backup/restore MR jobs, even if optional.
> > > >
> > > > Master should not depend on MR. We've gone out of our way to avoid taking
> > > > MR on as dependency in the past. Seems late in the game for us to change
> > > > our opinion on this. If we didn't do it for distributed log splitting, or
> > > > MOB, why would we do it to support an optional backup/restore?
> > > >
> > > > I have opinions on the questions below -- i.e. that Master running
> > > > backup/restore is outside of the Master's charge -- but they are not worth
> > > > much since I've not done much by way of review or contrib to backup/restore
> > > > other than to try it as a 'user' so I'll keep them to myself until I do. I
> > > > only came out from under my shell to participate on the MR as dependency
> > > > chat.
> > > >
> > > > Thanks,
> > > > M
> > > >
> > > >
> > > > 1. We are not allowed to use Master to orchestrate the whole process?
> > > >
> > > >
> > > > We
> > > > > have already brought up all advantages of using
> > > > >    Master and distributed procedures for backup and restore.
> > > > >
> > > > >
> > > > > Downside of moving this to client tool is lack of fault tolerance:
> > > > >  1.1 Client won't be allowed to do any operations, that can,
> > > potentially
> > > > > affect
> > > > > cluster, such as disabling splits/merges, balancer.
> > > > >  1.2 In case of client failure who will be doing the whole rollback
> > > > stuff?
> > > > > We are trying to make it atomic.
> > > > >
> > > > > Security is not clear.
> > > >
> > > >
> > > >
> > > > 2. We are not allowed to modify code of existing HBase core classes
> > (what
> > > > > does core mean anyway)?
> > > > >
> > > > >
> > > >
> > > >
> > > > > 3. We are not allowed to create backup system table (hbase:backup)
> > in a
> > > > > system space? Only in user space? The table is global.
> > > > >
> > > >
> > > >
> > > > > 2. is critical. Despite the fact, that 95% of code is new, we have
> > > > touched,
> > > > > of course some existing HBase code.
> > > > > 3. is not that critical, of course we can move backup system into
> > user
> > > > > space.
> > > > >
> > > > > And finally, will moving backup into external tool give us +1 from
> > > stack?
> > > > >
> > > > > -Vlad
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Sep 23, 2016 at 11:26 AM, Stack <stack@duboce.net> wrote:
> > > > >
> > > > > > On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <
> > > > > > vladrodionov@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > >> + MR is dead
> > > > > > >
> > > > > > > Does MR know that? :)
> > > > > > >
> > > > > > > Again. With all due respect, stack - still no suggestions what should we
> > > > > > > use for "bulk data move and transformation" instead of MR?
> > > > > > >
> > > > > >
> > > > > > Use whatever distributed engine suits your fancy -- MR, Spark, distributed
> > > > > > shell -- just don't have HBase core depend on it, even optionally.
> > > > > >
> > > > > >
> > > > > > > I suggest voting first on "do we need backup in HBase"? In my opinion, some
> > > > > > > group members are still not sure about that and some will give -1
> > > > > > > in any case. Just because ...
> > > > > > >
> > > > > > >
> > > > > > We could run a vote, sure. -1 on that backup be in core hbase (+1 on adding
> > > > > > all the API any such external tool might need to run).
> > > > > >
> > > > > > St.Ack
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -Vlad
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Sep 23, 2016 at 10:57 AM, Stack <stack@duboce.net> wrote:
> > > > > > > >
> > > > > > > > On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > let me try to go back to my original topic.
> > > > > > > > > this question was meant to be generic, and provide some rule for future
> > > > > > > > > code.
> > > > > > > > >
> > > > > > > > > from what I can gather, a rule that may satisfy everyone can be:
> > > > > > > > >  - we don't want any core feature (e.g. compaction/log-split/log-replay)
> > > > > > > > > over MR, because some cluster may not want or may have an
> > > > > > > > > external/uncontrolled MR setup.
> > > > > > > > >
> > > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > >
> > > > > > > > >  - we allow non-core features (e.g. features enabled by a flag) to run MR
> > > > > > > > > jobs from hbase, because unless you use the feature, MR is not required.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > -1 to hbase core depending on MR or core -- whether behind a flag or not --
> > > > > > > > ever being able to launch MR jobs.
> > > > > > > >
> > > > > > > > + MR is dead. We should be busy working hard to undo it from hbase-server
> > > > > > > > moving it out to be an optional module (Spark would be its peer).
> > > > > > > > + Master is a rats nest of state. Matteo, Stephen, and Appy are busy
> > > > > > > > working hard on moving it up on to a new foundation. Let's not make that
> > > > > > > > task harder by piling on more moving parts.
> > > > > > > >
> > > > > > > > St.Ack
> > > > > > > >
> > > > > > > >
> > > > > > > > > Matteo
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > I suggest you look at Matteo's work for AssignmentManager which is to
> > > > > > > > > > make Master more stable.
> > > > > > > > > >
> > > > > > > > > > Cheers
> > > > > > > > > >
> > > > > > > > > > On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino219@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > No, not your fault, at least, not this time:)
> > > > > > > > > > >
> > > > > > > > > > > Why I call the code ugly? Can you simply tell me the sequence of
> > > > > > > > > > > calls when starting up the HMaster? HMaster is also a regionserver
> > > > > > > > > > > so it extends HRegionServer, and the initialization of
> > > > > > > > > > > HRegionServer sometimes needs to make rpc calls to HMaster. A
> > > > > > > > > > > simple change would cause probabilistic dead lock or some strange
> > > > > > > > > > > NPEs...
> > > > > > > > > > >
> > > > > > > > > > > That's why I'm very nervous when somebody wants to add new features
> > > > > > > > > > > or add external dependencies to HMaster, especially add more works
> > > > > > > > > > > for the start up processing...
> > > > > > > > > > >
> > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > > 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> > > > > > > > > > > >
> > > > > > > > > > > > I read through HADOOP-13433
> > > > > > > > > > > > <https://issues.apache.org/jira/browse/HADOOP-13433> - the cited
> > > > > > > > > > > > race condition is in jdk.
> > > > > > > > > > > >
> > > > > > > > > > > > Suggest pinging the reviewer on JIRA to get it moving.
> > > > > > > > > > > >
> > > > > > > > > > > > bq. But the ugly code in HMaster is really a problem...
> > > > > > > > > > > >
> > > > > > > > > > > > Can you be specific as to which code is ugly ? Is it in the
> > > > > > > > > > > > backup / restore mega patch ?
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino219@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > If you guys have already implemented the feature in the MR way
> > > > > > > > > > > > > and the patch is ready for landing on master, I'm a -0 on it as
> > > > > > > > > > > > > I do not want to block the development progress.
> > > > > > > > > > > > >
> > > > > > > > > > > > > But I strongly suggest later we need to revisit the design and
> > > > > > > > > > > > > see if we can separate the logic from HMaster as much as
> > > > > > > > > > > > > possible. HA is not a big problem if you do not store any
> > > > > > > > > > > > > metadata locally. But the ugly code in HMaster is really a
> > > > > > > > > > > > > problem...
> > > > > > > > > > > > >
> > > > > > > > > > > > > And for security, I have an issue pending for a long time. Can
> > > > > > > > > > > > > someone help taking a simple look at it? This is what I mean,
> > > > > > > > > > > > > ugly code... logout and destroy the credentials in a subject
> > > > > > > > > > > > > when it is still being used, and declared as LimitPrivacy so I
> > > > > > > > > > > > > can not change the behavior and the only way to fix it is to
> > > > > > > > > > > > > write another piece of ugly code...
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://issues.apache.org/jira/browse/HADOOP-13433
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodionov@gmail.com>:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> If in the future, we find better ways of doing this without
> > > > > > > > > > > > > > >> using MR, we can certainly consider that
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Our framework for distributed operations is abstract and allows
> > > > > > > > > > > > > > different implementations. MR is just one implementation we
> > > > > > > > > > > > > > provide.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -Vlad
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <ddas@hortonworks.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Guys, first off apologies for bringing in the
> > > > > > > > > > > > > > > topic of MR-based compactions.. But I was
> > > > > > > > > > > > > > > thinking more about the SpliceMachine approach
> > > > > > > > > > > > > > > of managing compactions in Spark where
> > > > > > > > > > > > > > > apparently they saw a lot of benefits.
> > > > > > > > > > > > > > > Apologies for giving you that sore throat
> > > > > > > > > > > > > > > Andrew; I really didn't mean to :-)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So on this issue, we have these on the plate:
> > > > > > > > > > > > > > > 0. Somehow not use MR but something like that
> > > > > > > > > > > > > > > 1. Run a standalone service other than master
> > > > > > > > > > > > > > > 2. Shell out from the master
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I don't think we have a good answer to (0), and
> > > > > > > > > > > > > > > I don't think it's even worth the effort of
> > > > > > > > > > > > > > > trying to build something when MR is already
> > > > > > > > > > > > > > > there, and being used by HBase already for some
> > > > > > > > > > > > > > > operations.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On (1), we have to deal with a myriad of issues
> > > > > > > > > > > > > > > - HA of the server not being the least of them
> > > > > > > > > > > > > > > all. Security (kerberos authentication, another
> > > > > > > > > > > > > > > keytab to manage, etc. etc. etc.). IMO, that
> > > > > > > > > > > > > > > approach is DOA. Instead let's substitute that
> > > > > > > > > > > > > > > (1) with the HBase Master. I haven't seen any
> > > > > > > > > > > > > > > good reason why the HBase master shouldn't
> > > > > > > > > > > > > > > launch MR jobs if needed. It's not ideal;
> > > > > > > > > > > > > > > agreed.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Now before going to (2), let's see what are the
> > > > > > > > > > > > > > > benefits of running the backup/restore jobs from
> > > > > > > > > > > > > > > the master. I think Ted has summarized some of
> > > > > > > > > > > > > > > the issues that we need to take care of -
> > > > > > > > > > > > > > > basically, the master can keep track of running
> > > > > > > > > > > > > > > jobs, and should it fail, the backup master can
> > > > > > > > > > > > > > > continue keeping track of it (since the jobId
> > > > > > > > > > > > > > > would have been recorded in the proc WAL). The
> > > > > > > > > > > > > > > master can also do cleanup, etc. of failed
> > > > > > > > > > > > > > > backup/restore processes. Security is another
> > > > > > > > > > > > > > > issue - the job needs to run as 'hbase' since it
> > > > > > > > > > > > > > > owns the data. Having the master launch the job
> > > > > > > > > > > > > > > makes it get that privilege. In the (2)
> > > > > > > > > > > > > > > approach, it's hard to do some of the above
> > > > > > > > > > > > > > > management.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Guys, just to reiterate, the patch as such is
> > > > > > > > > > > > > > > ready from the overall design/arch point of view
> > > > > > > > > > > > > > > (maybe code review is still pending from
> > > > > > > > > > > > > > > Matteo). If in the future, we find better ways
> > > > > > > > > > > > > > > of doing this without using MR, we can certainly
> > > > > > > > > > > > > > > consider that. But IMO don't think we should
> > > > > > > > > > > > > > > block this patch from getting merged.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ________________________________________
> > > > > > > > > > > > > > > From: 张铎 <palomino219@gmail.com>
> > > > > > > > > > > > > > > Sent: Thursday, September 22, 2016 8:32 PM
> > > > > > > > > > > > > > > To: dev@hbase.apache.org
> > > > > > > > > > > > > > > Subject: Re: [DISCUSSION] MR jobs started by
> > Master
> > > > or
> > > > > RS
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So what about a standalone service other than
> > > master?
> > > > > You
> > > > > > > can
> > > > > > > > > use
> > > > > > > > > > > > your
> > > > > > > > > > > > > > own
> > > > > > > > > > > > > > > procedure store in that service?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2016-09-23 11:28 GMT+08:00 Ted Yu <
> > > > yuzhihong@gmail.com
> > > > > >:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > An earlier implementation was client driven.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > But with that approach, it is hard to resume
> if
> > > > there
> > > > > > is
> > > > > > > > > error
> > > > > > > > > > > > > midway.
> > > > > > > > > > > > > > > > Using Procedure V2 makes the backup / restore
> > > more
> > > > > > > robust.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Another consideration is for security. It is
> > hard
> > > > to
> > > > > > > > enforce
> > > > > > > > > > > > security
> > > > > > > > > > > > > > (to
> > > > > > > > > > > > > > > > be implemented) for client driven actions.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Cheers
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Sep 22, 2016, at 8:15 PM, Andrew
> Purtell <
> > > > > > > > > > > > > > andrew.purtell@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > No, this misses Matteo's finer point, which
> > is
> > > > > > > "shelling
> > > > > > > > > out"
> > > > > > > > > > > > from
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > master directly to run MR is a first. Why not
> > > drive
> > > > > > this
> > > > > > > > > with a
> > > > > > > > > > > > > utility
> > > > > > > > > > > > > > > > derived from Tool?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Sep 22, 2016, at 7:57 PM, Vladimir
> > Rodionov
> > > <
> > > > > > > > > > > > > > vladrodionov@gmail.com
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >>>> In our production cluster,  it is a
> common
> > > > case
> > > > > we
> > > > > > > > just
> > > > > > > > > > have
> > > > > > > > > > > > > HDFS
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > >>>> HBase deployed.
> > > > > > > > > > > > > > > > >>>> If our Master/RS depend on MR framework
> > > > > > (especially
> > > > > > > > some
> > > > > > > > > > > > > features
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > >>>> have not used at all),  it introduced
> > > another
> > > > > cost
> > > > > > > for
> > > > > > > > > > > > maintain.
> > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > >>>> don't think it is a good idea.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> So , you are not backup users in this
> case.
> > > Many
> > > > > our
> > > > > > > > > > customers
> > > > > > > > > > > > > have
> > > > > > > > > > > > > > > full
> > > > > > > > > > > > > > > > >> stack deployed and
> > > > > > > > > > > > > > > > >> want see backup to be a standard feature.
> > > > Besides
> > > > > > > this,
> > > > > > > > > > > nothing
> > > > > > > > > > > > > will
> > > > > > > > > > > > > > > > happen
> > > > > > > > > > > > > > > > >> in your cluster
> > > > > > > > > > > > > > > > >> if you won't be doing backups.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> This discussion (we do not want see M/R
> > > > > dependency)
> > > > > > > goes
> > > > > > > > > to
> > > > > > > > > > > > > nowhere.
> > > > > > > > > > > > > > > We
> > > > > > > > > > > > > > > > >> asked already, at least twice, to suggest
> > > > another
> > > > > > > > > framework
> > > > > > > > > > > > (other
> > > > > > > > > > > > > > > than
> > > > > > > > > > > > > > > > M/R)
> > > > > > > > > > > > > > > > >> for bulk data copy with *conversion*.
> Still
> > > > > waiting
> > > > > > > for
> > > > > > > > > > > > > suggestions.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> -Vlad
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <
> > > > > > > > > > yuzhihong@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > >>> If MR framework is not deployed in the
> > > cluster,
> > > > > > hbase
> > > > > > > > > still
> > > > > > > > > > > > > > functions
> > > > > > > > > > > > > > > > >>> normally (post merge).
> > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > >>> In terms of build time dependency, we
> have
> > > long
> > > > > > been
> > > > > > > > > > > depending
> > > > > > > > > > > > on
> > > > > > > > > > > > > > > > >>> mapreduce. Take a look at ExportSnapshot.
> > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > >>> Cheers
> > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > >>> On Thu, Sep 22, 2016 at 7:42 PM, Heng
> Chen
> > <
> > > > > > > > > > > > > > heng.chen.1986@gmail.com
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >>> wrote:
> > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > >>>> In our production cluster,  it is a
> common
> > > > case
> > > > > we
> > > > > > > > just
> > > > > > > > > > have
> > > > > > > > > > > > > HDFS
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > >>>> HBase deployed.
> > > > > > > > > > > > > > > > >>>> If our Master/RS depend on MR framework
> > > > > > (especially
> > > > > > > > some
> > > > > > > > > > > > > features
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > >>>> have not used at all),  it introduced
> > > another
> > > > > cost
> > > > > > > for
> > > > > > > > > > > > maintain.
> > > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > >>>> don't think it is a good idea.
> > > > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > > > >>>> 2016-09-23 10:28 GMT+08:00 张铎 <
> > > > > > > palomino219@gmail.com
> > > > > > > > >:
> > > > > > > > > > > > > > > > >>>>> To be specific, for example, our nice
> > > > > > > Backup/Restore
> > > > > > > > > > > feature,
> > > > > > > > > > > > > if
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > >>> think
> > > > > > > > > > > > > > > > >>>>> this is not a core feature of HBase,
> then
> > > we
> > > > > > could
> > > > > > > > make
> > > > > > > > > > it
> > > > > > > > > > > > > depend
> > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > >>> MR,
> > > > > > > > > > > > > > > > >>>>> and start a standalone BackupManager
> > > instance
> > > > > > that
> > > > > > > > > > submits
> > > > > > > > > > > MR
> > > > > > > > > > > > > > jobs
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > >>> do
> > > > > > > > > > > > > > > > >>>>> periodical maintenance job. And if we
> > think
> > > > > this
> > > > > > > is a
> > > > > > > > > > core
> > > > > > > > > > > > > > feature
> > > > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > >>>>> everyone should use it, then we'd
> better
> > > > > > implement
> > > > > > > it
> > > > > > > > > > > without
> > > > > > > > > > > > > MR
> > > > > > > > > > > > > > > > >>>>> dependency, like DLS.
> > > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > > >>>>> Thanks.
> > > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > > >>>>> 2016-09-23 10:11 GMT+08:00 张铎 <
> > > > > > > palomino219@gmail.com
> > > > > > > > >:
> > > > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > > > >>>>>> I‘m -1 on let master or rs launch MR
> > jobs.
> > > > It
> > > > > is
> > > > > > > OK
> > > > > > > > > that
> > > > > > > > > > > > some
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > our
> > > > > > > > > > > > > > > > >>>>>> features depend on MR but I think the
> > > bottom
> > > > > > line
> > > > > > > is
> > > > > > > > > > that
> > > > > > > > > > > we
> > > > > > > > > > > > > > > should
> > > > > > > > > > > > > > > > >>>> launch
> > > > > > > > > > > > > > > > >>>>>> the jobs from outside manually or by
> > other
> > > > > > > services.
> > > > > > > > > > > > > > > > >>>>>>
> > > > > > > > > > > > > > > > >>>>>> 2016-09-23 9:47 GMT+08:00 Andrew
> > Purtell <
> > > > > > > > > > > > > > > andrew.purtell@gmail.com
> > > > > > > > > > > > > > > > >:
> > > > > > > > > > > > > > > > >>>>>>
> > > > > > > > > > > > > > > > >>>>>>> Ok, got it. Well "shelling out" is on
> > the
> > > > > line
> > > > > > I
> > > > > > > > > think,
> > > > > > > > > > > so
> > > > > > > > > > > > a
> > > > > > > > > > > > > > fair
> > > > > > > > > > > > > > > > >>>>>>> question.
> > > > > > > > > > > > > > > > >>>>>>>
> > > > > > > > > > > > > > > > >>>>>>> Can this be driven by a utility
> derived
> > > > from
> > > > > > Tool
> > > > > > > > > like
> > > > > > > > > > > our
> > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > MR
> > > > > > > > > > > > > > > > >>>> apps?
> > > > > > > > > > > > > > > > >>>>>>> The issue is needing the
> > AccessController
> > > > to
> > > > > > > decide
> > > > > > > > > if
> > > > > > > > > > > > > allowed?
> > > > > > > > > > > > > > > But
> > > > > > > > > > > > > > > > >>>> nothing
> > > > > > > > > > > > > > > > >>>>>>> prevents the user from running the
> job
> > > > > > > > > > > > > manually/independently,
> > > > > > > > > > > > > > > > right?
> > > > > > > > > > > > > > > > >>>>>>>
> > > > > > > > > > > > > > > > >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo
> > > > > Bertozzi <
> > > > > > > > > > > > > > > > >>>> theo.bertozzi@gmail.com>
> > > > > > > > > > > > > > > > >>>>>>> wrote:
> > > > > > > > > > > > > > > > >>>>>>>>
> > > > > > > > > > > > > > > > >>>>>>>> just a remark. my query was not
> about
> > > > tools
> > > > > > > using
> > > > > > > > MR
> > > > > > > > > > > > > > (everyone i
> > > > > > > > > > > > > > > > >>>> think
> > > > > > > > > > > > > > > > >>>>>>> is
> > > > > > > > > > > > > > > > >>>>>>>> ok with those).
> > > > > > > > > > > > > > > > >>>>>>>> the topic was about: "are we ok with
> > > > running
> > > > > > MR
> > > > > > > > jobs
> > > > > > > > > > > from
> > > > > > > > > > > > > > Master
> > > > > > > > > > > > > > > > >>> and
> > > > > > > > > > > > > > > > >>>> RSs
> > > > > > > > > > > > > > > > >>>>>>>> code?" since this will be the first
> > time
> > > > we
> > > > > do
> > > > > > > > this
> > > > > > > > > > > > > > > > >>>>>>>>
> > > > > > > > > > > > > > > > >>>>>>>> Matteo
> > > > > > > > > > > > > > > > >>>>>>>>
> > > > > > > > > > > > > > > > >>>>>>>>
> > > > > > > > > > > > > > > > >>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM,
> > > Devaraj
> > > > > Das
> > > > > > <
> > > > > > > > > > > > > > > > >>> ddas@hortonworks.com>
> > > > > > > > > > > > > > > > >>>>>>> wrote:
> > > > > > > > > > > > > > > > >>>>>>>>>
> > > > > > > > > > > > > > > > >>>>>>>>> Very much agree; for tools like
> > > > > > ExportSnapshot
> > > > > > > /
> > > > > > > > > > > Backup /
> > > > > > > > > > > > > > > > Restore,
> > > > > > > > > > > > > > > > >>>> it's
> > > > > > > > > > > > > > > > >>>>>>>>> fine to be dependent on MR. MR is
> the
> > > > right
> > > > > > > > > framework
> > > > > > > > > > > for
> > > > > > > > > > > > > > such.
> > > > > > > > > > > > > > > > We
> > > > > > > > > > > > > > > > >>>>>>> should
> > > > > > > > > > > > > > > > >>>>>>>>> also do compactions using MR (just
> > > saying
> > > > > :)
> > > > > > )
> > > > > > > > > > > > > > > > >>>>>>>>> ______________________________
> > > __________
> > > > > > > > > > > > > > > > >>>>>>>>> From: Ted Yu <yuzhihong@gmail.com>
> > > > > > > > > > > > > > > > >>>>>>>>> Sent: Thursday, September 22, 2016
> > 2:00
> > > > PM
> > > > > > > > > > > > > > > > >>>>>>>>> To: dev@hbase.apache.org
> > > > > > > > > > > > > > > > >>>>>>>>> Subject: Re: [DISCUSSION] MR jobs
> > > started
> > > > > by
> > > > > > > > Master
> > > > > > > > > > or
> > > > > > > > > > > RS
> > > > > > > > > > > > > > > > >>>>>>>>>
> > > > > > > > > > > > > > > > >>>>>>>>> I agree - backup / restore is in
> the
> > > same
> > > > > > > > category
> > > > > > > > > as
> > > > > > > > > > > > > import
> > > > > > > > > > > > > > /
> > > > > > > > > > > > > > > > >>>> export.
> > > > > > > > > > > > > > > > >>>>>>>>>
> > > > > > > > > > > > > > > > >>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM,
> > Andrew
> > > > > > > Purtell <
> > > > > > > > > > > > > > > > >>>>>>> andrew.purtell@gmail.com>
> > > > > > > > > > > > > > > > >>>>>>>>> wrote:
> > > > > > > > > > > > > > > > >>>>>>>>>
> > > > > > > > > > > > > > > > > >>>>>>>>>> Backup is extra tooling around core in my opinion. Like import or export.
> > > > > > > > > > > > > > > > > >>>>>>>>>> Or the optional MOB tool. It's fine.
> > > > > > > > > > > > > > > > >>>>>>>>>>
> > > > > > > > > > > > > > > > > >>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mbertozzi@apache.org> wrote:
> > > > > > > > > > > > > > > > >>>>>>>>>>>
> > > > > > > > > > > > > > > > > >>>>>>>>>>> What's the latest opinion around running MR jobs from hbase (Master or RS)?
> > > > > > > > > > > > > > > > >>>>>>>>>>>
> > > > > > > > > > > > > > > > > >>>>>>>>>>> I remember in the past that there was discussion about not having MR as a
> > > > > > > > > > > > > > > > > >>>>>>>>>>> direct dependency of hbase.
> > > > > > > > > > > > > > > > >>>>>>>>>>>
> > > > > > > > > > > > > > > > > >>>>>>>>>>> I think some of the discussion was around MOB, which had a MR job to compact
> > > > > > > > > > > > > > > > > >>>>>>>>>>> that was later transformed into a non-MR job in order to be merged. I think
> > > > > > > > > > > > > > > > > >>>>>>>>>>> we had a similar discussion for log split/replay.
> > > > > > > > > > > > > > > > >>>>>>>>>>>
> > > > > > > > > > > > > > > > > >>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), that runs a MR job from
> > > > > > > > > > > > > > > > > >>>>>>>>>>> the master to copy data or restore data.
> > > > > > > > > > > > > > > > > >>>>>>>>>>> (backup is also "not really core" as in.. if you don't use backup you'll
> > > > > > > > > > > > > > > > > >>>>>>>>>>> not end up running MR jobs, but this was probably true for MOB as in "if
> > > > > > > > > > > > > > > > > >>>>>>>>>>> you don't enable MOB you don't need MR")
> > > > > > > > > > > > > > > > >>>>>>>>>>>
> > > > > > > > > > > > > > > > > >>>>>>>>>>> any thoughts? do we have a rule that says "we don't want to have hbase run MR
> > > > > > > > > > > > > > > > > >>>>>>>>>>> jobs, only tools started manually by the user can do that"? or can we start
> > > > > > > > > > > > > > > > > >>>>>>>>>>> adding MR calls around without problems?
