hbase-dev mailing list archives

From 张铎 <palomino...@gmail.com>
Subject Re: [DISCUSSION] MR jobs started by Master or RS
Date Fri, 23 Sep 2016 05:44:52 GMT
If you guys have already implemented the feature in the MR way and the
patch is ready for landing on master, I'm a -0 on it as I do not want to
block the development progress.

But I strongly suggest that we revisit the design later and see if we
can separate the logic from HMaster as much as possible. HA is not a big
problem if you do not store any metadata locally. But the ugly code in
HMaster is really a problem...

And for security, I have had an issue pending for a long time. Can someone
help take a quick look at it? This is what I mean by ugly code... it logs out
and destroys the credentials in a subject while it is still being used, and it
is declared as LimitedPrivate so I cannot change the behavior, and the only
way to fix it is to write another piece of ugly code...

https://issues.apache.org/jira/browse/HADOOP-13433

2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodionov@gmail.com>:

> >> If in the future, we find better ways of doing this without using MR, we
> >> can certainly consider that
>
> Our framework for distributed operations is abstract and allows
> different implementations. MR is just one implementation we provide.
>
> -Vlad
>
> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <ddas@hortonworks.com> wrote:
>
> > Guys, first off apologies for bringing in the topic of MR-based
> > compactions.. But I was thinking more about the SpliceMachine approach of
> > managing compactions in Spark where apparently they saw a lot of benefits.
> > Apologies for giving you that sore throat Andrew; I really didn't mean to
> > :-)
> >
> > So on this issue, we have these on the plate:
> > 0. Somehow not use MR but something like that
> > 1. Run a standalone service other than master
> > 2. Shell out from the master
> >
> > I don't think we have a good answer to (0), and I don't think it's even
> > worth the effort of trying to build something when MR is already there,
> > and being used by HBase already for some operations.
> >
> > On (1), we have to deal with a myriad of issues - HA of the server not
> > being the least of them all. Security (kerberos authentication, another
> > keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead
> > let's substitute that (1) with the HBase Master. I haven't seen any good
> > reason why the HBase master shouldn't launch MR jobs if needed. It's not
> > ideal; agreed.
> >
> > Now before going to (2), let's see what are the benefits of running the
> > backup/restore jobs from the master. I think Ted has summarized some of
> > the issues that we need to take care of - basically, the master can keep
> > track of running jobs, and should it fail, the backup master can continue
> > keeping track of it (since the jobId would have been recorded in the proc
> > WAL). The master can also do cleanup, etc. of failed backup/restore
> > processes. Security is another issue - the job needs to run as 'hbase'
> > since it owns the data. Having the master launch the job makes it get
> > that privilege. In the (2) approach, it's hard to do some of the above
> > management.
> >
> > Guys, just to reiterate, the patch as such is ready from the overall
> > design/arch point of view (maybe code review is still pending from
> > Matteo). If in the future, we find better ways of doing this without
> > using MR, we can certainly consider that. But IMO I don't think we should
> > block this patch from getting merged.
> >
> > ________________________________________
> > From: 张铎 <palomino219@gmail.com>
> > Sent: Thursday, September 22, 2016 8:32 PM
> > To: dev@hbase.apache.org
> > Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >
> > So what about a standalone service other than master? You can use your
> > own procedure store in that service?
> >
> > 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> >
> > > An earlier implementation was client driven.
> > >
> > > But with that approach, it is hard to resume if there is an error midway.
> > > Using Procedure V2 makes the backup / restore more robust.
> > >
> > > Another consideration is security. It is hard to enforce security (to be
> > > implemented) for client driven actions.
> > >
> > > Cheers
> > >
> > > > > On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> > > > >
> > > > > No, this misses Matteo's finer point, which is "shelling out" from the
> > > > > master directly to run MR is a first. Why not drive this with a utility
> > > > > derived from Tool?
> > > >
> > > > > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> > > >
> > > > >>>> In our production cluster, it is a common case we just have HDFS and
> > > > >>>> HBase deployed.
> > > > >>>> If our Master/RS depend on MR framework (especially some features we
> > > > >>>> have not used at all), it introduced another cost for maintain. I
> > > > >>>> don't think it is a good idea.
> > > >>
> > > > >> So, you are not backup users in this case. Many of our customers have
> > > > >> the full stack deployed and want to see backup as a standard feature.
> > > > >> Besides this, nothing will happen in your cluster if you are not
> > > > >> doing backups.
> > > > >>
> > > > >> This discussion (we do not want to see an M/R dependency) is going
> > > > >> nowhere. We have already asked, at least twice, for suggestions of
> > > > >> another framework (other than M/R) for bulk data copy with
> > > > >> *conversion*. Still waiting for suggestions.
> > > >>
> > > >> -Vlad
> > > >>
> > > >>
> > > >>
> > > >>
> > > > >>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >>>
> > > > >>> If MR framework is not deployed in the cluster, hbase still functions
> > > > >>> normally (post merge).
> > > > >>>
> > > > >>> In terms of build time dependency, we have long been depending on
> > > > >>> mapreduce. Take a look at ExportSnapshot.
> > > >>>
> > > >>> Cheers
> > > >>>
> > > > >>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1986@gmail.com> wrote:
> > > >>>
> > > > >>>> In our production cluster, it is a common case we just have HDFS and
> > > > >>>> HBase deployed.
> > > > >>>> If our Master/RS depend on MR framework (especially some features we
> > > > >>>> have not used at all), it introduced another cost for maintain. I
> > > > >>>> don't think it is a good idea.
> > > >>>>
> > > >>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino219@gmail.com>:
> > > > >>>>> To be specific, take our nice Backup/Restore feature as an example.
> > > > >>>>> If we think this is not a core feature of HBase, then we could make
> > > > >>>>> it depend on MR, and start a standalone BackupManager instance that
> > > > >>>>> submits MR jobs to do periodic maintenance jobs. And if we think
> > > > >>>>> this is a core feature that everyone should use, then we'd better
> > > > >>>>> implement it without an MR dependency, like DLS.
> > > >>>>>
> > > >>>>> Thanks.
> > > >>>>>
> > > >>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino219@gmail.com>:
> > > >>>>>
> > > > >>>>>> I'm -1 on letting the master or RS launch MR jobs. It is OK that
> > > > >>>>>> some of our features depend on MR but I think the bottom line is
> > > > >>>>>> that we should launch the jobs from outside manually or by other
> > > > >>>>>> services.
> > > >>>>>>
> > > >>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <
> > andrew.purtell@gmail.com
> > > >:
> > > >>>>>>
> > > > >>>>>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
> > > > >>>>>>> question.
> > > > >>>>>>>
> > > > >>>>>>> Can this be driven by a utility derived from Tool like our other MR
> > > > >>>>>>> apps? The issue is needing the AccessController to decide if
> > > > >>>>>>> allowed? But nothing prevents the user from running the job
> > > > >>>>>>> manually/independently, right?
> > > >>>>>>>
> > > > >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
> > > >>>>>>>>
> > > > >>>>>>>> just a remark. my query was not about tools using MR (everyone i
> > > > >>>>>>>> think is ok with those).
> > > > >>>>>>>> the topic was about: "are we ok with running MR jobs from Master
> > > > >>>>>>>> and RSs code?" since this will be the first time we do this
> > > >>>>>>>>
> > > >>>>>>>> Matteo
> > > >>>>>>>>
> > > >>>>>>>>
> > > > >>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <ddas@hortonworks.com> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore,
> > > > >>>>>>>>> it's fine to be dependent on MR. MR is the right framework for
> > > > >>>>>>>>> such. We should also do compactions using MR (just saying :) )
> > > > >>>>>>>>> ________________________________________
> > > > >>>>>>>>> From: Ted Yu <yuzhihong@gmail.com>
> > > > >>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> > > > >>>>>>>>> To: dev@hbase.apache.org
> > > > >>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> > > > >>>>>>>>>
> > > > >>>>>>>>> I agree - backup / restore is in the same category as import /
> > > > >>>>>>>>> export.
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> Backup is extra tooling around core in my opinion. Like import or
> > > > >>>>>>>>>> export. Or the optional MOB tool. It's fine.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mbertozzi@apache.org> wrote:
> > > >>>>>>>>>>>
> > > > >>>>>>>>>>> What's the latest opinion around running MR jobs from hbase
> > > > >>>>>>>>>>> (Master or RS)?
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> I remember in the past that there was discussion about not
> > > > >>>>>>>>>>> having MR as a direct dependency of hbase.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> I think some of the discussion was around MOB, which had an MR
> > > > >>>>>>>>>>> job to compact that was later transformed into a non-MR job in
> > > > >>>>>>>>>>> order to be merged. I think we had a similar discussion for log
> > > > >>>>>>>>>>> split/replay.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> the latest is the new Backup feature (HBASE-7912), which runs
> > > > >>>>>>>>>>> an MR job from the master to copy data or restore data.
> > > > >>>>>>>>>>> (backup is also "not really core" as in.. if you don't use
> > > > >>>>>>>>>>> backup you'll not end up running MR jobs, but this was probably
> > > > >>>>>>>>>>> true for MOB as in "if you don't enable MOB you don't need MR")
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> any thoughts? do we have a rule that says "we don't want to
> > > > >>>>>>>>>>> have hbase run MR jobs, only tool started manually by the user
> > > > >>>>>>>>>>> can do that". or can we start adding MR calls around without
> > > > >>>>>>>>>>> problems?
> > > >>>
> > >
> >
>
