hbase-dev mailing list archives

From Devaraj Das <d...@hortonworks.com>
Subject Re: [DISCUSSION] MR jobs started by Master or RS
Date Fri, 23 Sep 2016 04:38:55 GMT
Guys, first off, apologies for bringing up the topic of MR-based compactions. But I was thinking
more about the SpliceMachine approach of managing compactions in Spark, where apparently they
saw a lot of benefits. Apologies for giving you that sore throat, Andrew; I really didn't mean
to :-)

So on this issue, we have these on the plate:
0. Somehow not use MR, but something like it
1. Run a standalone service other than master
2. Shell out from the master

I don't think we have a good answer to (0), and I don't think it's even worth the effort of
trying to build something when MR is already there and already used by HBase for some
operations.

On (1), we have to deal with a myriad of issues, HA of the server not being the least of
them. Then there's security (Kerberos authentication, another keytab to manage, etc.). IMO,
that approach is DOA. Instead, let's replace the standalone service in (1) with the HBase Master.
I haven't seen any good reason why the HBase Master shouldn't launch MR jobs if needed. It's
not ideal; agreed.

Now, before going to (2), let's look at the benefits of running the backup/restore jobs
from the master. I think Ted has summarized some of the issues that we need to take care of.
Basically, the master can keep track of running jobs, and should it fail, the backup master
can continue tracking them (since the jobId would have been recorded in the proc WAL).
The master can also do cleanup, etc., of failed backup/restore processes. Security is another
issue: the job needs to run as 'hbase' since that user owns the data, and having the master
launch the job gives it that privilege. With approach (2), some of this management is hard to do.

Guys, just to reiterate, the patch as such is ready from the overall design/architecture point
of view (code review from Matteo may still be pending). If in the future we find better ways
of doing this without using MR, we can certainly consider them. But IMO we shouldn't
block this patch from getting merged.

________________________________________
From: 张铎 <palomino219@gmail.com>
Sent: Thursday, September 22, 2016 8:32 PM
To: dev@hbase.apache.org
Subject: Re: [DISCUSSION] MR jobs started by Master or RS

So what about a standalone service other than master? You can use your own
procedure store in that service?

2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:

> An earlier implementation was client-driven.
>
> But with that approach, it is hard to resume if there is an error midway.
> Using Procedure V2 makes backup / restore more robust.
>
> Another consideration is security. It is hard to enforce security (to
> be implemented) for client-driven actions.
>
> Cheers
>
> > On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> >
> > No, this misses Matteo's finer point, which is that "shelling out" from the
> > master directly to run MR is a first. Why not drive this with a utility
> > derived from Tool?
> >
> > On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> >
> >>>> In our production cluster, it is a common case that we just have HDFS and
> >>>> HBase deployed.
> >>>> If our Master/RS depend on the MR framework (especially for some features we
> >>>> have not used at all), it introduces another maintenance cost. I
> >>>> don't think it is a good idea.
> >>
> >> So, you are not backup users in this case. Many of our customers have the full
> >> stack deployed and want to see backup as a standard feature. Besides this,
> >> nothing will happen in your cluster if you don't do backups.
> >>
> >> This discussion (we do not want to see an M/R dependency) is going nowhere. We
> >> have already asked, at least twice, for suggestions of another framework (other
> >> than M/R) for bulk data copy with *conversion*. Still waiting for suggestions.
> >>
> >> -Vlad
> >>
> >>
> >>
> >>
> >>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>
> >>> If the MR framework is not deployed in the cluster, HBase still functions
> >>> normally (post merge).
> >>>
> >>> In terms of build-time dependency, we have long been depending on
> >>> MapReduce. Take a look at ExportSnapshot.
> >>>
> >>> Cheers
> >>>
> >>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1986@gmail.com> wrote:
> >>>
> >>>> In our production cluster, it is a common case that we just have HDFS and
> >>>> HBase deployed.
> >>>> If our Master/RS depend on the MR framework (especially for some features we
> >>>> have not used at all), it introduces another maintenance cost. I
> >>>> don't think it is a good idea.
> >>>>
> >>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino219@gmail.com>:
> >>>>> To be specific, take our nice Backup/Restore feature: if we think
> >>>>> this is not a core feature of HBase, then we could make it depend on MR
> >>>>> and start a standalone BackupManager instance that submits MR jobs to do
> >>>>> periodic maintenance. And if we think this is a core feature that
> >>>>> everyone should use, then we'd better implement it without an MR
> >>>>> dependency, like DLS.
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino219@gmail.com>:
> >>>>>
> >>>>>> I'm -1 on letting the master or RS launch MR jobs. It is OK that some of
> >>>>>> our features depend on MR, but I think the bottom line is that we should
> >>>>>> launch the jobs from outside, manually or by other services.
> >>>>>>
> >>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purtell@gmail.com>:
> >>>>>>
> >>>>>>> Ok, got it. Well, "shelling out" is on the line, I think, so it's a fair
> >>>>>>> question.
> >>>>>>>
> >>>>>>> Can this be driven by a utility derived from Tool, like our other MR apps?
> >>>>>>> Is the issue needing the AccessController to decide if it's allowed? But
> >>>>>>> nothing prevents the user from running the job manually/independently,
> >>>>>>> right?
> >>>>>>>
> >>>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Just a remark: my query was not about tools using MR (everyone, I think,
> >>>>>>>> is OK with those).
> >>>>>>>> The topic was: "are we OK with running MR jobs from Master and RS
> >>>>>>>> code?", since this will be the first time we do this.
> >>>>>>>>
> >>>>>>>> Matteo
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <ddas@hortonworks.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
> >>>>>>>>> fine to be dependent on MR. MR is the right framework for such tools. We
> >>>>>>>>> should also do compactions using MR (just saying :) )
> >>>>>>>>> ________________________________________
> >>>>>>>>> From: Ted Yu <yuzhihong@gmail.com>
> >>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
> >>>>>>>>> To: dev@hbase.apache.org
> >>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
> >>>>>>>>>
> >>>>>>>>> I agree - backup / restore is in the same category as import / export.
> >>>>>>>>>
> >>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Backup is extra tooling around core, in my opinion. Like import or
> >>>>>>>>>> export. Or the optional MOB tool. It's fine.
> >>>>>>>>>>
> >>>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mbertozzi@apache.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> What's the latest opinion around running MR jobs from HBase (Master or
> >>>>>>>>>>> RS)?
> >>>>>>>>>>>
> >>>>>>>>>>> I remember that in the past there was discussion about not having MR as
> >>>>>>>>>>> a direct dependency of HBase.
> >>>>>>>>>>>
> >>>>>>>>>>> I think some of the discussion was around MOB, which had an MR job to
> >>>>>>>>>>> compact that was later transformed into a non-MR job before being merged.
> >>>>>>>>>>> I think we had a similar discussion for log split/replay.
> >>>>>>>>>>>
> >>>>>>>>>>> The latest is the new Backup feature (HBASE-7912), which runs an MR job
> >>>>>>>>>>> from the master to copy or restore data.
> >>>>>>>>>>> (Backup is also "not really core", as in: if you don't use backup, you
> >>>>>>>>>>> won't end up running MR jobs. But this was probably also true for MOB,
> >>>>>>>>>>> as in "if you don't enable MOB, you don't need MR".)
> >>>>>>>>>>>
> >>>>>>>>>>> Any thoughts? Do we have a rule that says "we don't want to have HBase
> >>>>>>>>>>> run MR jobs; only tools started manually by the user can do that"? Or
> >>>>>>>>>>> can we start adding MR calls around without problems?
> >>>
>
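
Ted and Vlad both argue that HBase keeps working without MR, and that nothing MR-related happens unless backups are used. A minimal Java sketch of that "pay only when used" behavior follows. It assumes a hypothetical handleBackupRequest entry point that is not HBase's actual backup code. It checks for the MapReduce client classes only when a backup is requested, using the standard Class.forName(String, boolean, ClassLoader). If the check fails, only that one request is rejected, and master startup and everything else carry on. It checks class availability only. A missing YARN/MR service would surface later, at job submission, which this sketch does not model.

```java
public class OptionalMapReduce {

    /** Checks, without initializing it, whether a class can be loaded from the classpath. */
    static boolean classPresent(String className) {
        try {
            Class.forName(className, false, OptionalMapReduce.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /**
     * Hypothetical master-side handler for a backup request. The MapReduce check runs only
     * here, so a cluster that never requests a backup never touches MapReduce at all.
     */
    static String handleBackupRequest(String mrJobClass) {
        if (!classPresent(mrJobClass)) {
            // Only this request fails; the rest of the master keeps serving normally.
            return "backup rejected: MapReduce client classes not found";
        }
        return "backup accepted: would submit MR job here";
    }

    public static void main(String[] args) {
        // org.apache.hadoop.mapreduce.Job is the MR client class; whether it is on the
        // classpath depends on how this sketch is run.
        System.out.println(handleBackupRequest("org.apache.hadoop.mapreduce.Job"));
    }
}
```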