From: Ted Yu
Date: Sat, 24 Sep 2016 11:20:00 -0700
Subject: Re: Backup Implementation (WAS => Re: [DISCUSSION] MR jobs started by Master or RS)
To: "dev@hbase.apache.org"

bq. don't call out to an external framework we don't own from master (or regionserver) code

So the standalone service would run out of proc - in the same vein as REST or thrift server.

Cheers

On Sat, Sep 24, 2016 at 10:40 AM, Andrew Purtell wrote:

> I was attempting to summarize Ted.
>
> A new maven module sounds like a good idea to me. Or we could move all the tools that use MR out to one. Or...
>
> The key takeaway seems to be don't call out to an external framework we don't own from master (or regionserver) code.
>
> On Sep 24, 2016, at 10:15 AM, Ted Yu wrote:
>
>> bq. Internally the tool can also use the procedure framework for state durability
>>
>> Isn't this the standalone service I proposed this morning ?
>>
>> bq. Move cross HBase and MR coordination to a separate tool
>>
>> Where should this tool live (hbase-backup module) ?
>>
>> Thanks
>>
>> On Sat, Sep 24, 2016 at 9:58 AM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
>>
>>> At branch merge voting time now more eyes are getting on the design issues with dissenting opinion emerging. This is the branch merge process working as our community has designed it. Because this is the first full project review of the code and implementation I think we all have to be flexible. I see the community as trying to narrow the technical objection at issue to the smallest possible scope. It's simple: don't call out to an external execution framework we don't own from core master (and by extension regionserver) code.
>>> We had this objection before to a proposed external compaction implementation for MOB, so it should not come as a surprise. Please let me know if I have misstated this.
>>>
>>> This would seem to require a modest refactor of coordination to move invocation of MR code out from any core code path. To restate what I think is an emerging recommendation: Move cross HBase and MR coordination to a separate tool. This tool can ask the master to invoke procedures on the HBase side that do first mile export and last mile restore. (Internally the tool can also use the procedure framework for state durability, perhaps, just a thought.) Then the tool can further drive the things done with MR like shipping data off cluster or moving remote data in place and preparing it for import. These activities do not need procedure coordination and involvement of the HBase master. Only the first and last mile of the process needs atomicity within the HBase deploy. Please let me know if I have misstated this.
>>>
>>> On Sep 24, 2016, at 8:17 AM, Ted Yu wrote:
>>>
>>>> bq. procedure gives you a retry mechanism on failure
>>>>
>>>> We do need this mechanism. Take a look at the multiple steps in FullTableBackupProcedure, etc.
>>>>
>>>> bq. let the user export it later when he wants
>>>>
>>>> This would make supporting security more complex (user A shouldn't be exporting user B's backup). And it is not user friendly - at the time the backup request is issued, the following is specified:
>>>>
>>>> + + " BACKUP_ROOT The full root path to store the backup image,\n"
>>>> + + " the prefix can be hdfs, webhdfs or gpfs\n"
>>>>
>>>> Backup root is an integral part of backup manifest.
>>>>
>>>> Cheers
>>>>
>>>> On Sat, Sep 24, 2016 at 7:59 AM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
>>>>
>>>>> On Sat, Sep 24, 2016 at 7:19 AM, Ted Yu wrote:
>>>>>
>>>>>> Ideally the export should have one job running which does the retry (on failed partition) itself.
>>>>>
>>>>> procedure gives you a retry mechanism on failure. if you don't use that, then you don't need procedure.
>>>>> if you want you can start a procedure executor in a non master process (the hbase-procedure is a separate package and does not depend on master). but again, export seems a case where you don't need procedure.
>>>>>
>>>>> like snapshot, the logic may just be: ask the master to take a backup, and let the user export it later when he wants. so you avoid having a MR job started by the master since people do not seem to like it.
>>>>>
>>>>> for restore (I think that is where you use the MR splitter) you can probably just have a backup ready (already split). there is already a jira that should do that, HBASE-14135: instead of doing the split/merge operation on restore, you consolidate the backup "offline" (mr job started by the user) and then ask to restore the backup.
>>>>>
>>>>>> On Sat, Sep 24, 2016 at 7:04 AM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
>>>>>>
>>>>>>> as far as I understand the code, you don't need procedure for the export itself.
>>>>>>> the export operation is already idempotent, since you are just copying files.
>>>>>>> if the file exists and is complete (check length, checksum, ...) you can skip it,
>>>>>>> otherwise you'll send it over again.
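Matteo's argument that export needs no procedure rests on the copy being idempotent. A minimal Java sketch of that idea follows; it is an illustration, not code from the backup/restore patch. Assumptions: it copies between local directories with `java.nio.file` rather than going through the Hadoop FileSystem API that a real cross-cluster export would use, it treats matching length plus CRC32 as the completeness check (the actual tool may compare different checksums), and the names `IdempotentExport`, `export` and `isComplete` are hypothetical. Because a missing or incomplete destination file is simply sent again, the whole export can be rerun after any failure.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.CRC32;

/** Hypothetical sketch: a file export that is safe to rerun until it succeeds. */
final class IdempotentExport {

  /** CRC32 of a file's bytes (a stand-in for whatever checksum a real export compares). */
  static long checksum(Path file) throws IOException {
    CRC32 crc = new CRC32();
    byte[] buf = new byte[8192];
    try (InputStream in = Files.newInputStream(file)) {
      int n;
      while ((n = in.read(buf)) != -1) {
        crc.update(buf, 0, n);
      }
    }
    return crc.getValue();
  }

  /** A destination counts as complete only if it exists with the same length and checksum. */
  static boolean isComplete(Path src, Path dst) throws IOException {
    return Files.isRegularFile(dst)
        && Files.size(dst) == Files.size(src)
        && checksum(dst) == checksum(src);
  }

  /**
   * Copies each regular file under srcDir to the same relative path under dstDir,
   * skipping files an earlier run already shipped. A partial copy left by a crash
   * fails the length/checksum test and is simply sent again.
   *
   * @return relative paths of the files copied by this run
   */
  static List<Path> export(Path srcDir, Path dstDir) throws IOException {
    List<Path> sources;
    try (Stream<Path> walk = Files.walk(srcDir)) {
      sources = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
    List<Path> copied = new ArrayList<>();
    for (Path src : sources) {
      Path rel = srcDir.relativize(src);
      Path dst = dstDir.resolve(rel);
      if (isComplete(src, dst)) {
        continue; // already exported by an earlier run; nothing to redo
      }
      Files.createDirectories(dst.getParent());
      Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
      copied.add(rel);
    }
    return copied;
  }
}
```

Run twice against the same directories, the second `export` call returns an empty list; if a destination file is truncated in between (as a crash mid-copy would leave it), only that file is copied again.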
>>>>>>>
>>>>>>> you need the proc for taking the backup and restoring,
>>>>>>> because you want to complete the operation and end up with a consistent state
>>>>>>> across the multiple components you are updating (meta, fs, ...)
>>>>>>> but again, for export you can just run the tool over and over until the operation succeeds, and that should be ok.
>>>>>>>
>>>>>>> Matteo
>>>>>>>
>>>>>>> On Sat, Sep 24, 2016 at 6:54 AM, Ted Yu wrote:
>>>>>>>
>>>>>>>> Master is involved in this discussion because currently only Master instantiates ProcedureExecutor, which runs the 3 Procedures for backup / restore.
>>>>>>>>
>>>>>>>> What if an optional standalone service which hosts ProcedureExecutor is used for this purpose ?
>>>>>>>> Would that have a better chance of giving us middle ground so that we can move this forward ?
>>>>>>>>
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> On Fri, Sep 23, 2016 at 5:15 PM, Stack wrote:
>>>>>>>>
>>>>>>>>> (Moved out of the Master doing MR DISCUSSION)
>>>>>>>>>
>>>>>>>>> On Fri, Sep 23, 2016 at 12:24 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> >> -1 on that backup be in core hbase
>>>>>>>>>>
>>>>>>>>>> Not sure I understand what it means.
>>>>>>>>>
>>>>>>>>> Sorry for the imprecision.
>>>>>>>>>
>>>>>>>>> The -1 is NOT against backup/restore. I am -1 on MR as a dependency and so -1 on the Master running backup/restore MR jobs, even if optional.
>>>>>>>>>
>>>>>>>>> Master should not depend on MR. We've gone out of our way to avoid taking MR on as a dependency in the past. Seems late in the game for us to change our opinion on this. If we didn't do it for distributed log splitting, or MOB, why would we do it to support an optional backup/restore?
>>>>>>>>>
>>>>>>>>> I have opinions on the questions below -- i.e. that Master running backup/restore is outside of the Master's charge -- but they are not worth much since I've not done much by way of review or contrib to backup/restore other than to try it as a 'user', so I'll keep them to myself until I do. I only came out from under my shell to participate in the MR as dependency chat.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> M
>>>>>>>>>
>>>>>>>>>> 1. We are not allowed to use Master to orchestrate the whole process? We have already brought up all advantages of using Master and distributed procedures for backup and restore.
>>>>>>>>>>
>>>>>>>>>> Downside of moving this to a client tool is lack of fault tolerance:
>>>>>>>>>> 1.1 Client won't be allowed to do any operations that can potentially affect the cluster, such as disabling splits/merges, balancer.
>>>>>>>>>> 1.2 In case of client failure, who will be doing the whole rollback stuff? We are trying to make it atomic.
>>>>>>>>>>
>>>>>>>>>> Security is not clear.
>>>>>>>>>>
>>>>>>>>>> 2. We are not allowed to modify code of existing HBase core classes (what does core mean anyway)?
>>>>>>>>>>
>>>>>>>>>> 3. We are not allowed to create backup system table (hbase:backup) in a system space? Only in user space? The table is global.
>>>>>>>>>>
>>>>>>>>>> 2. is critical. Despite the fact that 95% of code is new, we have touched, of course, some existing HBase code.
>>>>>>>>>> 3. is not that critical, of course we can move backup system into user space.
>>>>>>>>>>
>>>>>>>>>> And finally, will moving backup into an external tool give us +1 from stack?
>>>>>>>>>>
>>>>>>>>>> -Vlad
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 23, 2016 at 11:26 AM, Stack wrote:
>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 23, 2016 at 11:22 AM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> >> + MR is dead
>>>>>>>>>>>>
>>>>>>>>>>>> Does MR know that? :)
>>>>>>>>>>>>
>>>>>>>>>>>> Again. With all due respect, stack - still no suggestions on what we should use for "bulk data move and transformation" instead of MR?
>>>>>>>>>>>
>>>>>>>>>>> Use whatever distributed engine suits your fancy -- MR, Spark, distributed shell -- just don't have HBase core depend on it, even optionally.
>>>>>>>>>>>
>>>>>>>>>>>> I suggest voting first on "do we need backup in HBase"? In my opinion, some group members are still not sure about that and some will give -1 in any case. Just because ...
>>>>>>>>>>>
>>>>>>>>>>> We could run a vote, sure. -1 on that backup be in core hbase (+1 on adding all the API any such external tool might need to run).
>>>>>>>>>>>
>>>>>>>>>>> St.Ack
>>>>>>>>>>>
>>>>>>>>>>>> -Vlad
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 23, 2016 at 10:57 AM, Stack wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 6:46 AM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> let me try to go back to my original topic.
>>>>>>>>>>>>>> this question was meant to be generic, and provide some rule for future code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> from what I can gather, a rule that may satisfy everyone can be:
>>>>>>>>>>>>>> - we don't want any core feature (e.g. compaction/log-split/log-replay) over MR, because some clusters may not want or may have an external/uncontrolled MR setup.
>>>>>>>>>>>>>
>>>>>>>>>>>>> +1
>>>>>>>>>>>>>
>>>>>>>>>>>>>> - we allow non-core features (e.g. features enabled by a flag) to run MR jobs from hbase, because unless you use the feature, MR is not required.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -1 to hbase core depending on MR or core -- whether behind a flag or not -- ever being able to launch MR jobs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> + MR is dead. We should be busy working hard to undo it from hbase-server, moving it out to be an optional module (Spark would be its peer).
>>>>>>>>>>>>> + Master is a rat's nest of state. Matteo, Stephen, and Appy are busy working hard on moving it up on to a new foundation. Let's not make the task harder by piling on more moving parts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> St.Ack
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Matteo
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:39 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I suggest you look at Matteo's work for AssignmentManager, which is to make Master more stable.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Sep 23, 2016 at 5:32 AM, 张铎 <palomino219@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No, not your fault, at least, not this time :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Why do I call the code ugly? Can you simply tell me the sequence of calls when starting up the HMaster? HMaster is also a regionserver so it extends HRegionServer, and the initialization of HRegionServer sometimes needs to make rpc calls to HMaster. A simple change would cause a probabilistic deadlock or some strange NPEs...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That's why I'm very nervous when somebody wants to add new features or add external dependencies to HMaster, especially adding more work to the startup processing...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2016-09-23 20:02 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I read through HADOOP-13433 - the cited race condition is in jdk.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Suggest pinging the reviewer on JIRA to get it moving.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> bq. But the ugly code in HMaster is really a problem...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can you be specific as to which code is ugly ?
>>>>>>>>>>>>>>>>> Is it in the backup / restore mega patch ?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 10:44 PM, 张铎 <palomino219@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If you guys have already implemented the feature in the MR way and the patch is ready for landing on master, I'm a -0 on it as I do not want to block the development progress.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> But I strongly suggest we revisit the design later and see if we can separate the logic from HMaster as much as possible. HA is not a big problem if you do not store any metadata locally. But the ugly code in HMaster is really a problem...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> And for security, I have an issue that has been pending for a long time. Can someone help by taking a simple look at it? This is what I mean by ugly code... it logs out and destroys the credentials in a subject while they are still being used, and it is declared as LimitedPrivate so I cannot change the behavior, and the only way to fix it is to write another piece of ugly code...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/HADOOP-13433
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2016-09-23 12:53 GMT+08:00 Vladimir Rodionov <vladrodionov@gmail.com>:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> >> If in the future, we find better ways of doing this without using MR, we can certainly consider that
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Our framework for distributed operations is abstract and allows different implementations. MR is just one implementation we provide.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Vlad
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 9:38 PM, Devaraj Das <ddas@hortonworks.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Guys, first off apologies for bringing in the topic of MR-based compactions.. But I was thinking more about the SpliceMachine approach of managing compactions in Spark where apparently they saw a lot of benefits. Apologies for giving you that sore throat Andrew; I really didn't mean to :-)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> So on this issue, we have these on the plate:
>>>>>>>>>>>>>>>>>>>> 0. Somehow not use MR but something like that
>>>>>>>>>>>>>>>>>>>> 1. Run a standalone service other than master
>>>>>>>>>>>>>>>>>>>> 2. Shell out from the master
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I don't think we have a good answer to (0), and I don't think it's even worth the effort of trying to build something when MR is already there, and being used by HBase already for some operations.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On (1), we have to deal with a myriad of issues - HA of the server not being the least of them all. Security (kerberos authentication, another keytab to manage, etc. etc. etc.). IMO, that approach is DOA. Instead let's substitute that (1) with the HBase Master. I haven't seen any good reason why the HBase master shouldn't launch MR jobs if needed. It's not ideal; agreed.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Now before going to (2), let's see what the benefits are of running the backup/restore jobs from the master.
>>>>>>>>>>>>>>>>>>>> I think Ted has summarized some of the issues that we need to take care of - basically, the master can keep track of running jobs, and should it fail, the backup master can continue keeping track of them (since the jobId would have been recorded in the proc WAL). The master can also do cleanup, etc. of failed backup/restore processes. Security is another issue - the job needs to run as 'hbase' since it owns the data. Having the master launch the job gives it that privilege. In the (2) approach, it's hard to do some of the above management.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Guys, just to reiterate, the patch as such is ready from the overall design/arch point of view (maybe code review is still pending from Matteo). If in the future, we find better ways of doing this without using MR, we can certainly consider that. But IMO we shouldn't block this patch from getting merged.
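The case for keeping procedures in the flow (Ted's point that a client-driven backup is hard to resume after an error midway, and Devaraj's point that a backup master can keep tracking a job because its state is recorded in the proc WAL) depends on persisting progress before moving on. The Java sketch below shows that idea in its most generic form and is not the HBase Procedure V2 API: each named step is appended to a journal file only after it finishes, so a restarted coordinator skips completed steps and retries the one that failed. The class `JournaledSteps`, its one-name-per-line journal format and the step names used with it are hypothetical. It also assumes every step is safe to re-run, since a crash after a step's side effects but before its journal entry is written makes that step run again.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: multi-step work whose progress survives a coordinator restart. */
final class JournaledSteps {

  /** One unit of work; throwing simulates a failure partway through the job. */
  interface Step {
    void run() throws Exception;
  }

  private final Path journal;

  JournaledSteps(Path journal) {
    this.journal = journal;
  }

  /** Names of the steps recorded as finished by this or any earlier run. */
  Set<String> completed() throws IOException {
    if (!Files.exists(journal)) {
      return new HashSet<>();
    }
    return new HashSet<>(Files.readAllLines(journal, StandardCharsets.UTF_8));
  }

  /**
   * Runs the steps in the map's iteration order (pass a LinkedHashMap), skipping any step
   * already in the journal. A step is journaled only after it returns normally, so if it
   * throws, the exception propagates and the next call retries that step.
   *
   * @return names of the steps executed by this call
   */
  List<String> run(Map<String, Step> steps) throws Exception {
    Set<String> done = completed();
    List<String> executed = new ArrayList<>();
    for (Map.Entry<String, Step> entry : steps.entrySet()) {
      if (done.contains(entry.getKey())) {
        continue; // finished before a restart; do not redo it
      }
      entry.getValue().run();
      // Append the record with SYNC before moving on, so the fact that the step
      // finished is on disk before the next step starts.
      Files.write(journal, List.of(entry.getKey()), StandardCharsets.UTF_8,
          StandardOpenOption.CREATE, StandardOpenOption.APPEND, StandardOpenOption.SYNC);
      executed.add(entry.getKey());
    }
    return executed;
  }
}
```

For example, with steps `snapshot`, `copy` and `manifest` in a `LinkedHashMap`, a first `run` that throws inside `copy` leaves only `snapshot` journaled, and a second `run` executes just `copy` and `manifest`.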
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>>>>>>>>> From: 张铎
>>>>>>>>>>>>>>>>>>>> Sent: Thursday, September 22, 2016 8:32 PM
>>>>>>>>>>>>>>>>>>>> To: dev@hbase.apache.org
>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> So what about a standalone service other than master? You can use your own procedure store in that service?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2016-09-23 11:28 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> An earlier implementation was client driven.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> But with that approach, it is hard to resume if there is an error midway. Using Procedure V2 makes the backup / restore more robust.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Another consideration is security. It is hard to enforce security (to be implemented) for client driven actions.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 8:15 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> No, this misses Matteo's finer point, which is that "shelling out" from the master directly to run MR is a first.
>>>>>>>>>>>>>>>>>>>>>> Why not drive this with a utility derived from Tool?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sep 22, 2016, at 7:57 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> >> In our production cluster, it is a common case we just have HDFS and HBase deployed.
>>>>>>>>>>>>>>>>>>>>>>> >> If our Master/RS depend on MR framework (especially some features we have not used at all), it introduced another cost for maintain. I don't think it is a good idea.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> So, you are not backup users in this case. Many of our customers have the full stack deployed and want to see backup be a standard feature. Besides this, nothing will happen in your cluster if you won't be doing backups.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> This discussion (we do not want to see an M/R dependency) goes nowhere.
>>>>>>>>>>>>>>>>>>>>>>> We asked already, at least twice, to suggest another framework (other than M/R) for bulk data copy with *conversion*. Still waiting for suggestions.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -Vlad
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:49 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> If MR framework is not deployed in the cluster, hbase still functions normally (post merge).
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> In terms of build time dependency, we have long been depending on mapreduce. Take a look at ExportSnapshot.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen <heng.chen.1986@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> In our production cluster, it is a common case we just have HDFS and HBase deployed.
>>> If our Master/RS depend on the MR framework (especially for some features we
>>> have not used at all), it introduces another maintenance cost. I don't
>>> think it is a good idea.
>>>
>>> 2016-09-23 10:28 GMT+08:00 张铎 <palomino219@gmail.com>:
>>>
>>>> To be specific, take our nice Backup/Restore feature as an example: if we
>>>> think it is not a core feature of HBase, then we could make it depend on
>>>> MR and start a standalone BackupManager instance that submits MR jobs to
>>>> do periodic maintenance. And if we think it is a core feature that
>>>> everyone should use, then we'd better implement it without an MR
>>>> dependency, like DLS.
>>>>
>>>> Thanks.
>>>>
>>>> 2016-09-23 10:11 GMT+08:00 张铎 <palomino219@gmail.com>:
>>>>
>>>>> I'm -1 on letting the master or RS launch MR jobs.
>>>>> It is OK that some of our features depend on MR, but I think the bottom
>>>>> line is that we should launch the jobs from outside, manually or by
>>>>> other services.
>>>>>
>>>>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell <andrew.purtell@gmail.com>:
>>>>>
>>>>>> Ok, got it. Well, "shelling out" is on the line, I think, so it's a fair
>>>>>> question.
>>>>>>
>>>>>> Can this be driven by a utility derived from Tool like our other MR
>>>>>> apps? The issue is needing the AccessController to decide whether it's
>>>>>> allowed? But nothing prevents the user from running the job
>>>>>> manually/independently, right?
>>>>>>
>>>>>> On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <theo.bertozzi@gmail.com> wrote:
>>>>>>
>>>>>>> Just a remark.
>>>>>>> My query was not about tools using MR (everyone, I think, is OK with
>>>>>>> those).
>>>>>>> The topic was: "are we OK with running MR jobs from Master and RS
>>>>>>> code?", since this will be the first time we do this.
>>>>>>>
>>>>>>> Matteo
>>>>>>>
>>>>>>> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das <ddas@hortonworks.com> wrote:
>>>>>>>
>>>>>>>> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
>>>>>>>> fine to be dependent on MR. MR is the right framework for such tools.
>>>>>>>> We should also do compactions using MR (just saying :) )
>>>>>>>> ________________________________________
>>>>>>>> From: Ted Yu <yuzhihong@gmail.com>
>>>>>>>> Sent: Thursday, September 22, 2016 2:00 PM
>>>>>>>> To: dev@hbase.apache.org
>>>>>>>> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>>>>>>>
>>>>>>>> I agree - backup / restore is in the same category as import / export.
>>>>>>>>
>>>>>>>> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Backup is extra tooling around core in my opinion. Like import or
>>>>>>>>> export. Or the optional MOB tool. It's fine.
>>>>>>>>>
>>>>>>>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <mbertozzi@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> What's the latest opinion on running MR jobs from HBase (Master or
>>>>>>>>>> RS)?
>>>>>>>>>>
>>>>>>>>>> I remember that in the past there was a discussion about not having
>>>>>>>>>> MR as a direct dependency of HBase.
>>>>>>>>>>
>>>>>>>>>> I think some of the discussion was around MOB, which had an MR job to
>>>>>>>>>> compact that was later transformed into a non-MR job to be merged; I
>>>>>>>>>> think we had a similar discussion for log split/replay.
>>>>>>>>>>
>>>>>>>>>> The latest is the new Backup feature (HBASE-7912), which runs an MR
>>>>>>>>>> job from the master to copy or restore data.
>>>>>>>>>> (Backup is also "not really core", as in: if you don't use backup,
>>>>>>>>>> you won't end up running MR jobs. But this was probably also true
>>>>>>>>>> for MOB, as in "if you don't enable MOB, you don't need MR".)
>>>>>>>>>>
>>>>>>>>>> Any thoughts? Do we have a rule that says "we don't want to have
>>>>>>>>>> HBase run MR jobs; only tools started manually by the user can do
>>>>>>>>>> that"? Or can we start adding MR calls around without problems?