Mailing-List: contact user-help@gobblin.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@gobblin.incubator.apache.org
Delivered-To: mailing list user@gobblin.incubator.apache.org
MIME-Version: 1.0
From: Vicky Kak
Date: Fri, 28 Jul 2017 17:53:00 +0530
Subject: Re: Gobblin As Service Questions
To: Abhishek Tiwari
Cc: user@gobblin.incubator.apache.org
Content-Type: multipart/alternative; boundary="001a113e7f84faa84505555fbd07"
archived-at: Fri, 28 Jul 2017 12:23:12 -0000

--001a113e7f84faa84505555fbd07
Content-Type: text/plain; charset="UTF-8"

Hi Abhishek,

Some review points after going through the wiki:

1) There is no component named "FlowManager". It seems the FlowManager is
basically the FlowConfigsResource plus RestLi handling the user invocation.

2) There is no explicit mention of how an existing Flow is triggered. It
seems to be triggered via the POST call mentioned in the documentation:

curli http://localhost:8080/flowconfigs -X POST -H 'X-RestLi-Method: create' -H 'X-RestLi-Protocol-Version: 2.0.0' --data '{"flowName" : "myflow1", "flowGroup" : "mygroup", "templateNames" : "FS:///mytemplate.template", "schedule" : "", "properties" : {"prop1" : "value1"}}'

3) You can see the typo in the wiki in 2, check the curli part.

4) I am not able to find the monitoring-related code in the
GobblinServiceManager. Where is the monitoring piece?

5) The Appendix section refers to components which do not seem to be
present, like SimpleRESTSpecExecutor and OrchestratorModule (the module name
should be removed), and there may be more. Also, I am not able to find
GobblinRestFlowMonitor etc. I have build errors in Eclipse; maybe that is
the reason I am not able to see these classes.

Also, I see GAAS sending the Jobs to the SpecExecutorInstance via Kafka/git
etc. However, I have not yet been able to find how the SpecExecutorInstance
is configured in the Gobblin instances where the Jobs should be constructed
and triggered. How and where do we configure the SpecExecutorInstance for
the Gobblin instances for which Jobs can be configured/triggered via GAAS?

Thanks,
Vicky

On Fri, Jul 28, 2017 at 9:07 AM, Vicky Kak <vicky.kak@gmail.com> wrote:

> I can see the images now.
>
> Thanks,
> Vicky
>
> On Fri, Jul 28, 2017 at 9:05 AM, Abhishek Tiwari
> <abhishektiwari.btech@gmail.com> wrote:
>
>> Hi Vicky,
>>
>> I have fixed the images, please check again.
>>
>> Regards,
>> Abhishek
>>
>>
>> On Thu, Jul 27, 2017 at 8:20 PM, Vicky Kak <vicky.kak@gmail.com> wrote:
>>
>>> Thanks Abhishek for the confirmation.
>>>
>>> I am not able to see the images in the GAAS wiki. The images seem to be
>>> coming from Google Docs, and I could make out that my id does not have
>>> access. Maybe making the images public would help; can you please check
>>> why I am not able to see the images in the wiki?
>>>
>>> Regards,
>>> Vicky
>>>
>>>
>>> On Thu, Jul 27, 2017 at 7:41 PM, Abhishek Tiwari <abti@apache.org>
>>> wrote:
>>>
>>>> Hi Vicky,
>>>>
>>>> My responses are inlined in blue. You are on the right track.
>>>>
>>>> Also the design doc of Gobblin as a Service for your reference:
>>>> https://cwiki.apache.org/confluence/display/GOBBLIN/Gobblin+as+a+Service
>>>>
>>>> Regards,
>>>> Abhishek
>>>>
>>>> On Wed, Jul 26, 2017 at 5:45 AM, Vicky Kak <vicky.kak@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I spent more time looking at the code details and have the following
>>>>> to share.
>>>>>
>>>>> I see that GobblinServiceManager (the bootstrap class for the
>>>>> Gobblin service) performs these steps:
>>>>> 1) Initialising the TopologyCatalog, FlowCatalog, Helix,
>>>>> ServiceScheduler, EmbeddedLiServer and finally
>>>>> Orchestrator/TopologySpecFactory.
>>>>> 2) The FlowConfigClient seems to create the FlowConfig, then the
>>>>> FlowSpec via FlowConfigResource (via the REST endpoint).
>>>>> 3) The JobSpec gets added to the FlowCatalog, after which the
>>>>> Orchestrator pushes the JobSpec to Kafka via
>>>>> SimpleKafkaStepExecutionProducer.
>>>>>
>>>>> I have been looking for code which uses the
>>>>> SimpleKafkaStepExecutionConsumer, but could not find how it is
>>>>> hooked into the running Gobblin instance.
>>>>>
>>>> Look at gobblin-cluster and the default config for the classes being
>>>> loaded for listeners, JobConfigurationManager, etc.
>>>>
>>>>
>>>>>
>>>>> Here is how the Gobblin service will invoke the Jobs on the slaves
>>>>> (Gobblin instances):
>>>>>
>>>>> 1) We should have the REST endpoint information so that we can send
>>>>> the JobSpec via FlowConfigClient or via HTTP GET (REST call; I have
>>>>> not yet tried this). I don't see a way to get the port when the REST
>>>>> server is started.
>>>>>
>>>> We should make it configurable; right now it chooses a random port.
>>>>
>>>>
>>>>> 2) The JobSpec is passed to Kafka via the
>>>>> SimpleKafkaStepExecutionProducer from the Gobblin service via the
>>>>> Orchestrator.
>>>>> 3) There could be multiple Gobblin instances listening to Kafka
>>>>> using the SimpleKafkaStepExecutionConsumer; all the Gobblin
>>>>> instances should get the JobSpecs. The one instance which matches
>>>>> the job specs should trigger the Job.
>>>>>
>>>> Yes, we can make this a bit less ambiguous though.
>>>>
>>>>
>>>>>
>>>>> The Gobblin service acts as a master and provides the REST endpoint
>>>>> to read/create the JobSpecs, which will get triggered on the slaves
>>>>> (which are the Gobblin instances).
>>>>> I have not yet been able to run the flow, since I am getting some
>>>>> build issues when building Gobblin from master; the tests are
>>>>> failing right now.
>>>>>
>>>>> Can someone from the development team validate whether I am on the
>>>>> right track in terms of understanding the implementation and flows?
>>>>>
>>>> You are on the right track.
>>>>
>>>>>
>>>>> I have more questions, which I will post after I confirm that I am
>>>>> not missing anything.
>>>>>
>>>>> Thanks,
>>>>> Vicky
>>>>>
>>>>> On Tue, Jul 25, 2017 at 5:03 PM, Vicky Kak <vicky.kak@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> To my surprise, after I looked at the code and referred to the
>>>>>> presentation that Shrishanka had sent, my ignorance about Gobblin
>>>>>> as a Service was removed.
>>>>>>
>>>>>> Gobblin as a Service: it is a global orchestrator which helps in
>>>>>> submitting logical flow specifications, which are further compiled
>>>>>> to physical pipelines.
>>>>>>
>>>>>> We have been triggering Gobblin Jobs using a REST endpoint, and it
>>>>>> is done by implementing a custom service as explained here:
>>>>>> https://groups.google.com/forum/#!topic/gobblin-users/kHrWh6lfGJM
>>>>>>
>>>>>> I have the following questions:
>>>>>>
>>>>>> 1) What is the use case for Gobblin as a Service? I don't see the
>>>>>> Orchestrator's REST endpoint port being configurable. If we have to
>>>>>> add a FlowSpec from a different machine, we need to know the
>>>>>> Orchestrator's host and port details; how do we do it?
>>>>>>
>>>> We use the d2 registry internally for it (if you don't already know
>>>> about it, search for RESTLI D2).
>>>>
>>>>
>>>>>
>>>>>> 2) Does FlowSpec creation create a new Job deployment, which can
>>>>>> also be done by copying the corresponding .pull or .job file into
>>>>>> the Gobblin distribution?
>>>>>>
>>>> If you are asking whether bundling a pull file in the Gobblin
>>>> distribution and creating the same via FlowSpec mean the same thing,
>>>> then yes. Otherwise I didn't understand the question.
>>>>
>>>>
>>>>>
>>>>>> 3) Since the master.out log gets created when starting a service, I
>>>>>> assume there could be a way to add more Orchestrators to the master
>>>>>> that is started. However, I am not sure how to do that; can this be
>>>>>> clarified?
>>>>>>
>>>> Only one node acts as orchestrator and scheduler. The rest of the
>>>> nodes receive requests and pass them to the master for scheduling and
>>>> orchestrating via Helix messages.
>>>>
>>>>
>>>>>
>>>>>> Please note that I have been looking at older code; the git log
>>>>>> follows.
>>>>>> ***********************************************************************************************
>>>>>> commit 755da9160cd91ea5ebcc752603ce1bffb74a75a1 (HEAD -> master,
>>>>>> origin/master, origin/HEAD)
>>>>>> Author: Kuai Yu <yukuai518@gmail.com>
>>>>>> Date: Tue Apr 11 19:10:53 2017 -0700
>>>>>> ***********************************************************************************************
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Vicky
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

--001a113e7f84faa84505555fbd07--