gobblin-user mailing list archives

From Vicky Kak <vicky....@gmail.com>
Subject Re: Gobblin As Service Questions
Date Fri, 28 Jul 2017 03:20:36 GMT
Thanks, Abhishek, for the confirmation.

I am not able to see the images in the GAAS wiki. The images seem to be
served from Google Docs, and I could make out that my ID does not have
access. Maybe making the images public would help; can you please check why
I am not able to see the images in the wiki?

Regards,
Vicky





On Thu, Jul 27, 2017 at 7:41 PM, Abhishek Tiwari <abti@apache.org> wrote:

> Hi Vicky,
>
> My responses are inlined in blue. You are on the right track.
>
> Also the design doc of Gobblin as a Service for your reference:
> https://cwiki.apache.org/confluence/display/GOBBLIN/Gobblin+as+a+Service
>
> Regards,
> Abhishek
>
> On Wed, Jul 26, 2017 at 5:45 AM, Vicky Kak <vicky.kak@gmail.com> wrote:
>
>> Hi,
>>
>> I spent more time looking at the code details and have the following to
>> share.
>>
>> I see that GobblinServiceManager (the bootstrap class for the Gobblin
>> service) performs these steps:
>> 1) Initialising the TopologyCatalog, FlowCatalog, Helix, ServiceScheduler,
>> EmbeddedLiServer and finally Orchestrator/TopologySpecFactory.
>> 2) The FlowConfigClient seems to create the FlowConfig, then the FlowSpec
>> via FlowConfigResource (via RestEndpoint).
>> 3) The JobSpec gets added to the FlowCatalog, after which the Orchestrator
>> pushes the JobSpec to Kafka via SimpleKafkaStepExecutionProducer.
>>
>> I have been looking for code that uses the
>> SimpleKafkaStepExecutionConsumer, but could not find how it is hooked
>> into a running Gobblin instance.
>>
> Look at gobblin-cluster and default config for classes being loaded for
> listeners, JobConfigurationManager, etc.
>
>
>>
>> Here is how the Gobblin service will invoke the jobs on the slaves
>> (Gobblin instances):
>>
>> 1) We should have the REST endpoint information so that we can send the
>> JobSpec via FlowConfigClient or via an HTTP GET (a REST call; I have not
>> yet tried this). I don't see a way to get the port when the REST server is
>> started.
>>
> We should make it configurable; right now it chooses a random port.
>
>
>> 2) The JobSpec is passed to Kafka by the Orchestrator in the Gobblin
>> service, via the SimpleKafkaStepExecutionProducer.
>> 3) There could be multiple Gobblin instances listening to Kafka using the
>> SimpleKafkaStepExecutionConsumer; all the Gobblin instances should get
>> the JobSpecs. The one instance that matches the JobSpec should trigger
>> the job.
>>
> Yes, we can make this a bit less ambiguous though.
>
>
>>
>> The Gobblin service acts as a master and provides the REST endpoint to
>> read/create the JobSpecs, which get triggered on the slaves (the Gobblin
>> instances).
>> I have not yet been able to run the flow, since I am hitting some build
>> issues when building Gobblin from master; the tests are failing right
>> now.
>>
>> Can someone from the development team validate whether I am on the right
>> track in terms of understanding the implementation and flows?
>>
> You are on the right track.
>
>>
>> I have more questions, which I will post after I confirm that I am not
>> missing anything.
>>
>> Thanks,
>> Vicky
>>
>> On Tue, Jul 25, 2017 at 5:03 PM, Vicky Kak <vicky.kak@gmail.com> wrote:
>>
>>> To my surprise, after I looked at the code and referred to the
>>> presentation that Shrishanka had sent, my ignorance about Gobblin As A
>>> Service was removed.
>>>
>>> Gobblin As a service : It is a Global Orchestrator which helps in
>>> submitting the logical flow specifications which are further compiled to
>>> the physical pipelines.
>>>
>>> We have been triggering the Gobblin jobs using a REST endpoint, which
>>> is done by implementing a custom service as explained here:
>>> https://groups.google.com/forum/#!topic/gobblin-users/kHrWh6lfGJM
>>>
>>> I have the following questions:
>>>
>>> 1) What is the use case for Gobblin As a Service? I don't see the
>>> Orchestrator's REST endpoint port being configurable. If we have to add
>>> a FlowSpec from a different machine, we need to know the Orchestrator's
>>> host and port details; how do we do that?
>>>
> We use d2 registry internally for it (if you don't already know about it,
> search for RESTLI D2).
>
>
>>
>>> 2) Does FlowSpec creation create a new job deployment, which can also be
>>> done by copying the corresponding .pull or .job file into the Gobblin
>>> distribution?
>>>
> If you are asking whether bundling a pull file in the Gobblin distribution
> and creating the same via FlowSpec mean the same thing, then yes.
> Otherwise I didn't understand the question.
>
>
>>
>>> 3) Since the master.out log gets created when starting a service, I
>>> assume there could be a way to add more Orchestrators to the master that
>>> is started. However, I am not sure how to do that; can this be clarified?
>>>
> Only one node acts as orchestrator and scheduler. The rest of the nodes
> receive requests and pass them to the master for scheduling and
> orchestrating via Helix messages.
>
>
>>
>>> Please note that I have been looking at older code; the git log is as
>>> follows.
>>> ************************************************************
>>> commit 755da9160cd91ea5ebcc752603ce1bffb74a75a1 (HEAD -> master,
>>> origin/master, origin/HEAD)
>>> Author: Kuai Yu <yukuai518@gmail.com>
>>> Date:   Tue Apr 11 19:10:53 2017 -0700
>>> ************************************************************
>>>
>>>
>>> Thanks,
>>> Vicky
>>>
>>
>>
>
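
On the point in the thread that the rest server currently chooses a random
port and should be made configurable: the following is only a minimal Java
sketch of that general pattern, not Gobblin code. The class name
PortSelectionSketch and the property key example.rest.server.port are
hypothetical and do not correspond to any real Gobblin class or
configuration key. It relies only on the standard java.net.ServerSocket
behaviour that binding to port 0 asks the operating system for a free port.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.Properties;

public class PortSelectionSketch {
    // Hypothetical property name, NOT a real Gobblin configuration key.
    static final String PORT_KEY = "example.rest.server.port";

    /** Returns the configured port, or 0 (let the OS pick) when unset. */
    static int requestedPort(Properties props) {
        return Integer.parseInt(props.getProperty(PORT_KEY, "0"));
    }

    /** Binds a socket on the requested port and returns the port in use. */
    static int bindAndReport(Properties props) throws IOException {
        try (ServerSocket socket = new ServerSocket(requestedPort(props))) {
            // With port 0 this is the OS-assigned port, which a client on
            // another machine would need to discover somehow.
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        System.out.println("port chosen by the OS: " + bindAndReport(props));
    }
}
```

With a fixed port set in the properties, a client on another machine can be
given the host and port up front; with port 0 the chosen port has to be
published somewhere, which is the role the thread attributes to the d2
registry.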
