gobblin-user mailing list archives

From Vicky Kak <vicky....@gmail.com>
Subject Re: Gobblin As Service Questions
Date Wed, 26 Jul 2017 12:45:12 GMT

I spent more time looking at the code details and have the following to
share.

I see that GobblinServiceManager (the bootstrap class for the Gobblin
service) does the following:
1) It initialises the TopologyCatalog, FlowCatalog, Helix, ServiceScheduler,
EmbeddedRestliServer and finally the Orchestrator/TopologySpecFactory.
2) The FlowConfigClient seems to create the FlowConfig and then the
FlowSpec via FlowConfigResource (through the REST endpoint).
3) The JobSpec gets added to the FlowCatalog, after which the Orchestrator
pushes the JobSpec to Kafka via SimpleKafkaStepExecutionProducer.
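
The three steps above can be sketched in memory as follows. This is only
an illustration of the hand-off order (REST resource -> catalog ->
orchestrator -> producer) as I read it; Spec, Catalog, RecordingProducer
and submit are hypothetical stand-ins, not the real Gobblin classes, and
no Kafka is involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-ins only; none of these are real Gobblin classes.
final class SpecFlowSketch {

    /** Stand-in for a FlowSpec/JobSpec: a name plus properties. */
    static final class Spec {
        final String name;
        final Map<String, String> props;

        Spec(String name, Map<String, String> props) {
            this.name = name;
            this.props = props;
        }
    }

    /** Stand-in for the FlowCatalog: stores specs and notifies listeners. */
    static final class Catalog {
        private final Map<String, Spec> specs = new LinkedHashMap<>();
        private final List<Consumer<Spec>> listeners = new ArrayList<>();

        void addListener(Consumer<Spec> listener) {
            listeners.add(listener);
        }

        void put(Spec spec) {
            specs.put(spec.name, spec);
            for (Consumer<Spec> listener : listeners) {
                listener.accept(spec);
            }
        }

        int size() {
            return specs.size();
        }
    }

    /** Stand-in for the Kafka producer: records what would be sent. */
    static final class RecordingProducer {
        final List<String> sent = new ArrayList<>();

        void send(Spec spec) {
            sent.add(spec.name);
        }
    }

    /** Wires catalog -> orchestrator -> producer and submits one spec. */
    static List<String> submit(String flowName) {
        Catalog catalog = new Catalog();
        RecordingProducer producer = new RecordingProducer();
        // "Orchestrator" role: forward every spec added to the catalog.
        catalog.addListener(producer::send);
        // "FlowConfigResource" role: turn the request into a spec, store it.
        catalog.put(new Spec(flowName, Collections.singletonMap("source", "rest")));
        return producer.sent;
    }
}
```

Calling SpecFlowSketch.submit("flowA") returns the list of spec names the
producer stand-in was asked to send, which makes the ordering in step 3
(catalog first, producer second) easy to check.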

I have been looking for the code that uses
SimpleKafkaStepExecutionConsumer, but could not find how it is hooked into
a running Gobblin instance.

Here is how the Gobblin service will invoke the jobs on the slaves (the
Gobblin instances):

1) We need the REST endpoint information so that we can send the
JobSpec via FlowConfigClient or via an HTTP GET (REST call; I have not yet
tried this). I don't see a way to get the port when the REST server is
started.
2) The JobSpec is passed to Kafka by the Orchestrator in the Gobblin
service, via SimpleKafkaStepExecutionProducer.
3) There could be multiple Gobblin instances listening to Kafka using
SimpleKafkaStepExecutionConsumer; all the Gobblin instances should get the
JobSpecs, and the one instance that matches a JobSpec should trigger the
job.
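
A minimal sketch of the fan-out in point 3, assuming every instance sees
every JobSpec and decides locally whether to run it. Instance, broadcast
and the path-prefix matching rule are hypothetical; how the real consumer
matches a spec to an instance is exactly what still needs confirming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of "all instances receive, one instance triggers".
final class BroadcastSketch {

    /** Stand-in for a Gobblin instance consuming specs from the topic. */
    static final class Instance {
        final String id;
        final Predicate<String> matches;
        final List<String> seen = new ArrayList<>();
        final List<String> triggered = new ArrayList<>();

        Instance(String id, Predicate<String> matches) {
            this.id = id;
            this.matches = matches;
        }

        void onSpec(String specUri) {
            seen.add(specUri);              // every instance receives the spec
            if (matches.test(specUri)) {
                triggered.add(specUri);     // only a matching instance runs it
            }
        }
    }

    /** Stand-in for the topic: delivers one spec to every instance. */
    static void broadcast(List<Instance> instances, String specUri) {
        for (Instance instance : instances) {
            instance.onSpec(specUri);
        }
    }
}
```

For example, with one instance matching specs under "/clusterA/" and
another matching "/clusterB/", broadcasting "/clusterB/job1" leaves both
with one entry in seen, but only the second with an entry in triggered.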

The Gobblin service acts as a master and provides the REST endpoint to
read/create the JobSpecs, which get triggered on the slaves (the
Gobblin instances).
I have not yet been able to run the flow, since I am hitting build issues
when building Gobblin from master; the tests are failing right now.

Can someone from the development team validate whether I am on the right
track in terms of understanding the implementation and flows?

I have got more questions which I will post after I confirm that I am not
missing anything.


On Tue, Jul 25, 2017 at 5:03 PM, Vicky Kak <vicky.kak@gmail.com> wrote:

> To my surprise, after I looked at the code and referred to the presentation
> that Shirshanka had sent, my ignorance about Gobblin as a Service was removed.
> Gobblin as a Service: it is a global orchestrator which helps in
> submitting logical flow specifications, which are further compiled into
> physical pipelines.
> We have been triggering Gobblin jobs through a REST endpoint; this is
> done by implementing a custom service as explained here:
> https://groups.google.com/forum/#!topic/gobblin-users/kHrWh6lfGJM
> I have the following questions:
> 1) What is the use case for Gobblin as a Service? I don't see the
> Orchestrator's REST endpoint port being configurable. If we have to add a
> FlowSpec from a different machine, we need to know the Orchestrator's
> host and port details; how do we get them?
> 2) Does FlowSpec creation create a new job deployment, which can also be
> done by copying the corresponding .pull or .job file into the Gobblin
> distribution?
> 3) Since the master.out log gets created when starting a service, I assume
> there could be a way to add more Orchestrators to the master that is
> started. However, I am not sure how to do that; can this be clarified?
> Please note that I have been looking at the older code; the git log is
> as follows.
> ************************************************************
> ***********************************
> commit 755da9160cd91ea5ebcc752603ce1bffb74a75a1 (HEAD -> master,
> origin/master, origin/HEAD)
> Author: Kuai Yu <yukuai518@gmail.com>
> Date:   Tue Apr 11 19:10:53 2017 -0700
> ************************************************************
> ***********************************
> Thanks,
> Vicky
