gobblin-user mailing list archives

From Abhishek Tiwari <abhishektiwari.bt...@gmail.com>
Subject Re: Gobblin As Service Questions
Date Fri, 28 Jul 2017 03:35:43 GMT
Hi Vicky,

I have fixed the images, please check again.

Regards,
Abhishek

On Thu, Jul 27, 2017 at 8:20 PM, Vicky Kak <vicky.kak@gmail.com> wrote:

> Thanks Abhishek for the confirmation.
>
> I am not able to see the images in the GAAS wiki. The images seem to be
> coming from Google Docs, and I could make out that my ID does not have
> access. Maybe making the images public would help. Can you please check why
> I am not able to see the images in the wiki?
>
> Regards,
> Vicky
>
>
>
>
>
> On Thu, Jul 27, 2017 at 7:41 PM, Abhishek Tiwari <abti@apache.org> wrote:
>
>> Hi Vicky,
>>
>> My responses are inlined in blue. You are on the right track.
>>
>> Also the design doc of Gobblin as a Service for your reference:
>> https://cwiki.apache.org/confluence/display/GOBBLIN/Gobblin+as+a+Service
>>
>> Regards,
>> Abhishek
>>
>> On Wed, Jul 26, 2017 at 5:45 AM, Vicky Kak <vicky.kak@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I spent more time looking at the code details and have the following to
>>> share.
>>>
>>> I see that GobblinServiceManager (the bootstrap class for the
>>> Gobblin service) performs these steps:
>>> 1) Initialising the TopologyCatalog, FlowCatalog, Helix,
>>> ServiceScheduler, EmbeddedLiServer and finally
>>> Orchestrator/TopologySpecFactory.
>>> 2) The FlowConfigClient seems to create the FlowConfig, then the FlowSpec
>>> via FlowConfigResource (via the REST endpoint).
>>> 3) The JobSpec gets added to the FlowCatalog, after which the
>>> Orchestrator pushes the JobSpec to Kafka via
>>> SimpleKafkaStepExecutionProducer.
>>>
>>> I have been looking for code that uses the
>>> SimpleKafkaStepExecutionConsumer, but could not find how it is hooked
>>> into a running Gobblin instance.
>>>
>> Look at gobblin-cluster and default config for classes being loaded for
>> listeners, JobConfigurationManager, etc.
>>
>>
>>>
>>> Here is how the Gobblin service will invoke the jobs on slaves (Gobblin
>>> instances):
>>>
>>> 1) We should have the REST endpoint information so that we can send the
>>> JobSpec via FlowConfigClient or via HTTP GET (a REST call; I have not yet
>>> tried this). I don't see a way to get the port when the REST server is
>>> started.
>>>
>> We should make it configurable; right now it chooses a random port.
>>
>>
>>> 2) The JobSpec is passed to Kafka via the
>>> SimpleKafkaStepExecutionProducer from the Gobblin service via the
>>> Orchestrator.
>>> 3) There could be multiple Gobblin instances listening to Kafka
>>> using the SimpleKafkaStepExecutionConsumer; all
>>> the Gobblin instances should get the JobSpecs. The one instance that
>>> matches the job specs should trigger the job.
>>>
>> Yes, we can make this a bit less ambiguous though.
>>
>>
>>>
>>> The Gobblin service acts as a master and provides the REST endpoint to
>>> read/create the JobSpecs, which get triggered on the slaves (which are
>>> the Gobblin instances).
>>> I have not yet been able to run the flow, since I am getting some build
>>> issues when building Gobblin from master; the tests are
>>> failing right now.
>>>
>>> Can someone from the development team validate whether I am on the right
>>> track in terms of understanding the implementation and flows?
>>>
>> You are on the right track.
>>
>>>
>>> I have got more questions which I will post after I confirm that I am
>>> not missing anything.
>>>
>>> Thanks,
>>> Vicky
>>>
>>> On Tue, Jul 25, 2017 at 5:03 PM, Vicky Kak <vicky.kak@gmail.com> wrote:
>>>
>>>> To my surprise, after I looked at the code and referred to the presentation
>>>> that Shrishanka had sent, my ignorance about Gobblin As A Service was removed.
>>>>
>>>> Gobblin As a Service: It is a global orchestrator that helps in
>>>> submitting logical flow specifications, which are further compiled into
>>>> physical pipelines.
>>>>
>>>> We have been triggering Gobblin jobs using a REST endpoint, which
>>>> is done by implementing a custom service as explained here:
>>>> https://groups.google.com/forum/#!topic/gobblin-users/kHrWh6lfGJM
>>>>
>>>> I have got the following questions
>>>>
>>>> 1) What is the use case for Gobblin As a Service? I don't see the
>>>> Orchestrator's REST endpoint port being configurable. If we have to add a
>>>> FlowSpec from a different machine, we need to know the Orchestrator's
>>>> host and port details. How do we do that?
>>>>
>> We use d2 registry internally for it (if you don't already know about it,
>> search for RESTLI D2).
>>
>>
>>>
>>>> 2) Does FlowSpec creation create a new job deployment, which can also be
>>>> done by copying the corresponding .pull or .job file into the Gobblin
>>>> distribution?
>>>>
>> If you are asking whether bundling a pull file in the Gobblin distribution
>> and creating the same via FlowSpec mean the same thing, then yes.
>> Otherwise I didn't understand the question.
>>
>>
>>>
>>>> 3) Since the master.out log gets created when starting a service, I
>>>> assume there could be a way to add more Orchestrators to the master that
>>>> is started. However, I am not sure how to do that. Can this be clarified?
>>>>
>> Only one node acts as orchestrator and scheduler. The rest of the nodes
>> receive requests and pass them to the master for scheduling and orchestrating
>> via Helix messages.
>>
>>
>>>
>>>> Please note that I have been looking at older code; the git log is as
>>>> follows.
>>>> ***********************************************************************************************
>>>> commit 755da9160cd91ea5ebcc752603ce1bffb74a75a1 (HEAD -> master,
>>>> origin/master, origin/HEAD)
>>>> Author: Kuai Yu <yukuai518@gmail.com>
>>>> Date:   Tue Apr 11 19:10:53 2017 -0700
>>>> ***********************************************************************************************
>>>>
>>>>
>>>> Thanks,
>>>> Vicky
>>>>
>>>
>>>
>>
>
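
The thread above notes that the REST server currently picks a random port and that this should be made configurable. As a rough illustration only, here is a minimal sketch of that idea using just the Java standard library. It is not Gobblin code: the class name PortConfigSketch and the property key example.service.port are hypothetical and chosen purely for this example. The sketch reads an optional port from a Properties object, falls back to port 0 (which asks the operating system for any free port), and reports the port that was actually bound so that a caller could publish it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.util.Properties;

public class PortConfigSketch {
    // Hypothetical property key; NOT an actual Gobblin configuration name.
    static final String PORT_KEY = "example.service.port";

    // Returns the configured port, or 0 so that the OS picks a free port.
    static int requestedPort(Properties props) {
        String raw = props.getProperty(PORT_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return 0;
        }
        int port = Integer.parseInt(raw.trim());
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        return port;
    }

    // Binds a server socket on the requested port and returns the port
    // that was actually bound (useful when the requested port was 0).
    static int bindAndReport(Properties props) {
        try (ServerSocket socket = new ServerSocket(requestedPort(props))) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // No port configured, so the OS chooses one.
        Properties props = new Properties();
        System.out.println("Bound to port " + bindAndReport(props));
    }
}
```

With the property left unset the behaviour matches the "random port" case described in the thread; setting it pins the port so that clients on other machines can be told where to connect.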
