airavata-dev mailing list archives

From Supun Nakandala <supun.nakand...@gmail.com>
Subject Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata
Date Fri, 10 Feb 2017 19:00:27 GMT
Hi Amila,

By monitoring and scheduling dependencies I meant the following.

Monitoring dependencies: e.g., after a job is submitted to the remote host,
the DAG execution has to wait until the job completes before proceeding to
the next task. Currently we handle this by monitoring emails: a separate
daemon checks for job-status emails. So (I think) we can treat waiting for
this email as a monitoring dependency on the next task that has to be
executed.

Scheduling dependency: This is something we don't currently have a use case
for, but which I think will soon become a requirement. For example, when
submitting jobs to Jetstream (which gives preference to interactive users)
we have to wait until the system becomes vacant. Thus, even though a user
submits a job, that job will have to wait until it is scheduled by an
external system/call. So my idea was to treat these as external
scheduling dependencies. One might argue that the scheduling sub-system also
has to be part of Airavata, but I think we can separate the scheduling
sub-system from the execution sub-system by having these scheduling dependencies.
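
Something like the following is what I have in mind (only a rough sketch;
Task and WaitForJobCompletion are placeholder names, not existing Airavata
classes):

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;

    interface Task {
        void execute() throws Exception;
    }

    // A "monitoring dependency" modelled as a task that blocks the DAG until
    // the job-completion signal (e.g. the status email) arrives.
    class WaitForJobCompletion implements Task {
        private final CountDownLatch completionSignal = new CountDownLatch(1);

        // Called by the email-monitoring daemon when the completion email is seen.
        public void jobCompleted() {
            completionSignal.countDown();
        }

        @Override
        public void execute() throws Exception {
            // The next task in the DAG is not released until this returns.
            if (!completionSignal.await(24, TimeUnit.HOURS)) {
                throw new Exception("Timed out waiting for the completion signal");
            }
        }
    }

A scheduling dependency could be modelled the same way, with the external
scheduler (rather than the email daemon) releasing the latch.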

Hope this clarifies your question.

On Fri, Feb 10, 2017 at 1:25 PM, Amila Jayasekara <thejaka.amila@gmail.com>
wrote:

> What are monitoring dependencies and scheduling dependencies in the
> execution DAG ?
>
> Thanks
> -Thejaka
>
> On Tue, Feb 7, 2017 at 5:47 PM, Supun Nakandala <supun.nakandala@gmail.com
> > wrote:
>
>> Hi Gourav,
>>
>> I agree with your idea of using one “workflow micro-service” which would
>> basically be the mediator/orchestrator for deciding which micro-service
>> should be executed next. But I think these components do not necessarily
>> have to be micro-services; rather, they conform to the master-worker
>> paradigm in some sense. The trick here is how we can implement a scalable,
>> fault-tolerant system for distributed workload management, and, in terms of
>> the CAP theorem, which property we are going to compromise.
>>
>> I think you are heading in the right direction. But I would like to add
>> more details to your solution. Please note that I haven't evaluated these
>> ideas 100%. Perhaps we can talk more about this in the next class.
>>
>> As you have done, I think we should centralize the state information in
>> one component (the orchestrator in our case). From my experience, it is very
>> hard to achieve consistency with distributed state in the event of
>> failures.
>>
>> Second, to maintain generalizability in Airavata I think we should treat
>> each application/use-case as a DAG of execution. For example, an HPC job and
>> a cloud job will have two different DAGs consisting of tasks (data
>> staging, job submission, output staging, etc.). These tasks should be short,
>> should roughly have the same execution time, and having
>> idempotent tasks is preferable.
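>>
>> Just to illustrate what I mean by a DAG of short tasks (a sketch only; the
>> task names are placeholders, not actual Airavata task classes):
>>
>>     import java.util.Arrays;
>>     import java.util.List;
>>
>>     enum TaskType { INPUT_STAGING, JOB_SUBMISSION, JOB_MONITORING, OUTPUT_STAGING }
>>
>>     class ExecutionDag {
>>         // An HPC job and a cloud job map to different orderings of the same
>>         // short, ideally idempotent, tasks.
>>         static List<TaskType> forHpcJob() {
>>             return Arrays.asList(TaskType.INPUT_STAGING, TaskType.JOB_SUBMISSION,
>>                     TaskType.JOB_MONITORING, TaskType.OUTPUT_STAGING);
>>         }
>>
>>         static List<TaskType> forCloudJob() {
>>             return Arrays.asList(TaskType.INPUT_STAGING, TaskType.JOB_SUBMISSION,
>>                     TaskType.OUTPUT_STAGING);
>>         }
>>     }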
>>
>> The orchestrator is responsible for executing the DAG and assigning tasks to
>> the workers (how? will follow) based on the control dependencies among the
>> DAG tasks. In addition to the dependencies generated from the tasks, I see
>> there can be other dependencies, on things like monitoring and scheduling,
>> which the orchestrator has to take into account when executing the DAG.
>>
>> The next question is how we distribute jobs from the orchestrator to the
>> workers. I think here it is OK to compromise availability in favor of
>> consistency. I suggest that we use the request/response messaging pattern on
>> top of a persistent message broker (a critical service). In this
>> architecture, we can safely allow the orchestrator or workers to fail without
>> losing consistency (because of the persistent queue). But if the orchestrator
>> fails then availability will go down. One way to overcome this would be to
>> set up an orchestrator quorum. The attached figure summarizes my idea.
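>>
>> A very rough sketch of the orchestrator/worker hand-off (DurableQueue is a
>> stand-in for whatever persistent broker we pick, e.g. RabbitMQ; it is not a
>> real API):
>>
>>     // Messages are persisted by the broker, so an orchestrator or worker
>>     // crash does not lose the task.
>>     interface DurableQueue {
>>         void publish(String queueName, String payload);
>>         String consume(String queueName) throws InterruptedException;
>>     }
>>
>>     class Orchestrator {
>>         private final DurableQueue broker;
>>
>>         Orchestrator(DurableQueue broker) {
>>             this.broker = broker;
>>         }
>>
>>         // Request: enqueue the next runnable task from the DAG.
>>         void dispatch(String taskId) {
>>             broker.publish("task-requests", taskId);
>>         }
>>
>>         // Response: workers publish results back; the orchestrator updates
>>         // the central state and dispatches the next task(s) in the DAG.
>>         void awaitResults() throws InterruptedException {
>>             while (true) {
>>                 String completedTaskId = broker.consume("task-results");
>>                 dispatchNextTasks(completedTaskId);
>>             }
>>         }
>>
>>         private void dispatchNextTasks(String completedTaskId) {
>>             // look up the DAG, then dispatch(...) whatever became runnable
>>         }
>>     }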
>>
>> I think we can also evaluate this solution against the concerns that
>> Shameera pointed out, such as whether we can support cancel. Once again, it's
>> just my idea and is open for argument and debate.
>>
>>
>>
>> [image: Inline image 2]
>>
>> Thanks
>> -Supun
>>
>>
>>
>> On Tue, Feb 7, 2017 at 10:54 AM, Shenoy, Gourav Ganesh <
>> goshenoy@indiana.edu> wrote:
>>
>>> Hi Supun,
>>>
>>>
>>>
>>> I agree, but maybe for the example I mentioned, multiple micro-services
>>> might not sound necessary. I was trying to generalize towards a scenario
>>> where we have multiple independent micro-services (not necessarily for task
>>> execution). Again, I am not certain if this is the right architecture, but
>>> your (and others’) inputs will definitely help us narrow down the
>>> different scenarios we need to focus on. Do let me know if I make sense.
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Gourav Shenoy
>>>
>>>
>>>
>>> *From: *Supun Nakandala <supun.nakandala@gmail.com>
>>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>>> *Date: *Monday, February 6, 2017 at 12:15 PM
>>> *To: *dev <dev@airavata.apache.org>
>>>
>>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>> Management for Airavata
>>>
>>>
>>>
>>> Hi Gourav,
>>>
>>>
>>>
>>> It is my belief that we don't need a separate micro-service for each task.
>>> I favor a single micro-service which can execute all tasks (in other
>>> words, a generic task-execution micro-service). Of course, we can have many
>>> instances of it when we want to scale. WDYT?
>>>
>>>
>>>
>>> On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <
>>> goshenoy@indiana.edu> wrote:
>>>
>>> Hi dev,
>>>
>>>
>>>
>>> We were brainstorming some potential designs that might help us with
>>> this problem. One possible option would be to have a “workflow
>>> micro-service” which would basically be the mediator/orchestrator for
>>> deciding which micro-service should be executed next – based on the type of
>>> the job. The motive is to make micro-services independent of the workflow;
>>> i.e. a micro-service implementation should not be aware of which
>>> micro-service will be executed next, and we should have central control
>>> over deciding this pattern.
>>>
>>> Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for
>>> job type Y, the pattern could be A -> C -> D; and so on.
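>>>
>>> A minimal sketch of how that central lookup could work (the class and
>>> method names here are made up for illustration):
>>>
>>>     import java.util.Arrays;
>>>     import java.util.HashMap;
>>>     import java.util.List;
>>>     import java.util.Map;
>>>
>>>     class WorkflowRouter {
>>>         // Pattern per job type, e.g. X -> A,B,C,D and Y -> A,C,D.
>>>         private final Map<String, List<String>> patterns = new HashMap<>();
>>>
>>>         WorkflowRouter() {
>>>             patterns.put("X", Arrays.asList("A", "B", "C", "D"));
>>>             patterns.put("Y", Arrays.asList("A", "C", "D"));
>>>         }
>>>
>>>         // The workflow micro-service, not the individual services, decides
>>>         // which micro-service runs next.
>>>         String nextService(String jobType, String lastCompleted) {
>>>             List<String> pattern = patterns.get(jobType);
>>>             int i = pattern.indexOf(lastCompleted);
>>>             return (i >= 0 && i + 1 < pattern.size()) ? pattern.get(i + 1) : null;
>>>         }
>>>     }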
>>>
>>>
>>>
>>> An initial design with this idea looks as follows:
>>>
>>>
>>>
>>>
>>>
>>> We would have a common messaging framework (implementation has not been
>>> decided yet). The database associated with the workflow micro-service could
>>> be a graph database (maybe?) – again the implementation/technology has not
>>> been decided yet.
>>>
>>>
>>>
>>> This is just a proposed design, and I would love to hear your thoughts
>>> on it, along with any suggestions/comments. If there is anything that we
>>> are missing or should consider, please do let us know.
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Gourav Shenoy
>>>
>>>
>>>
>>> *From: *"Christie, Marcus Aaron" <machrist@iu.edu>
>>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>>> *Date: *Friday, February 3, 2017 at 9:21 AM
>>>
>>>
>>> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>> Management for Airavata
>>>
>>>
>>>
>>> Vidya,
>>>
>>>
>>>
>>> I’m not sure how relevant it is, but it occurs to me that a microservice
>>> that executes jobs on a cloud requires very little in terms of resources to
>>> submit and monitor that job on the cloud. It doesn’t really matter if the
>>> job is a “big” or a “small” job.  So I’m not sure what heuristic makes
>>> sense regarding distributing work to these job execution microservices.
>>> Maybe a simple round robin approach would be sufficient.
>>>
>>>
>>>
>>> I think a job scheduling algorithm does make sense, however, for a
>>> higher level component, some sort of metascheduler that understands what
>>> resources are available on the cloud resources on which the jobs will be
>>> running. The metascheduler could create work for the job execution
>>> microservices to run on particular cloud resources in a way that optimizes
>>> for some metric (e.g., throughput).
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Marcus
>>>
>>>
>>>
>>> On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <
>>> vkalvaku@umail.iu.edu> wrote:
>>>
>>>
>>>
>>> Ajinkya,
>>>
>>>
>>>
>>> My scenario is for workload distribution among multiple instances of the
>>> same microservice.
>>>
>>>
>>>
>>> If a message broker needs to distribute the available jobs among
>>> multiple workers, the common approach would be to use round robin or a
>>> similar algorithm. This approach works best when all the workers are
>>> similar and the jobs are equal.
>>>
>>>
>>>
>>> So I think that a genetic or heuristic job scheduling algorithm, which
>>> is also aware of each worker's current state (CPU, RAM, number of jobs in
>>> progress), can distribute the jobs more efficiently. The workers can
>>> periodically ping the message broker with their current state info.
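>>>
>>> A small sketch of what that state-aware dispatch might look like, instead
>>> of plain round robin (WorkerState and the field names are hypothetical; a
>>> real implementation could weigh RAM and other signals as well):
>>>
>>>     import java.util.Map;
>>>     import java.util.concurrent.ConcurrentHashMap;
>>>
>>>     class WorkerState {
>>>         double cpuLoad;        // 0.0 - 1.0, from the worker's periodic ping
>>>         int jobsInProgress;
>>>     }
>>>
>>>     class LoadAwareDispatcher {
>>>         private final Map<String, WorkerState> workers = new ConcurrentHashMap<>();
>>>
>>>         // Workers ping periodically with their current state.
>>>         void onPing(String workerId, WorkerState state) {
>>>             workers.put(workerId, state);
>>>         }
>>>
>>>         // Pick the worker with the fewest jobs in progress.
>>>         String pickWorker() {
>>>             String best = null;
>>>             int fewestJobs = Integer.MAX_VALUE;
>>>             for (Map.Entry<String, WorkerState> entry : workers.entrySet()) {
>>>                 if (entry.getValue().jobsInProgress < fewestJobs) {
>>>                     fewestJobs = entry.getValue().jobsInProgress;
>>>                     best = entry.getKey();
>>>                 }
>>>             }
>>>             return best;
>>>         }
>>>     }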
>>>
>>>
>>>
>>> The other advantage of using a customized algorithm is that it can
>>> be tweaked to use embedded routing, priority, or other information in the
>>> job metadata to resolve all of the concerns raised by Amruta, viz. message
>>> grouping, ordering, repeated messages, etc.
>>>
>>>
>>>
>>> We can even ensure data privacy; i.e., if the workers are spread across
>>> multiple compute clusters, say AWS and IU Big Red, we can restrict
>>> certain sensitive jobs to run only on Big Red.
>>>
>>>
>>>
>>> Some distributed job scheduling algorithms for cloud computing.
>>>
>>>    - http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
>>>    - https://arxiv.org/pdf/1404.5528.pdf
>>>
>>>
>>>
>>>
>>>
>>> Regards
>>>
>>> Vidya Sagar
>>>
>>>
>>>
>>> On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <
>>> arkamat@indiana.edu> wrote:
>>>
>>> Hello all,
>>>
>>>
>>>
>>> Adding more information to the message based approach. Messaging is a
>>> key strategy employed in many distributed environments. Message queuing is
>>> ideally suited to performing asynchronous operations. A sender can post a
>>> message to a queue, but it does not have to wait while the message is
>>> retrieved and processed. A sender and receiver do not even have to be
>>> running concurrently.
>>>
>>>
>>>
>>> With message queuing there can be 2 possible scenarios:
>>>
>>>    1. Sending and receiving messages using a *single message queue*.
>>>    2. *Sharing a message queue* between many senders and receivers.
>>>
>>> When a message is retrieved, it is removed from the queue. A message
>>> queue may also support message peeking. This mechanism can be useful if
>>> several receivers are retrieving messages from the same queue, but each
>>> receiver only wishes to handle specific messages. The receiver can examine
>>> the message it has peeked, and decide whether to retrieve the message
>>> (which removes it from the queue) or leave it on the queue for another
>>> receiver to handle.
>>>
>>>
>>>
>>> A few basic message queuing patterns are:
>>>
>>>    1. *One-way messaging*: The sender simply posts a message to the
>>>    queue in the expectation that a receiver will retrieve it and process it at
>>>    some point.
>>>    2. *Request/response messaging*: In this pattern a sender posts a
>>>    message to a queue and expects a response from the receiver. The sender can
>>>    resend if the message is not delivered. This pattern typically requires
>>>    some form of correlation to enable the sender to determine which response
>>>    message corresponds to which request sent to the receiver (a minimal
>>>    sketch of this correlation follows the list).
>>>    3. *Broadcast messaging*: In this pattern a sender posts a message
>>>    to a queue, and multiple receivers can read a copy of the message. This
>>>    pattern depends on the message queue being able to disseminate the same
>>>    message to multiple receivers. There is a queue to which the senders can
>>>    post messages that include metadata in the form of attributes. Each
>>>    receiver can create a subscription to the queue, specifying a filter that
>>>    examines the values of message attributes. Any messages posted to the
>>>    queue with attribute values that match the filter are automatically
>>>    forwarded to that subscription.
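>>>
>>> A minimal illustration of the correlation that request/response messaging
>>> needs (plain fields only; no particular broker API is assumed):
>>>
>>>     import java.util.UUID;
>>>
>>>     class RequestMessage {
>>>         final String correlationId = UUID.randomUUID().toString(); // echoed back in the response
>>>         final String replyTo = "sender-1-responses";               // queue the receiver answers on
>>>         String body;
>>>     }
>>>
>>>     class ResponseMessage {
>>>         String correlationId; // copied from the request so the sender can match it
>>>         String body;
>>>     }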
>>>
>>> A solution based on asynchronous messaging might need to address a
>>> number of concerns:
>>>
>>>
>>>
>>> *Message ordering, Message grouping: *Process messages either in the
>>> order they are posted or in a specific order based on priority. Also, there
>>> may be occasions when it is difficult to eliminate dependencies, and it may
>>> be necessary to group messages together so that they are all handled by the
>>> same receiver.
>>> *Idempotency: *Ideally the message processing logic in a receiver
>>> should be idempotent so that, if the work performed is repeated, this
>>> repetition does not change the state of the system (a small sketch of such
>>> a receiver follows this list).
>>> *Repeated messages: *Some message queuing systems implement duplicate
>>> message detection and removal based on message IDs.
>>> *Poison messages: *A poison message is a message that cannot be
>>> handled, often because it is malformed or contains unexpected information.
>>> *Message expiration: *A message might have a limited lifetime, and if
>>> it is not processed within this period it might no longer be relevant and
>>> should be discarded.
>>> *Message scheduling: *A message might be temporarily embargoed and
>>> should not be processed until a specific date and time. The message should
>>> not be available to a receiver until this time.
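>>>
>>> A sketch of duplicate-safe (idempotent) handling on the receiver side,
>>> assuming each message carries an ID:
>>>
>>>     import java.util.Set;
>>>     import java.util.concurrent.ConcurrentHashMap;
>>>
>>>     class IdempotentReceiver {
>>>         private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
>>>
>>>         void onMessage(String messageId, String payload) {
>>>             // A repeated delivery of the same message ID does not change system state.
>>>             if (!processedIds.add(messageId)) {
>>>                 return; // already handled
>>>             }
>>>             process(payload);
>>>         }
>>>
>>>         private void process(String payload) {
>>>             // the actual task work goes here
>>>         }
>>>     }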
>>>
>>>
>>> Thanks
>>>
>>> Amruta Kamat
>>>
>>> ------------------------------
>>>
>>> *From:* Shenoy, Gourav Ganesh <goshenoy@indiana.edu>
>>> *Sent:* Thursday, February 2, 2017 7:57 PM
>>> *To:* dev@airavata.apache.org
>>>
>>>
>>> *Subject:* Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>> Management for Airavata
>>>
>>>
>>>
>>> Hello all,
>>>
>>>
>>>
>>> Amila, Sagar, thank you for the response and for raising those concerns; and
>>> apologies, because my email framed the topic of workload management in
>>> terms of how micro-services communicate. As Ajinkya rightly mentioned,
>>> there exists some correlation between micro-services' communication and its
>>> impact on how a micro-service performs its work under those
>>> circumstances. The goal is to make sure we have maximum independence
>>> between micro-services, and to investigate the workflow pattern in which these
>>> micro-services will operate, such that we can find the right balance between
>>> availability & consistency. Again, from our preliminary analysis we can
>>> assert that these solutions may not be generic and the specific use-case
>>> will have a big decisive role.
>>>
>>>
>>>
>>> For starters, we are focusing on the following example – and I think
>>> this will clarify what exactly we are trying to investigate.
>>>
>>>
>>>
>>> *Our test example *
>>>
>>> Say we have the following 4 micro-services, which each perform a
>>> specific task as mentioned in the box.
>>>
>>>
>>>
>>> <image001.png>
>>>
>>>
>>>
>>>
>>>
>>> *A state-full pattern to distribute work*
>>>
>>> <image002.png>
>>>
>>>
>>>
>>> Here each communication between micro-services could be via RPC or
>>> messaging (e.g. RabbitMQ). The obvious disadvantage is that if any
>>> micro-service is down, then the system's availability is at stake. In this
>>> test example, we can see that Microservice-A coordinates the work and
>>> maintains the state information.
>>>
>>>
>>>
>>> *A state-less pattern to distribute work*
>>>
>>>
>>>
>>> <image003.png>
>>>
>>>
>>>
>>> Another purely asynchronous approach would be to associate
>>> message queues with each micro-service, where each micro-service performs
>>> its task, submits a request (a message on the bus) to the next micro-service,
>>> and continues to process more requests. This ensures more availability,
>>> though we might need to handle corner cases for failures such as the message
>>> broker being down, message loss, etc.
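>>>
>>> A rough sketch of that hand-off (QueueClient stands in for RabbitMQ or a
>>> similar broker; the queue and class names are made up):
>>>
>>>     interface QueueClient {
>>>         String take(String queueName) throws InterruptedException;
>>>         void put(String queueName, String message);
>>>     }
>>>
>>>     class StagingService implements Runnable {
>>>         private final QueueClient queues;
>>>
>>>         StagingService(QueueClient queues) {
>>>             this.queues = queues;
>>>         }
>>>
>>>         @Override
>>>         public void run() {
>>>             try {
>>>                 while (true) {
>>>                     String request = queues.take("staging-requests"); // blocks for the next request
>>>                     String result = stageData(request);
>>>                     queues.put("submission-requests", result);        // hand off to the next service
>>>                 }
>>>             } catch (InterruptedException e) {
>>>                 Thread.currentThread().interrupt();
>>>             }
>>>         }
>>>
>>>         private String stageData(String request) {
>>>             return request; // placeholder for the real data-staging work
>>>         }
>>>     }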
>>>
>>>
>>>
>>> As mentioned, these are just a few proposals that we are planning to
>>> investigate via a prototype project, injecting corner cases/failures and
>>> trying to find ways to handle them. I would love to hear more
>>> thoughts/questions/suggestions.
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Gourav Shenoy
>>>
>>>
>>>
>>> *From: *Ajinkya Dhamnaskar <adhamnas@umail.iu.edu>
>>> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>>> *Date: *Thursday, February 2, 2017 at 2:22 AM
>>> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
>>> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
>>> Management for Airavata
>>>
>>>
>>>
>>> Hello all,
>>>
>>>
>>>
>>> Just a heads up. Here the name Distributed workload management does not
>>> necessarily mean having different instances of a microservice and then
>>> distributing work among these instances.
>>>
>>>
>>>
>>> Apparently, the problem is how to make each micro-service work
>>> independently on top of a concrete distributed communication infrastructure.
>>> So, think of it as a workflow where each micro-service does its part of the
>>> work and communicates (how? yet to be decided) its output. The next
>>> micro-service identifies and picks up that output and takes it further
>>> towards the final outcome. Having said that, the crux here is that none of
>>> the micro-services need to worry about the other micro-services in the pipeline.
>>>
>>>
>>>
>>> Vidya Sagar,
>>>
>>> I completely second your opinion of having stateless micro-services; in
>>> fact, that is the key. With stateless micro-services it is difficult to
>>> guarantee consistency in a system, but it solves the availability problem to
>>> some extent. I would be interested to understand what you mean by "an
>>> intelligent job scheduling algorithm, which receives real-time updates from
>>> the microservices with their current state information".
>>>
>>>
>>>
>>> On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <
>>> vkalvaku@umail.iu.edu> wrote:
>>>
>>>
>>>
>>> On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <
>>> thejaka.amila@gmail.com> wrote:
>>>
>>> Hi Gourav,
>>>
>>>
>>>
>>> Sorry, I did not understand your question. Specifically, I am having
>>> trouble relating "workload management" to the options you suggest (RPC,
>>> message-based, etc.).
>>>
>>> So what exactly do you mean by "workload management"?
>>>
>>> What is "work" in this context?
>>>
>>>
>>>
>>> Also, I did not understand what you meant by "the most efficient way".
>>> Efficient in terms of what? Are you looking at speed?
>>>
>>>
>>>
>>> As per your suggestions, it seems you are trying to find a way to
>>> communicate between micro-services. RPC might be troublesome if you need to
>>> communicate with processes separated by a firewall.
>>>
>>>
>>>
>>> Thanks
>>>
>>> -Thejaka
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <
>>> goshenoy@indiana.edu> wrote:
>>>
>>> Hello dev, arch,
>>>
>>>
>>>
>>> As part of this Spring’17 Advanced Science Gateway Architecture course,
>>> we are trying to debate and find possible solutions to the issue
>>> of managing distributed workloads in Apache Airavata. This leads to the
>>> discussion of finding the most efficient way that different Airavata
>>> micro-services should communicate and distribute work, in such a way that:
>>>
>>> 1.       We maintain the ability to scale these micro-services whenever
>>> needed (autoscale perhaps?).
>>>
>>> 2.       Achieve fault tolerance.
>>>
>>> 3.       We can deploy these micro-services independently, or better in
>>> a containerized manner – keeping in mind the ability to use devops for
>>> deployment.
>>>
>>>
>>>
>>> As of now the options we are exploring are:
>>>
>>> 1.       RPC based communication
>>>
>>> 2.       Message based – either master-worker, or work-queue, etc
>>>
>>> 3.       A combination of both these approaches
>>>
>>>
>>>
>>> I am more inclined towards exploring the message-based approach, but
>>> then we would need to handle the limitations/corner cases of the
>>> message broker, such as downtime (and maybe more). In my opinion, having
>>> asynchronous communication will help us achieve most of the above-mentioned
>>> points. Another debatable issue is making the micro-services implementation
>>> stateless, such that we do not have to pass the state information between
>>> micro-services.
>>>
>>>
>>>
>>> I would love to hear any thoughts/suggestions/comments on this topic and
>>> open up a discussion via this mail thread. If there is anything that I have
>>> missed which is relevant to this issue, please let me know.
>>>
>>>
>>>
>>> Thanks and Regards,
>>>
>>> Gourav Shenoy
>>>
>>>
>>>
>>>
>>>
>>> Hi Gourav,
>>>
>>>
>>>
>>> Correct me if I'm wrong, but I think this is a case of the job shop
>>> scheduling problem, as we may have 'n' jobs of varying processing times
>>> and memory requirements, and we have 'm' microservices with possibly
>>> different computing and memory capacities, and we are trying to minimize
>>> the makespan <https://en.wikipedia.org/wiki/Makespan>.
>>>
>>>
>>>
>>> For this use-case, I'm in favor of a highly available and consistent
>>> message broker with an intelligent job scheduling algorithm, which receives
>>> real-time updates from the microservices with their current state
>>> information.
>>>
>>>
>>>
>>> As for the stateful vs. stateless implementation, I think that question
>>> depends on the functionality of a particular microservice. In a broad
>>> sense, the stateless implementation should be preferred as it will scale
>>> better horizontally.
>>>
>>>
>>>
>>>
>>>
>>> Regards,
>>>
>>> Vidya Sagar
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of
>>> Informatics and Computing | Indiana University Bloomington | (812)
>>> 691-5002 | vkalvaku@iu.edu
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Thanks and regards,
>>>
>>>
>>>
>>> Ajinkya Dhamnaskar
>>>
>>> Student ID : 0003469679
>>>
>>> Masters (CS)
>>>
>>> +1 (812) 369-5416
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of
>>> Informatics and Computing | Indiana University Bloomington | (812)
>>> 691-5002 | vkalvaku@iu.edu
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Thank you
>>> Supun Nakandala
>>> Dept. Computer Science and Engineering
>>> University of Moratuwa
>>>
>>
>>
>>
>> --
>> Thank you
>> Supun Nakandala
>> Dept. Computer Science and Engineering
>> University of Moratuwa
>>
>
>


-- 
Thank you
Supun Nakandala
Dept. Computer Science and Engineering
University of Moratuwa
