airavata-dev mailing list archives

From Ameya Advankar <aadva...@umail.iu.edu>
Subject Re: [#Spring17-Airavata-Courses] : Distributed Workload Management for Airavata
Date Thu, 09 Feb 2017 06:55:48 GMT
Hi,

The proposed design seems like a feasible solution for workload
distribution.
Some queries which I had are as follows -

*1.* The diagram depicts services would be deployed as independent jars
bundled in a war to a worker (based off "WAR" in the diagram). So I am
assuming in case we have 3 micro-services, there would be jar1, jar2, jar3
bundled inside war.

Now these services are independent and would be worked on separately with
probably separate releases.
But having a single deployable war may lead to all services getting
redeployed on a worker node for just a single service upgrade.
Ideally, an incremental build of Service 1 should only push Service 1 code
to the worker.

So probably a separate CI/CD pipeline for each component, with its own
deployable jar instead of a single war, would be a better approach?


*2.* As per my understanding of the design so far, a Worker is a collection
of implementations i.e. A,B,C,D,etc and the Workers would be scaled
horizontally as needed.
What I would like to clarify is whether 1 worker would necessarily
have just *1 implementation of each service* or *could have
n implementations of m services*.
A probable scaling issue I see with the former implementation, if it is
what is intended, is that in case only service x needs to be scaled up n
times, then it will have to be achieved by scaling the worker n times, but
it will lead to all the other services being scaled up too. I am not sure
how crucial resources/space are, but if they are, then this strategy might
not be optimal.
The latter implementation, which allows flexibility, would be favorable I
believe.
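
To make the scaling concern concrete, here is a rough back-of-the-envelope sketch. The demand figures and function names are made up for illustration; they are not taken from the proposed design.

```python
# Hypothetical demand: how many instances of each service the load requires.
demand = {"A": 1, "B": 1, "C": 4, "D": 1}

def instances_whole_worker(demand):
    # Former case: every worker carries exactly one copy of every service,
    # so the worker count is driven by the most heavily loaded service.
    workers = max(demand.values())
    return workers * len(demand)

def instances_per_service(demand):
    # Latter case: each service is scaled on its own.
    return sum(demand.values())

whole = instances_whole_worker(demand)   # 4 workers x 4 services
flexible = instances_per_service(demand) # 1 + 1 + 4 + 1
print(whole, flexible)
```

With these assumed numbers, scaling only service C to 4 instances costs 16 service instances in the former case and 7 in the latter.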


Thanks & Regards,
Ameya Advankar

On Wed, Feb 8, 2017 at 9:59 PM, Shenoy, Gourav Ganesh <goshenoy@indiana.edu>
wrote:

> Hi All,
>
>
>
> As I mentioned before, here is the design we have kind of reached a
> consensus on (please do provide comments/suggestions). This idea has been
> motivated from an understanding of the Aurora/Mesos architecture, and how
> they function.
>
>
>
>
>
> This design has the following benefits:
>
> -          Loosely coupled, independent micro-services.
>
> -          Inherently scalable in nature.
>
> -          Highly available, and consistent architecture.
>
> -          Supports incremental upgrade, without the risk of breaking any
> existing implementation while doing so.
>
> -          Ability to add/remove tasks in a DAG, and also add new task
> implementations (abstraction).
>
> -          Custom scheduler provides us greater flexibility (see below).
>
>
>
> We have the orchestrator (will eventually be HA using zookeeper), which
> will centrally maintain the state of an experiment – in short the status of
> the tasks it composes. Based on the type of job request, it will fetch the
> task execution DAG – this DAG will be made pre-available to the
> orchestrator via a graph database (debatable), and this DAG is nothing but
> a definition of sequence of tasks needed for that experiment (not the
> implementation of tasks).
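
A minimal sketch of how the orchestrator might track experiment state against a task DAG. It assumes a plain in-memory dictionary in place of the graph database, and the class and status names are hypothetical.

```python
# Hypothetical DAG definition: task name -> set of tasks it depends on.
# Only the ordering is described here, not the task implementations.
DAG = {"A": set(), "B": {"A"}, "C": {"B"}, "D": {"C"}}

class Experiment:
    """Central record of the status of every task in one experiment."""

    def __init__(self, dag):
        self.dag = dag
        self.status = {task: "PENDING" for task in dag}

    def ready_tasks(self):
        # A task is ready when it is still pending and all of its
        # dependencies have finished.
        return sorted(
            task
            for task, deps in self.dag.items()
            if self.status[task] == "PENDING"
            and all(self.status[dep] == "DONE" for dep in deps)
        )

    def mark_done(self, task):
        self.status[task] = "DONE"

    def finished(self):
        return all(state == "DONE" for state in self.status.values())

experiment = Experiment(DAG)
order = []
while not experiment.finished():
    for task in experiment.ready_tasks():
        order.append(task)          # here the task would go to the scheduler
        experiment.mark_done(task)
```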
>
>
>
> There is a scheduler which will receive a task execution request from the
> orchestrator, and *decide* which worker will be executing it. Each worker
> here will be analogous to the current Airavata GFAC module which executes
> the task. We can think of the worker to be a collection of implementations
> of different tasks. Eg: W1, W2, W3 in figure above will have code to
> execute tasks A, B, C, D.
>
>
>
> There are 2 concerns which arise here:
>
> -          How does the scheduler know/decide which worker to pass on the
> task execution to?
>
> -          How do we upgrade a worker, say with a new task ‘E’
> implementation, in such a manner that if something goes wrong with code for
> ‘E’, the entire worker node should not fail? In short, avoid regression
> testing the entire worker module.
>
>
>
> To address the first problem, I suggest we use a paradigm similar to how
> Aurora agents (workers) report available capabilities to the Aurora master
> (scheduler). In Aurora, the slave nodes constantly report back to the
> master how much processing power they have; and accordingly, the master
> decides which slave to pass a new job request to. In our case, we can have
> the workers advertise to the scheduler which tasks they are capable of
> executing and the scheduler acts accordingly.
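
The capability-advertising idea could look roughly like the sketch below. The `Scheduler` class and its methods are hypothetical names, and choosing the first matching worker is a placeholder for whatever policy the real scheduler would apply.

```python
class Scheduler:
    """Keeps the latest capability report from each worker."""

    def __init__(self):
        self.capabilities = {}  # worker id -> set of task names it can run

    def advertise(self, worker, tasks):
        # Called by a worker whenever its set of task implementations changes,
        # for example after an incremental upgrade adds task E.
        self.capabilities[worker] = set(tasks)

    def pick_worker(self, task):
        candidates = sorted(
            worker for worker, tasks in self.capabilities.items() if task in tasks
        )
        return candidates[0] if candidates else None

scheduler = Scheduler()
scheduler.advertise("W1", {"A", "B", "C", "D"})
scheduler.advertise("W2", {"A", "B", "C", "D", "E"})  # already upgraded

worker_for_e = scheduler.pick_worker("E")
worker_for_a = scheduler.pick_worker("A")
worker_for_z = scheduler.pick_worker("Z")  # no worker offers this task
```

Because W1 has not yet reported task E, requests for E only go to W2 until W1 finishes its upgrade and advertises again.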
>
>
>
> To address the second concern, I suggest we have the task implementations
> bundled in separate JARs, so that if there is a problem with one task the
> others don’t get affected and can be “repaired” without impacting other
> existing tasks impls. There might be better ways to do this, but this is
> what I could think of right now.
>
>
>
> As mentioned before, adding a new task implementation – which will need
> upgrades to all workers – will be easy and hassle-free as each worker will
> report back to the scheduler their capability to handle that new task, as
> and when upgrade finishes (incremental upgrade). Having a custom scheduler
> also provides us other benefits such as:
>
> -          Handling corner cases – eg: task execution on one worker fails
> (for some unforeseen reason), then the scheduler can retry it on a
> different worker.
>
> -          Prioritize experiments – schedule higher priority experiments
> before normal priority ones (I just made this one up).
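
Both scheduler benefits listed above can be sketched in a few lines. Everything here is an assumption for illustration: the class and function names, the convention that a lower number means higher priority, and the use of `RuntimeError` to signal a failed worker.

```python
import heapq
import itertools

class PriorityQueue:
    """Experiments with a lower priority number are scheduled first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within one priority

    def submit(self, experiment, priority):
        heapq.heappush(self._heap, (priority, next(self._order), experiment))

    def pop(self):
        return heapq.heappop(self._heap)[2]


def run_with_retry(task, workers, execute):
    """Try the task on each worker in turn; return (worker, result) on success."""
    for worker in workers:
        try:
            return worker, execute(worker, task)
        except RuntimeError:
            continue  # this worker failed, fall through to the next one
    raise RuntimeError(f"task {task!r} failed on every worker")


# Hypothetical executor in which worker W1 always fails.
def flaky_execute(worker, task):
    if worker == "W1":
        raise RuntimeError("worker crashed")
    return f"{task} done on {worker}"

queue = PriorityQueue()
queue.submit("exp-normal", priority=5)
queue.submit("exp-urgent", priority=1)
first = queue.pop()
outcome = run_with_retry("A", ["W1", "W2"], flaky_execute)
```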
>
>
>
> We have decided to go ahead and start building a prototype of this design
> starting tomorrow, unless there are any concerns/issues. Please do let me
> know your views on this approach, as every concern helps us better our
> design.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
>
>
> *From: *"Shenoy, Gourav Ganesh" <goshenoy@indiana.edu>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Wednesday, February 8, 2017 at 7:06 PM
>
> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hi Amruta,
>
>
>
> Thanks for providing your inputs, and yes in fact we had started out our
> design discussions with a decentralized framework in mind. But then we
> considered the problem of making each micro-service independent of each
> other and more importantly not making them aware of what the DAG is. For
> this reason, we decided to push and maintain the DAG at a centralized &
> highly available place (the orchestrator), giving us more control and
> flexibility in adding/removing tasks from the DAG. This also provides us
> with the ability to scale each service when needed and also perform
> incremental upgrades via devops.
>
>
>
> Do let me know if I make sense, or if there is something I am missing. I
> would also like to add that we have today nearly come to a consensus on a
> “fairly good” design – which I will be detailing in another email shortly.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *"Kamat, Amruta Ravalnath" <arkamat@indiana.edu>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Wednesday, February 8, 2017 at 2:59 AM
> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hello Gourav,
>
>
>
> I agree with your solution, but I just came across a decentralized
> architecture which might serve our purpose and might provide a looser
> coupling.
>
>
>
> Having a common workflow would mean a centralized orchestrator i.e. a
> process which coordinates with multiple services to complete a larger
> workflow. The services have no knowledge of the workflow or their specific
> involvement in it. The orchestrator takes care of the complexities.
> However, the challenge with an orchestrator is that business logic will
> build up in a central place.
>
> If there is a central shared instance of the orchestrator for all
> requests, then the orchestrator is a single point of failure. If it goes
> down, all processing stops.
>
>
>
> With decentralized interactions, each service takes full responsibility
> for its role in the greater workflow. It will listen for events from other
> services, complete its work as soon as possible, retry if a failure occurs
> and send out events upon completion. Here, communications tend to be
> asynchronous and business logic stays within the related services.
> Instead of having a central orchestrator that controls the logic of what
> steps happen when, that logic is built into each service ahead of time. The
> services know what to react to and how, ahead of time. Multiple services
> can consume the same events, do some processing, and then produce their own
> events back into the event stream, all at the same time. The event stream
> does not have any logic and is intended to be a dumb pipe.
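
A minimal in-memory sketch of this decentralized style, assuming made-up event names and three hypothetical services. The event stream only routes events; each service decides for itself what to react to and what to emit.

```python
from collections import defaultdict, deque

class EventStream:
    """A 'dumb pipe': it routes events and holds no workflow logic."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.pending = deque()

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.pending.append((event_type, payload))

    def run(self):
        # Deliver events until no service has anything left to emit.
        while self.pending:
            event_type, payload = self.pending.popleft()
            for handler in self.subscribers[event_type]:
                handler(payload)

stream = EventStream()
log = []

def staging_service(job):
    log.append(f"staged {job}")
    stream.publish("data.staged", job)

def submission_service(job):
    log.append(f"submitted {job}")
    stream.publish("job.submitted", job)

def monitoring_service(job):
    log.append(f"monitoring {job}")

stream.subscribe("job.requested", staging_service)
stream.subscribe("data.staged", submission_service)
stream.subscribe("job.submitted", monitoring_service)

stream.publish("job.requested", "job-1")
stream.run()
```

Note that the order of steps is nowhere written down in one place; it emerges from which events each service subscribes to.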
>
>
>
> Decentralized interactions meet our requirements better: loose coupling,
> high cohesion and each service responsible for its own bounded context.
>
>
>
> Thanks
>
> Amruta Kamat
> ------------------------------
>
> *From:* Shenoy, Gourav Ganesh <goshenoy@indiana.edu>
> *Sent:* Tuesday, February 7, 2017 11:49 PM
> *To:* dev@airavata.apache.org
> *Subject:* Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Supun,
>
>
>
> Thank you for this excellent explanation. I see that the architecture you
> mentioned covers most of the concerns we discussed in this thread and in
> class. I just had one clarifying question though – what does “worker”
> signify here? Is it a generic task execution framework which runs the DAG?
> Or is it like a platform where the DAG runs (and how?).
>
>
>
> Apart from that, I am looking at Storm’s architecture to see if we can get
> some clues as they are tackling a similar problem. I shall update once I
> get some concrete answer.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Supun Nakandala <supun.nakandala@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Tuesday, February 7, 2017 at 5:47 PM
> *To: *dev <dev@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hi Gourav,
>
> I agree with your idea of using one “workflow micro-service”
> which would basically be the mediator/orchestrator for deciding which
> micro-service should be executed next. But I think these components do not
> necessarily have to be micro-services; rather, they conform to the
> master-worker paradigm in some sense. The trick here is how we can
> implement a scalable, fault-tolerant system to do distributed workload
> management, and which CAP theorem property we are going to
> compromise.
>
>
>
> I think you are heading in the right direction. But I would like to add
> more details to your solution. Please note that I haven't evaluated these
> ideas 100%. Perhaps we can talk more about this in the next class.
>
>
>
> As you have done, I think we should centralize the state information into
> one component (orchestrator in our case). From my experience, it is very
> hard to achieve consistency in a distributed state setting in the event of
> failure.
>
>
>
> Second, to maintain generalizability in Airavata I think we should treat
> each application/use case as a DAG of execution. For example, an HPC job and
> a cloud job will have two different DAGs which consist of tasks (data
> staging, job submission, output staging, etc.). These tasks should be short
> tasks and should roughly have the same execution time. And having
> idempotent tasks is preferable.
>
>
>
> The orchestrator is responsible for executing the DAG and assigning tasks to
> the workers (how? will follow) based on the control dependencies in the DAG
> tasks. In addition to the dependencies generated from tasks, I see there
> can be other dependencies on things like monitoring and scheduling which
> the orchestrator has to take into account when executing the DAG.
>
>
>
> The next question is how we distribute jobs from Orchestrator to workers.
> I think here it is ok to compromise availability in favor of consistency. I
> suggest that we use the request/response messaging pattern which uses a
> persistent message broker (critical service). In this architecture, we can
> safely allow orchestrator or workers to fail without losing consistency
> (because of the persistent queue). But if the orchestrator fails then the
> availability will go down. One way to overcome this would be to come
> up with an orchestrator quorum. The attached figure summarizes my idea.
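
The consistency argument rests on the broker keeping a message until its consumer acknowledges it. Below is a toy sketch of that behavior, with a dictionary standing in for durable storage; the class and method names are hypothetical and do not correspond to any particular broker's API.

```python
class PersistentQueue:
    """Messages stay stored until a consumer explicitly acknowledges them."""

    def __init__(self):
        self._messages = {}     # message id -> body (stand-in for disk storage)
        self._in_flight = set() # delivered but not yet acknowledged
        self._next_id = 0

    def send(self, body):
        message_id = self._next_id
        self._next_id += 1
        self._messages[message_id] = body
        return message_id

    def receive(self):
        for message_id, body in self._messages.items():
            if message_id not in self._in_flight:
                self._in_flight.add(message_id)
                return message_id, body
        return None

    def ack(self, message_id):
        # Only now is the message really gone.
        del self._messages[message_id]
        self._in_flight.discard(message_id)

    def recover(self):
        # Called when a consumer is found dead: its unacknowledged
        # messages become visible to other consumers again.
        self._in_flight.clear()

queue = PersistentQueue()
queue.send("run task A")

first_delivery = queue.receive()   # a worker takes the task ...
queue.recover()                    # ... and crashes before acknowledging
second_delivery = queue.receive()  # another worker gets the same task
queue.ack(second_delivery[0])
remaining = queue.receive()
```

Redelivery after a crash is also why idempotent tasks are preferable: the same task may run more than once.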
>
>
>
> I think we can also evaluate this solution against the concerns that Shameera
> pointed out, such as whether we can enable cancel. Once again, it's just my
> idea and is open for argument and debate.
>
>
>
>
>
>
>
> [image: Inline image 2]
>
>
>
> Thanks
>
> -Supun
>
>
>
>
>
>
>
> On Tue, Feb 7, 2017 at 10:54 AM, Shenoy, Gourav Ganesh <
> goshenoy@indiana.edu> wrote:
>
> Hi Supun,
>
>
>
> I agree, but maybe for the example I mentioned, multiple micro-services
> might not sound necessary. I was trying to generalize towards a scenario
> where we have multiple independent micro-services (not necessarily for task
> execution). Again, I am not certain if this is the right architecture, but
> your (and others’) input will definitely help us narrow down the
> different scenarios we need to focus on. Do let me know if I make
> sense.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Supun Nakandala <supun.nakandala@gmail.com>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Monday, February 6, 2017 at 12:15 PM
> *To: *dev <dev@airavata.apache.org>
>
>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hi Gourav,
>
>
>
> It is my belief that we don't need a separate microservice for each task. I
> favor a single micro service which can execute all tasks (or in other words
> a generic task execution micro service). Of course, we can have many of
> them when we want to scale. WDYT?
>
>
>
> On Sun, Feb 5, 2017 at 3:07 PM, Shenoy, Gourav Ganesh <
> goshenoy@indiana.edu> wrote:
>
> Hi dev,
>
>
>
> We were brainstorming some potential designs that might help us with this
> problem. One possible option would be to have a “workflow micro-service”
> which would basically be the mediator/orchestrator for deciding which
> micro-service should be executed next – based on the type of the job. The
> motive is to make micro-services independent of the workflow; i.e. a
> micro-service implementation should not be aware of which micro-service
> will be executed next and we should have a central control of deciding this
> pattern.
>
> Eg: For job type X, the pattern could be A -> B -> C -> D. Whereas for job
> type Y, the pattern could be A -> C -> D; and so on.
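
As a small illustration of keeping the pattern in one central place, the workflow micro-service could answer "what runs next?" from a lookup table. The table below reuses the X and Y patterns from the example; the function name and the in-memory dictionary are assumptions.

```python
# Execution patterns per job type, as in the example above.
PATTERNS = {
    "X": ["A", "B", "C", "D"],
    "Y": ["A", "C", "D"],
}

def next_service(job_type, current=None):
    """Return the micro-service to run after `current`, or None when done."""
    pattern = PATTERNS[job_type]
    if current is None:
        return pattern[0]
    position = pattern.index(current)
    if position + 1 < len(pattern):
        return pattern[position + 1]
    return None

after_a_for_x = next_service("X", "A")
after_a_for_y = next_service("Y", "A")
after_d_for_y = next_service("Y", "D")
```

Micro-service A never needs to know whether B or C follows it; only the table does.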
>
>
>
> An initial design with this idea looks as follows:
>
>
>
>
>
> We would have a common messaging framework (implementation has not been
> decided yet). The database associated with the workflow micro-service could
> be a graph database (maybe?) – again the implementation/technology has not
> been decided yet.
>
>
>
> This is just a proposed design, and I would love to hear your thoughts on
> this and any suggestions/comments if any. If there is anything that we are
> missing or should consider, please do let us know.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *"Christie, Marcus Aaron" <machrist@iu.edu>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Friday, February 3, 2017 at 9:21 AM
>
>
> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Vidya,
>
>
>
> I’m not sure how relevant it is, but it occurs to me that a microservice
> that executes jobs on a cloud requires very little in terms of resources to
> submit and monitor that job on the cloud. It doesn’t really matter if the
> job is a “big” or a “small” job.  So I’m not sure what heuristic makes
> sense regarding distributing work to these job execution microservices.
> Maybe a simple round robin approach would be sufficient.
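
A round robin distributor is only a few lines. The instance and job names below are placeholders.

```python
from itertools import cycle

# Hypothetical job execution microservice instances.
workers = ["exec-1", "exec-2", "exec-3"]
next_worker = cycle(workers)

jobs = ["job-a", "job-b", "job-c", "job-d"]

# Each job goes to the next instance in turn, wrapping around at the end.
assignment = {job: next(next_worker) for job in jobs}
```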
>
>
>
> I think a job scheduling algorithm does make sense, however, for a higher
> level component, some sort of metascheduler that understands what resources
> are available on the cloud resources on which the jobs will be running.
> The metascheduler could create work for the job execution microservices to
> run on particular cloud resources in a way that optimizes for some metric
> (e.g., throughput).
>
>
>
> Thanks,
>
>
>
> Marcus
>
>
>
> On Feb 3, 2017, at 3:19 AM, Vidya Sagar Kalvakunta <vkalvaku@umail.iu.edu>
> wrote:
>
>
>
> Ajinkya,
>
>
>
> My scenario is for workload distribution among multiple instances of the
> same microservice.
>
>
>
> If a message broker needs to distribute the available jobs among multiple
> workers, the common approach would be to use round robin or a similar
> algorithm. This approach works best when all the workers are similar and
> the jobs are equal.
>
>
>
> So I think that a genetic or heuristic job scheduling algorithm, which is
> also aware of each of the worker's current state (CPU, RAM, No of Jobs
> processing) can more efficiently distribute the jobs. The workers can
> periodically ping the message broker with their current state info.
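
A state-aware choice could be sketched as follows. This is a simple heuristic (filter by free RAM, then prefer the lowest CPU load), not a genetic algorithm, and the field names and sample numbers are assumptions.

```python
from dataclasses import dataclass

@dataclass
class WorkerState:
    """Latest state report a worker has sent to the broker."""
    name: str
    cpu_load: float     # fraction between 0 and 1
    free_ram_mb: int
    running_jobs: int

def pick_worker(states, required_ram_mb):
    # Keep only workers with enough free memory for the job.
    eligible = [s for s in states if s.free_ram_mb >= required_ram_mb]
    if not eligible:
        return None
    # Prefer the least loaded CPU, then the fewest running jobs.
    best = min(eligible, key=lambda s: (s.cpu_load, s.running_jobs))
    return best.name

states = [
    WorkerState("w1", cpu_load=0.9, free_ram_mb=8000, running_jobs=5),
    WorkerState("w2", cpu_load=0.2, free_ram_mb=1000, running_jobs=1),
    WorkerState("w3", cpu_load=0.4, free_ram_mb=4000, running_jobs=2),
]

large_job_worker = pick_worker(states, required_ram_mb=2000)
small_job_worker = pick_worker(states, required_ram_mb=500)
huge_job_worker = pick_worker(states, required_ram_mb=16000)
```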
>
>
>
> The other advantage of using a customized algorithm is that it can
> be tweaked to use embedded routing, priority or other information in the
> job metadata to resolve all of the concerns raised by Amruta, viz. message
> grouping, ordering, repeated messages, etc.
>
>
>
> We can even ensure data privacy: if the workers are spread across
> multiple compute clusters, say AWS and IU Big Red, we can restrict
> certain sensitive jobs to run only on Big Red.
>
>
>
> Some distributed job scheduling algorithms for cloud computing.
>
>    - http://www.ijimai.org/journal/sites/default/files/files/2013/03/ijimai20132_18_pdf_62825.pdf
>    - https://arxiv.org/pdf/1404.5528.pdf
>
>
>
>
>
> Regards
>
> Vidya Sagar
>
>
>
> On Fri, Feb 3, 2017 at 1:38 AM, Kamat, Amruta Ravalnath <
> arkamat@indiana.edu> wrote:
>
> Hello all,
>
>
>
> Adding more information to the message based approach. Messaging is a key
> strategy employed in many distributed environments. Message queuing is
> ideally suited to performing asynchronous operations. A sender can post a
> message to a queue, but it does not have to wait while the message is
> retrieved and processed. A sender and receiver do not even have to be
> running concurrently.
>
>
>
> With message queuing there can be 2 possible scenarios:
>
>    1. Sending and receiving messages using a *single message queue*.
>    2. *Sharing a message queue* between many senders and receivers.
>
> When a message is retrieved, it is removed from the queue. A message
> queue may also support message peeking. This mechanism can be useful if
> several receivers are retrieving messages from the same queue, but each
> receiver only wishes to handle specific messages. The receiver can examine
> the message it has peeked, and decide whether to retrieve the message
> (which removes it from the queue) or leave it on the queue for another
> receiver to handle.
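
Peeking can be sketched with a toy queue. The class, the `type` field and the helper function are hypothetical; real brokers expose this differently.

```python
from collections import deque

class PeekableQueue:
    def __init__(self):
        self._items = deque()

    def post(self, message):
        self._items.append(message)

    def peek(self):
        # Look at the head of the queue without removing it.
        return self._items[0] if self._items else None

    def retrieve(self):
        # Remove and return the head of the queue.
        return self._items.popleft()

def try_handle(queue, accepted_type):
    """Retrieve the head message only if this receiver wants it."""
    message = queue.peek()
    if message is not None and message["type"] == accepted_type:
        return queue.retrieve()
    return None  # leave it on the queue for another receiver

queue = PeekableQueue()
queue.post({"type": "stage", "id": 1})
queue.post({"type": "submit", "id": 2})

skipped = try_handle(queue, "submit")  # head is a "stage" message
staged = try_handle(queue, "stage")
submitted = try_handle(queue, "submit")
```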
>
>
>
> A few basic message queuing patterns are:
>
>    1. *One-way messaging*: The sender simply posts a message to the queue
>    in the expectation that a receiver will retrieve it and process it at some
>    point.
>    2. *Request/response messaging*: In this pattern a sender posts a
>    message to a queue and expects a response from the receiver. The sender can
>    resend if the message is not delivered. This pattern typically requires
>    some form of correlation to enable the sender to determine which response
>    message corresponds to which request sent to the receiver.
>    3. *Broadcast messaging*: In this pattern a sender posts a message to
>    a queue, and multiple receivers can read a copy of the message. This
>    pattern depends on the message queue being able to disseminate the same
>    message to multiple receivers. There is a queue to which the senders can
>    post messages that include metadata in the form of attributes. Each
>    receiver can create a subscription to the queue, specifying a filter that
>    examines the values of message attributes. Any messages posted to the
>    queue with attribute values that match the filter are automatically
>    forwarded to that subscription.
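
The correlation needed for request/response messaging can be sketched with two in-memory lists standing in for the request and response queues. All names are hypothetical, and the "work" done by the receiver is just upper-casing the body.

```python
import uuid

request_queue = []
response_queue = []
pending = {}  # correlation id -> original request body

def send_request(body):
    # Tag every request with a unique id so the reply can be matched later.
    correlation_id = str(uuid.uuid4())
    pending[correlation_id] = body
    request_queue.append({"correlation_id": correlation_id, "body": body})
    return correlation_id

def receiver_process_one():
    request = request_queue.pop(0)
    response_queue.append({
        "correlation_id": request["correlation_id"],  # echoed back unchanged
        "body": request["body"].upper(),
    })

def collect_response():
    response = response_queue.pop(0)
    original = pending.pop(response["correlation_id"])
    return original, response["body"]

send_request("stage data")
send_request("submit job")
receiver_process_one()
receiver_process_one()
first_pair = collect_response()
second_pair = collect_response()
```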
>
> A solution based on asynchronous messaging might need to address a number
> of concerns:
>
>
>
> *Message ordering, Message grouping: *Process messages either in the
> order they are posted or in a specific order based on priority. Also, there
> may be occasions when it is difficult to eliminate dependencies, and it may
> be necessary to group messages together so that they are all handled by the
> same receiver.
> *Idempotency: *Ideally the message processing logic in a receiver should
> be idempotent so that, if the work performed is repeated, this repetition
> does not change the state of the system.
> *Repeated messages: *Some message queuing systems implement duplicate
> message detection and removal based on message IDs.
> *Poison messages: *A poison message is a message that cannot be handled,
> often because it is malformed or contains unexpected information.
> *Message expiration: *A message might have a limited lifetime, and if it
> is not processed within this period it might no longer be relevant and
> should be discarded.
> *Message scheduling: *A message might be temporarily embargoed and should
> not be processed until a specific date and time. The message should not be
> available to a receiver until this time.
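
Several of these concerns can be handled by checks in the receiver before it does any work. The sketch below assumes messages are dictionaries with hypothetical `id`, `body`, `expires_at` and `not_before` fields, and treats a message without a body as a poison message.

```python
import time

class Receiver:
    def __init__(self):
        self.seen_ids = set()    # for duplicate detection
        self.dead_letters = []   # poison messages are parked here
        self.processed = []

    def handle(self, message, now=None):
        now = time.time() if now is None else now
        if message["id"] in self.seen_ids:
            return "duplicate"   # repeated message, work already done
        if message.get("expires_at") is not None and now > message["expires_at"]:
            return "expired"     # no longer relevant, discard
        if message.get("not_before") is not None and now < message["not_before"]:
            return "deferred"    # embargoed until a later time
        if "body" not in message:
            self.dead_letters.append(message)
            return "poison"      # malformed, cannot be handled
        self.seen_ids.add(message["id"])
        self.processed.append(message["body"])
        return "processed"

receiver = Receiver()
results = [
    receiver.handle({"id": 1, "body": "a"}, now=100),
    receiver.handle({"id": 1, "body": "a"}, now=100),
    receiver.handle({"id": 2, "body": "b", "expires_at": 50}, now=100),
    receiver.handle({"id": 3, "body": "c", "not_before": 200}, now=100),
    receiver.handle({"id": 4}, now=100),
]
```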
>
>
> Thanks
>
> Amruta Kamat
> ------------------------------
>
> *From:* Shenoy, Gourav Ganesh <goshenoy@indiana.edu>
> *Sent:* Thursday, February 2, 2017 7:57 PM
> *To:* dev@airavata.apache.org
>
>
> *Subject:* Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hello all,
>
>
>
> Amila, Sagar, thank you for the response and for raising those concerns; and
> apologies that my email framed the topic of workload management in
> terms of how micro-services communicate. As Ajinkya rightly mentioned,
> there exists some sort of correlation between micro-services communication
> and its impact on how that micro-service performs the work under those
> circumstances. The goal is to make sure we have maximum independence
> between micro-services, and investigate the workflow pattern in which these
> micro-services will operate such that we can find the right balance between
> availability & consistency. Again, from our preliminary analysis we can
> assert that these solutions may not be generic and the specific use-case
> will have a big decisive role.
>
>
>
> For starters, we are focusing on the following example – and I think this
> will clarify the doubts about what exactly we are trying to investigate.
>
>
>
> *Our test example *
>
> Say we have the following 4 micro-services, which each perform a specific
> task as mentioned in the box.
>
>
>
> <image001.png>
>
>
>
>
>
> *A stateful pattern to distribute work*
>
> <image002.png>
>
>
>
> Here each communication between micro-services could be via RPC or
> Messaging (eg: RabbitMQ). Obvious disadvantage is that if any micro-service
> is down, then the system availability is at stake. In this test example, we
> can see that Microservice-A coordinates the work and maintains the state
> information.
>
>
>
> *A stateless pattern to distribute work*
>
>
>
> <image003.png>
>
>
>
> Another purely asynchronous approach would be to associate message-queues
> with each micro-service, where each micro-service performs its task,
> submits a request (message on bus) to the next micro-service, and continues
> to process more requests. This ensures more availability, and perhaps we
> might need to handle corner cases for failures such as message broker down,
> or message loss, etc.
>
>
>
> As mentioned, these are just a few proposals that we are planning to
> investigate via a prototype project. We will inject corner cases/failures
> and try to find ways to handle them. I would love to hear more
> thoughts/questions/suggestions.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
> *From: *Ajinkya Dhamnaskar <adhamnas@umail.iu.edu>
> *Reply-To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Date: *Thursday, February 2, 2017 at 2:22 AM
> *To: *"dev@airavata.apache.org" <dev@airavata.apache.org>
> *Subject: *Re: [#Spring17-Airavata-Courses] : Distributed Workload
> Management for Airavata
>
>
>
> Hello all,
>
>
>
> Just a heads up. Here the name Distributed workload management does not
> necessarily mean having different instances of a microservice and then
> distributing work among these instances.
>
>
>
> Apparently, the problem is how to make each microservice work
> independently with a concrete distributed communication infrastructure. So,
> think of it as a workflow where each microservice does its part of the work
> and communicates (how? yet to be decided) its output. The next
> microservice identifies and picks up that output and takes it further
> towards the final outcome. Having said that, the crux here is that none of
> the microservices needs to worry about the other microservices in the pipeline.
>
>
>
> Vidya Sagar,
>
> I completely second your opinion of having stateless microservices; in
> fact that is the key. With stateless microservices it is difficult to
> guarantee consistency in a system, but it solves the availability problem to
> some extent. I would be interested to understand what you mean by "an
> intelligent job scheduling algorithm, which receives real-time updates from
> the microservices with their current state information".
>
>
>
> On Wed, Feb 1, 2017 at 11:48 PM, Vidya Sagar Kalvakunta <
> vkalvaku@umail.iu.edu> wrote:
>
>
>
> On Wed, Feb 1, 2017 at 2:37 PM, Amila Jayasekara <thejaka.amila@gmail.com>
> wrote:
>
> Hi Gourav,
>
>
>
> Sorry, I did not understand your question. Specifically I am having
> trouble relating "work load management" to options you suggest (RPC,
> message based etc.).
>
> So what exactly do you mean by "workload management"?
>
> What is work in this context?
>
>
>
> Also, I did not understand what you meant by "the most efficient way".
> Efficient in terms of what? Are you looking at speed?
>
>
>
> As per your suggestions, it seems you are trying to find a way to
> communicate between micro services. RPC might be troublesome if you need to
> communicate with processes separated by a firewall.
>
>
>
> Thanks
>
> -Thejaka
>
>
>
>
>
> On Wed, Feb 1, 2017 at 12:52 PM, Shenoy, Gourav Ganesh <
> goshenoy@indiana.edu> wrote:
>
> Hello dev, arch,
>
>
>
> As part of this Spring’17 Advanced Science Gateway Architecture course, we
> are working on trying to debate and find possible solutions to the issue of
> managing distributed workloads in Apache Airavata. This leads to the
> discussion of finding the most efficient way that different Airavata
> micro-services should communicate and distribute work, in such a way that:
>
> 1.       We maintain the ability to scale these micro-services whenever
> needed (autoscale perhaps?).
>
> 2.       Achieve fault tolerance.
>
> 3.       We can deploy these micro-services independently, or better in a
> containerized manner – keeping in mind the ability to use devops for
> deployment.
>
>
>
> As of now the options we are exploring are:
>
> 1.       RPC based communication
>
> 2.       Message based – either master-worker, or work-queue, etc
>
> 3.       A combination of both these approaches
>
>
>
> I am more inclined towards exploring the message based approach, but again
> there arises the possibility of handling limitations/corner cases of the
> message broker such as downtimes (maybe more). In my opinion, having
> asynchronous communication will help us achieve most of the above-mentioned
> points. Another debatable issue is making the micro-services implementation
> stateless, such that we do not have to pass the state information between
> micro-services.
>
>
>
> I would love to hear any thoughts/suggestions/comments on this topic and
> open up a discussion via this mail thread. If there is anything that I have
> missed which is relevant to this issue, please let me know.
>
>
>
> Thanks and Regards,
>
> Gourav Shenoy
>
>
>
>
>
> Hi Gourav,
>
>
>
> Correct me if I'm wrong, but I think this is a case of the job shop
> scheduling problem, as we may have 'n' jobs of varying processing times
> and memory requirements, and we have 'm' microservices with possibly
> different computing and memory capacities, and we are trying to minimize
> the makespan <https://en.wikipedia.org/wiki/Makespan>.
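
To illustrate what minimizing the makespan means, here is a sketch of the greedy longest-processing-time-first heuristic on a simplified variant: identical machines and jobs described only by a duration. This is not the full job shop problem and not necessarily optimal in general; the job names and durations are invented.

```python
import heapq

def lpt_schedule(durations, machines):
    """Assign each job (longest first) to the currently least loaded machine."""
    loads = [(0, m) for m in range(machines)]  # (total load, machine index)
    heapq.heapify(loads)
    assignment = {m: [] for m in range(machines)}
    for job, duration in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, machine = heapq.heappop(loads)
        assignment[machine].append(job)
        heapq.heappush(loads, (load + duration, machine))
    # The makespan is the finishing time of the busiest machine.
    makespan = max(load for load, _ in loads)
    return assignment, makespan

durations = {"j1": 7, "j2": 5, "j3": 4, "j4": 3, "j5": 1}
assignment, makespan = lpt_schedule(durations, machines=2)
```

For these five jobs on two machines the heuristic gives a makespan of 10, which here equals half the total work of 20 and so cannot be improved.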
>
>
>
> For this use-case, I'm in favor of a highly available and consistent message
> broker with an intelligent job scheduling algorithm, which receives
> real-time updates from the microservices with their current state
> information.
>
>
>
> As for the state vs stateless implementation, I think that question
> depends on the functionality of a particular microservice. In a broad
> sense, the stateless implementation should be preferred as it will scale
> better horizontally.
>
>
>
>
>
> Regards,
>
> Vidya Sagar
>
>
>
>
> --
>
> Vidya Sagar Kalvakunta | Graduate MS CS Student | IU School of Informatics
> and Computing | Indiana University Bloomington | (812) 691-5002
> <8126915002> | vkalvaku@iu.edu
>
>
>
>
>
> --
>
> Thanks and regards,
>
>
>
> Ajinkya Dhamnaskar
>
> Student ID : 0003469679
>
> Masters (CS)
>
> +1 (812) 369- 5416 <(812)%20369-5416>
>
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Thank you
> Supun Nakandala
> Dept. Computer Science and Engineering
> University of Moratuwa
>
>
>
>
>
>
