airavata-dev mailing list archives

From "Shenoy, Gourav Ganesh" <goshe...@indiana.edu>
Subject Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement identification
Date Thu, 05 Oct 2017 15:20:05 GMT
Sorry, missed the attachment in my previous email.

PS: DC/OS is just a recommendation for performing containerized deployment and application
management for Airavata. I would be happy to consider alternative frameworks such as Kubernetes.

Thanks and Regards,
Gourav Shenoy

From: "Shenoy, Gourav Ganesh" <goshenoy@indiana.edu>
Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
Date: Thursday, October 5, 2017 at 11:16 AM
To: "dev@airavata.apache.org" <dev@airavata.apache.org>
Subject: Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement
identification

Hi Dimuthu,

Very good summary! I am not sure if you have come across it, but DC/OS (DataCenter Operating
System) is a container orchestration platform based on Apache Mesos. The beauty of DC/OS is the
ease and simplicity of development and deployment, while it remains strong on most of the
parameters that matter: multi-datacenter, multi-cloud, scalability, high availability, fault
tolerance, and load balancing. More importantly, the community support is fantastic.

DC/OS has an exhaustive service catalog; it is more like a PaaS for containers (though not
restricted to containers). You can run services like Spark, Kafka, RabbitMQ, etc. out of the
box with a single-click install. Apache Mesos as the underlying resource manager makes it
seamless to deploy applications across different datacenters. There is also a concept of
SERVICE vs JOB: a service is considered long running, and DC/OS makes sure it keeps running
(if a service fails, it spins up a new one), whereas jobs are one-time executions.
This comes in handy for using DC/OS as a target runtime for Airavata.
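
To make the SERVICE vs JOB distinction concrete, here is a minimal Python sketch of the two
restart policies. It is only an illustration and not the DC/OS API: run_job and run_service
are hypothetical names, and treating a normal return as a clean shutdown is an assumption.

```python
from typing import Callable


def run_job(task: Callable[[], None]) -> bool:
    """Run a one-time task. Report success or failure, never retry."""
    try:
        task()
        return True
    except Exception:
        return False


def run_service(task: Callable[[], None], max_restarts: int) -> int:
    """Keep a long-running task alive by restarting it whenever it fails.

    Returns the number of restarts performed. A normal return from the task
    is treated as a clean shutdown (an assumption made for this sketch).
    """
    restarts = 0
    while True:
        try:
            task()
            return restarts
        except Exception:
            if restarts >= max_restarts:
                raise  # give up once the restart budget is exhausted
            restarts += 1
```

A real platform would also back off between restarts and reschedule on a different node;
the sketch only shows the difference in policy.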

We used DC/OS for our class project to run the distributed task execution prototype we built
(which uses RabbitMQ messaging). Here’s a link to my blog post explaining the process:
https://gouravshenoy.github.io/apache-airavata/spring17/2017/04/20/final-report.html . I have
also attached a PDF paper we wrote as part of the class explaining the task execution process
and one solution using RabbitMQ messaging.

I had also started work on containerizing Airavata and on a unified build and deployment
mechanism with CI/CD on DC/OS. Unfortunately, I couldn’t complete it due to time constraints,
but I would be more than happy to work with you on this. Let me know and we can coordinate.

Thanks and Regards,
Gourav Shenoy

From: DImuthu Upeksha <dimuthu.upeksha2@gmail.com>
Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
Date: Thursday, October 5, 2017 at 9:52 AM
To: "dev@airavata.apache.org" <dev@airavata.apache.org>
Subject: Re: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement
identification

Hi Marlon,

Thanks for the input. I understand your point about availability modes and will keep it in mind
while designing the PoC. CI/CD is the one I missed; thanks for pointing it out.

Thanks
Dimuthu

On Thu, Oct 5, 2017 at 7:04 PM, Pierce, Marlon <marpierc@iu.edu> wrote:
Thanks, Dimuthu, this is a good summary. Others may comment about Kafka, stateful versus stateless
parts of Airavata, etc.  You may also find some of this discussion on the mailing list archives.

Active-active vs. active-passive is a good question, and we have typically thought of this
in terms of individual Airavata components rather than the whole system.  Some components
can be active-active (like a stateless application manager), while others (like the orchestrator
example you give below) are stateful and may be better as active-passive.

There is also the issue of system updates and continuous deployments, which could be added
to your list.

Marlon


From: "dimuthu.upeksha2@gmail.com" <dimuthu.upeksha2@gmail.com>
Reply-To: "dev@airavata.apache.org" <dev@airavata.apache.org>
Date: Thursday, October 5, 2017 at 2:40 AM
To: "dev@airavata.apache.org" <dev@airavata.apache.org>
Subject: Linked Container Services for Apache Airavata Components - Phase 1 - Requirement
identification

Hi All,

Over the last few days, I have been going through the requirements and design of the current
Airavata setup, and I identified the following areas as the key focus areas for the technology
evaluation phase:

Microservices deployment platform (container management system)

Possible candidates: Google Kubernetes, Apache Mesos, Apache Helix
As most of the operational units of Airavata are supposed to be moving to a microservices-based
deployment pattern, having a unified deployment platform to manage those microservices
will make DevOps operations easier and faster. On the other hand, although writing and
maintaining a single microservice is fairly straightforward, running, monitoring, and
maintaining the lifecycles of multiple microservices manually in a production environment is
a tiresome and complex operation. Using such a deployment platform, we can automate many of
the pain points mentioned above.

Scalability

We need a solution that can scale easily depending on the load on different parts
of the system. For example, the workers in the post-processing pipeline should be able to scale
up and down depending on the events coming into the message queue.
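
As a rough illustration of queue-driven scaling, the Python sketch below computes how many
workers to run from the current queue depth. The function name, the per-worker capacity, and
the min/max bounds are all assumptions for the example, not values taken from Airavata.

```python
import math


def desired_workers(queue_depth: int, msgs_per_worker: int,
                    min_workers: int = 1, max_workers: int = 10) -> int:
    """Return the worker count needed to drain the queue, clamped to bounds.

    msgs_per_worker is the backlog one worker is assumed to handle comfortably.
    """
    if msgs_per_worker <= 0:
        raise ValueError("msgs_per_worker must be positive")
    needed = math.ceil(queue_depth / msgs_per_worker)
    return max(min_workers, min(max_workers, needed))
```

A container platform would call something like this periodically and adjust the number of
running worker instances to match the result.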

Availability

We need to support deploying the solution in multiple geographically distant data centers.
When evaluating container management systems, we should consider this a primary requirement.
However, one thing I am not sure about is the availability mode that Airavata normally expects.
Is it active-active or active-passive?

Service discovery

Once we move to a microservice-based deployment pattern, there could be scenarios where we
need service discovery for several use cases. For example, if we are going to scale up the API
Server to handle an increased load, we might have to put a load balancer between the client
and the API Server instances. In that case, service discovery is essential to supply the load
balancer with the healthy API Server endpoints that are currently running in the system.
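
The load balancer scenario can be sketched in Python as a heartbeat-based registry plus a
round-robin picker. This is a toy, in-memory illustration under stated assumptions: the class
names, the TTL-based health rule, and the endpoint strings are hypothetical, and time is
passed in explicitly to keep the example deterministic.

```python
from typing import Dict, List


class ServiceRegistry:
    """Tracks endpoints by their last heartbeat time."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, endpoint: str, now: float) -> None:
        self._last_seen[endpoint] = now

    def healthy(self, now: float) -> List[str]:
        # An endpoint is healthy if it sent a heartbeat within the TTL.
        return sorted(ep for ep, seen in self._last_seen.items()
                      if now - seen <= self.ttl)


class RoundRobinBalancer:
    """Picks the next healthy endpoint in rotation."""

    def __init__(self, registry: ServiceRegistry) -> None:
        self.registry = registry
        self._counter = 0

    def pick(self, now: float) -> str:
        endpoints = self.registry.healthy(now)
        if not endpoints:
            raise LookupError("no healthy API Server endpoints")
        choice = endpoints[self._counter % len(endpoints)]
        self._counter += 1
        return choice
```

In a real deployment the registry role would be played by the platform's own discovery
mechanism rather than a hand-written class.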

Cluster coordination

Although microservices are supposed to be stateless in most cases, we might have scenarios
where particular microservices need to hold some state. For example, if we are going to
implement a microservice that performs the Orchestrator's role, there could be issues if we
keep multiple instances of it in several data centers to increase availability. According to
my understanding, there should be only one Orchestrator running at a time, as it is the one
that makes the decisions in the job execution process. So, if we are going to keep multiple
instances of it running in the system, there should be some sort of leader election among the
Orchestrator quorum.
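
One common way to get a single active Orchestrator is a lease-based election, sketched below
in Python. This is an assumption-laden illustration, not Airavata code: LeaseElection is a
hypothetical class, and its in-memory state stands in for what would need to be a
consensus-backed store (for example ZooKeeper or etcd) shared by all instances.

```python
from typing import Optional


class LeaseElection:
    """Grants leadership to one candidate at a time via an expiring lease."""

    def __init__(self, lease_seconds: float) -> None:
        self.lease_seconds = lease_seconds
        self.leader: Optional[str] = None
        self.expires_at = 0.0

    def try_acquire(self, candidate: str, now: float) -> bool:
        """Acquire or renew the lease. Returns True if candidate is the leader."""
        lease_free = self.leader is None or now >= self.expires_at
        if lease_free or self.leader == candidate:
            self.leader = candidate
            self.expires_at = now + self.lease_seconds
            return True
        return False
```

Each Orchestrator instance would call try_acquire periodically; only the instance that gets
True makes scheduling decisions, and a standby takes over once the leader stops renewing.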

Common messaging medium between microservices

This might be out of scope, but I thought of sharing it with the team to get a general idea.
The idea was raised in the HipChat discussion with Marlon and Gourav. Using a common messaging
medium might enable microservices to communicate in a decoupled manner, which will increase
the scalability of the system. For example, there is a reference architecture that we can
utilize with a Kafka-based messaging medium [1], [2]. However, I noticed in one paper that
Kafka was previously rejected as writing clients was onerous. Please share your views on this,
as I'm not familiar with the existing fan-out model based on AMQP and its pain points.
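
The decoupling that a common messaging medium gives can be sketched with a tiny in-memory
publish/subscribe bus in Python. It is not Kafka or AMQP; MessageBus and the topic names are
hypothetical and only show that publishers never reference their consumers directly.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class MessageBus:
    """Delivers each published message to every subscriber of its topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Fan the message out and return how many handlers received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)
```

Adding a new consumer is then a matter of subscribing to a topic, with no change to the
publishing service.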

Those are the main areas that I have identified while going through Airavata's current
implementation and the requirements stated in some of the research papers. Please let me know
whether my understanding of the above items is correct; suggestions are always welcome :)

[1] https://medium.com/@ulymarins/an-introduction-to-apache-kafka-and-microservices-communication-bf0a0966d63
[2] https://www.slideshare.net/ConfluentInc/microservices-in-the-apache-kafka-ecosystem

References

Marru, S., Gunathilake, L., Herath, C., Tangchaisin, P., Pierce, M., Mattmann, C., Singh,
R., Gunarathne, T., Chinthaka, E., Gardler, R. and Slominski, A., 2011, November. Apache airavata:
a framework for distributed applications and computational workflows. In Proceedings of the
2011 ACM workshop on Gateway computing environments (pp. 21-28). ACM.

Nakandala, S., Pamidighantam, S., Yodage, S., Doshi, N., Abeysinghe, E., Kankanamalage, C.P.,
Marru, S. and Pierce, M., 2016, July. Anatomy of the SEAGrid Science Gateway. In Proceedings
of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale (p. 40). ACM.

Pierce, Marlon E., Suresh Marru, Lahiru Gunathilake, Don Kushan Wijeratne, Raminder Singh,
Chathuri Wimalasena, Shameera Ratnayaka, and Sudhakar Pamidighantam. "Apache Airavata: design
and directions of a science gateway framework." Concurrency and Computation: Practice and
Experience 27, no. 16 (2015): 4282-4291.

Pierce, Marlon, Suresh Marru, Borries Demeler, Raminderjeet Singh, and Gary Gorbet. "The apache
airavata application programming interface: overview and evaluation with the UltraScan science
gateway." In Proceedings of the 9th Gateway Computing Environments Workshop, pp. 25-29. IEEE
Press, 2014.

Marru, Suresh, Marlon Pierce, Sudhakar Pamidighantam, and Chathuri Wimalasena. "Apache Airavata
as a laboratory: architecture and case study for component-based gateway middleware." In
Proceedings of the 1st Workshop on The Science of Cyberinfrastructure: Research, Experience,
Applications and Models, pp. 19-26. ACM, 2015.

Thanks
Dimuthu
