airavata-dev mailing list archives

From DImuthu Upeksha <dimuthu.upeks...@gmail.com>
Subject Re: Async Agents to handle long running jobs
Date Tue, 05 Dec 2017 16:01:03 GMT
Hi Suresh,

Got your point. By "Event Listener" I was referring to the Async Command
Listener (as in the following image). As you can see, the Agents are the
components that run inside the supercomputers, since they are responsible
for executing jobs. If that's the case, we have to port the Agents to
Python. The Event Listeners just listen to the Kafka event topic and
process event messages, so they can (and probably should?) be placed
outside the supercomputers. I'm saying this on the assumption that we are
deploying the Airavata components (API Server, Scheduler, etc.) outside
the supercomputers. Please correct me if I have misunderstood the current
deployment.
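
The Event Listener's core loop is small. As a minimal sketch, assuming a
hypothetical JSON event schema with `experimentId` and `status` fields (not
the actual Airavata message format), the processing step could look like
this; in a real deployment the raw bytes would be delivered by a Kafka
consumer subscribed to the event topic:

```python
import json

# Minimal sketch of the Event Listener's processing step. The JSON field
# names (experimentId, status) are hypothetical, not the actual Airavata
# event schema.
def handle_event(raw_message: bytes) -> dict:
    """Parse one event message and decide whether the API server must be notified."""
    event = json.loads(raw_message.decode("utf-8"))
    status = event.get("status", "UNKNOWN")
    return {
        "experiment_id": event.get("experimentId"),
        "status": status,
        # Only terminal states need to be pushed back to the API server.
        "notify_api_server": status in ("COMPLETED", "FAILED"),
    }
```

Since the logic is this thin and has no Java-specific dependencies, porting
it to Python should be mechanical.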



Thanks
Dimuthu

On Tue, Dec 5, 2017 at 9:07 PM, Suresh Marru <smarru@apache.org> wrote:

> Hi Dimuthu,
>
> I just have some high-level observations, so I will top-post.
>
> * I am +0 on running highly available services on Kubernetes; no deep
> thoughts for or against them. Just pondering at this point.
>
> * Regarding Event Listener on compute machines, these are typically
> supercomputers which do not allow any kernel modifications. Java is often
> foreign on these clusters because they are designed for high performance
> and typically only support low level languages like C, C++ and FORTRAN.
> Python is increasingly getting ubiquitous as well.
>
> Suresh
>
>
> On Dec 5, 2017, at 9:31 AM, DImuthu Upeksha <dimuthu.upeksha2@gmail.com>
> wrote:
>
> Hi Suresh,
>
> Thanks for the reply. Please find my responses to your questions inline.
>
> On Tue, Dec 5, 2017 at 7:58 AM, Suresh Marru <smarru@apache.org> wrote:
>
>> Hi Dimuthu,
>>
>> This is a neat design. A few questions to understand your implementation:
>>
>> * Since the Async Command Monitor needs to be a persistent, highly
>> available service, is it advisable to run it as a Helix Participant, or
>> should we run it outside of the Helix system, like an API gateway?
>>
>
> This design does not assume the Async Command Monitor to be a persistent
> service. It reads the status of the Agent and directs the message flow
> along the correct path; in the Java world, it is like a switch case.
> However, we do need to make it highly available. By making it a Helix
> Participant and controlling its replication through Kubernetes, we can
> fulfill that requirement and also keep it a generic component in the system.
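>
> To illustrate that "switch case" nature, here is a minimal sketch of the
> routing step; the status names and routing targets below are illustrative
> placeholders, not the actual Airavata identifiers:

```python
# Sketch of the Async Command Monitor's routing logic. Status names and
# target task names are hypothetical.
def route(agent_status: str) -> str:
    """Direct the message flow based on the Agent's reported status."""
    routes = {
        "AGENT_UP": "submit-async-command",
        "AGENT_DOWN": "fallback-ssh-task",
        "JOB_COMPLETED": "output-staging-task",
    }
    # Unknown statuses fall through to an error handler, like a
    # switch statement's default branch.
    return routes.get(agent_status, "error-handler")
```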
>
>
>> * On a related note, any thoughts on running the database as part of the
>> Kubernetes cluster as well? K8s has a MySQL example [1], but I am
>> wondering about any other pragmatic experiences.
>>
>
> Good suggestion. I had the same idea, not only for MySQL but also for
> Kafka and Zookeeper. There are a few challenges when trying to containerize
> those applications.
>
> 1. Applications like Zookeeper have a static, unique name for each node in
> the Zookeeper quorum, and each node must be configured to know about the
> other nodes before it starts. For example, each zoo.cfg file should
> contain entries like the following before the cluster starts:
>
> server.1=node1.thegeekstuff.com:2888:3888
> server.2=node2.thegeekstuff.com:2888:3888
> server.3=node3.thegeekstuff.com:2888:3888
>
> This is not container friendly. Containers are normally stateless, so it
> is challenging to spin up a replacement for a failed container with the
> same identity (both the same host name and the same static configuration).
> Kubernetes solves this with a concept called Stateful Sets, where the newly
> spawned pod gets the same host name and the same persistent volume as the
> dead pod.
>
> 2. Databases like MySQL need a persistent data directory, so we have to
> make sure that newly spawned pods are placed on the same node (physical
> machine) as the old ones, because data directories are not replicated
> among the nodes of the Kubernetes cluster. Here too we should be able to
> use Stateful Sets to solve the issue; the link you shared provides good
> evidence of that.
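>
> A minimal Stateful Set sketch showing both properties (stable pod names
> via a headless service, and volumes that re-attach to a respawned pod via
> volume claim templates); all names, image tags, sizes, and the API version
> below are illustrative, not a tested configuration:

```yaml
# Sketch only: values are placeholders, and the apiVersion depends on the
# cluster's Kubernetes release.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zk
spec:
  serviceName: zk-headless   # gives each pod a stable DNS name: zk-0, zk-1, zk-2
  replicas: 3
  selector:
    matchLabels:
      app: zk
  template:
    metadata:
      labels:
        app: zk
    spec:
      containers:
      - name: zookeeper
        image: zookeeper:3.4
        volumeMounts:
        - name: data
          mountPath: /var/lib/zookeeper
  volumeClaimTemplates:      # the same claim is re-bound to a replacement pod
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```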
>
> 3. The point above about data directories is also valid for Kafka brokers.
> However, most of the issues we come across in containerizing Kafka brokers
> are likewise solved using Stateful Sets [1].
>
> So, as a summary, we can deploy all three applications in Kubernetes in a
> highly available manner with auto-healing features. But we have to think
> about the following facts as well:
>
> 1. These applications were not designed to run in containerized
> environments. I would say we are using some "hacks" to make them container
> friendly.
>
> 2. They are inherently highly available, so why do we need to introduce
> another layer of high availability?
>
> 3. We can achieve auto healing in a Kubernetes cluster, where a failed pod
> is automatically replaced by a new pod. But we cannot let the new pod be
> placed on a different node (physical machine) because of the constraints
> above. So if a node fails, we cannot use the auto-healing functionality of
> Kubernetes in this case.
>
> There are pros and cons to either approach. I think this should be open
> for discussion so we can get the viewpoints of others as well. Personally
> I'm +0 on the Kubernetes approach :)
>
>
>> * We need to write the event listener preferably in Python, since these
>> typically run on compute clusters where Java is not well supported and
>> Python is more ubiquitous.
>>
>
> That is possible. The Event Listener interacts with Kafka and invokes the
> API server, so we can port it to Python easily. However, as we are
> ultimately bundling these components as Docker containers, the language we
> use should not be an issue, because all the libraries required for each
> language are bundled in the same container image. We only need the kernel
> of the host machine, with Docker and the Kubernetes agents installed on
> it. I'm not sure that I have completely understood your point about Java
> not being supported. Don't those compute machines support Java at the
> kernel level?
>
>
>> * What is your suggestion on the job description (the message payload in
>> your example) format? Can we send in a thrift binary through Kafka and have
>> the listener parse out the required information?
>>
>
> That should be possible, and it is a good suggestion. We can write custom
> serializers and deserializers for Kafka message topics [2].
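>
> As a rough sketch of the contract such a pair has to satisfy (Kafka's Java
> Serializer/Deserializer interfaces receive the topic name plus the
> payload), expressed here in Python with JSON standing in for the Thrift
> binary encoding, and with hypothetical task fields:

```python
import json

# Sketch of a custom serializer/deserializer pair for a Kafka topic. JSON
# is a stand-in here; a real implementation could emit Thrift binary
# instead. The task fields are hypothetical.
def serialize(topic: str, task: dict) -> bytes:
    """Mirror of Kafka's Serializer.serialize(topic, data) contract."""
    return json.dumps(task, sort_keys=True).encode("utf-8")

def deserialize(topic: str, payload: bytes) -> dict:
    """Mirror of Kafka's Deserializer.deserialize(topic, data) contract."""
    return json.loads(payload.decode("utf-8"))
```

Whatever wire format we pick, the listener only depends on these two
functions round-tripping the job description losslessly.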
>
>
>> Suresh
>>
>> [1] - https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/
>>
>>
>> On Dec 4, 2017, at 1:30 PM, DImuthu Upeksha <dimuthu.upeksha2@gmail.com>
>> wrote:
>>
>> Hi folks,
>>
>> I have implemented support for Async Job Submission with callback
>> workflows on top of the proposed task execution framework. It supports
>> both Async Job Submission on remote compute resources using Agents and
>> event-driven job monitoring. With this approach, I'm going to address the
>> following issues that we are facing today:
>>
>> 1. Avoid false DoS-attack detection on compute resources when executing
>> multiple ssh commands in a short period of time.
>> 2. Optimize the resource utilization and robustness of the Airavata Task
>> Execution Framework when executing long-running jobs.
>>
>> Design and implementation details can be found in [1].
>> Sources for the main components can be found in [2], [3], and [4].
>>
>> Please share your comments and suggestions
>>
>> [1] https://docs.google.com/document/d/1DIjrkjxZZWo9XiwkKWq9WZiOX-uRD5WO-eB6TLxagAg/edit?usp=sharing
>> [2] https://github.com/DImuthuUpe/airavata-sandbox/tree/master/airavata-kubernetes/modules/microservices/async-event-listener
>> [3] https://github.com/DImuthuUpe/airavata-sandbox/tree/master/airavata-kubernetes/modules/microservices/tasks/async-command-monitor
>> [4] https://github.com/DImuthuUpe/airavata-sandbox/tree/master/airavata-kubernetes/modules/microservices/tasks/async-command-task
>>
>> Thanks
>> Dimuthu
>>
>>
>> [1] https://github.com/kubernetes/contrib/tree/master/statefulsets/kafka
> [2] https://dzone.com/articles/kafka-sending-object-as-a-message
>
> Thanks
> Dimuthu
>
>
>
