airflow-users mailing list archives

From Jim Majure <jim.maj...@aurishealth.com>
Subject Volume mount issues when scaling on KubernetesExecutor
Date Wed, 22 Apr 2020 14:15:37 GMT
Hello,

I am attempting to execute a workload using the KubernetesExecutor in an AWS EKS cluster.
After a certain number of tasks have started, pods take longer and longer to move from
the Pending phase to the Running phase. The issue appears to be related to mounting the
volumes that host the dags and logs folders: we start to see “FailedMount” events as the
number of tasks increases.

The dags and logs folders are mounted using PersistentVolumes and PersistentVolumeClaims.
They are hosted on AWS EFS file systems.
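
For reference, the claim side looks roughly like this (the names airflow-dags and efs-sc,
and the size, are illustrative; EFS doesn't enforce the capacity request, but the field is
required):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: airflow-dags
    spec:
      accessModes:
        - ReadOnlyMany
      storageClassName: efs-sc
      resources:
        requests:
          storage: 5Gi   # EFS is elastic; required by the API but not enforced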

I have set up the PersistentVolumes in two ways, both with the same results (sketches of
each below):


  1.  Using the EFS CSI driver
  2.  Using a hostPath, with the EFS file systems mounted on the underlying EC2 instance.
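
Roughly, the two PersistentVolume variants look like this (names, the file system ID, and
the paths are placeholders):

    # Variant 1: EFS CSI driver
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: airflow-dags
    spec:
      capacity:
        storage: 5Gi
      accessModes:
        - ReadOnlyMany
      persistentVolumeReclaimPolicy: Retain
      storageClassName: efs-sc
      csi:
        driver: efs.csi.aws.com
        volumeHandle: fs-12345678   # placeholder EFS file system ID
    ---
    # Variant 2: hostPath, pointing at the EFS mount on the node
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: airflow-dags
    spec:
      capacity:
        storage: 5Gi
      accessModes:
        - ReadOnlyMany
      hostPath:
        path: /mnt/efs/airflow/dags   # placeholder path where EFS is mounted on the node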

As the workload begins to scale, the percentage of pods in the Pending phase (vs. the
Running phase) continues to grow. Eventually, pods spawned by the KubernetesPodOperator
start to fail because they remain in the Pending phase for too long.
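
One stopgap is raising the operator's startup timeout so slow mounts don't immediately
fail the task, though that only masks the underlying mount latency. A minimal sketch,
assuming Airflow 1.10.x (the image, names, and timeout value are illustrative):

    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    task = KubernetesPodOperator(
        task_id="example-task",      # placeholder task id
        name="example-pod",          # placeholder pod name
        namespace="airflow",         # placeholder namespace
        image="busybox:1.31",
        cmds=["sh", "-c", "echo hello"],
        # Default is 120s; pods stuck Pending on FailedMount events blow past it.
        startup_timeout_seconds=600,
    )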

I’ve worked with AWS support, and they don’t believe that the issue is related to the EFS
file systems themselves. From the evidence I can see, I tend to agree.

Has anyone seen anything similar to this? Has anybody been able to successfully scale up Airflow
on a K8S cluster?

Thanks,

Jim Majure | Principal Machine Learning Engineer
aurishealth.com | 150 Shoreline Dr. | Redwood City, CA 94065
(515) 829-0667
