airflow-dev mailing list archives

From Lance Norskog <>
Subject Re: Integrating SLURM/Torque/GridEngine/LSF/DIRAC/HTCondor batch systems
Date Mon, 06 Jun 2016 21:56:34 GMT
We don't do remote control, but we do have custom servers for Java apps. We
wrote our own web services to wrap these Java processes. As long as you use
standard cloud-style "the other end is flaky" design principles, this
should serve you well.
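A rough sketch of that "the other end is flaky" style: retry with exponential backoff and a hard timeout, and assume any single call can fail. The endpoint and payload here are made up for illustration, and the `opener` parameter is just an injection point so the helper can be exercised without a live service.

```python
import time
import urllib.request
import urllib.error


def post_with_retries(url, data, attempts=5, base_delay=1.0, opener=None):
    """POST to a possibly-flaky service, retrying with exponential backoff.

    `opener` lets callers inject a stand-in for urllib (useful for testing
    against a fake service); by default it is urllib.request.urlopen.
    """
    opener = opener or urllib.request.urlopen
    last_error = None
    for attempt in range(attempts):
        try:
            req = urllib.request.Request(url, data=data, method="POST")
            with opener(req, timeout=10) as resp:
                return resp.read()
        except (urllib.error.URLError, OSError) as err:
            last_error = err
            time.sleep(base_delay * (2 ** attempt))  # back off: 1s, 2s, 4s, ...
    raise RuntimeError("gave up after %d attempts: %s" % (attempts, last_error))
```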

Airflow supports different executor flavors, including the Celery executor.
Each worker talks directly to the central metadata database, so I would not
try to run it across remote sites.
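For reference, switching to the Celery executor is a configuration change in airflow.cfg; a rough sketch (all connection strings below are placeholders, not real hosts):

```ini
[core]
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://user:pass@db-host/airflow

[celery]
broker_url = redis://broker-host:6379/0
```

This is why remote sites are painful: every worker needs a reliable, low-latency path to both the broker and the metadata database.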

On Mon, Jun 6, 2016 at 2:42 PM, Van Klaveren, Brian N. <> wrote:

> Hi,
> I'm interested in integrating some traditional batch systems with Airflow
> so I can run against any available batch resources. My use case is that I'd
> like to run a single airflow instance as a multi-tenant service which can
> dispatch to heterogeneous batch systems across the physical globe. A system
> I maintain does this, and I know HTCondor+DAGMan can do this by treating
> the batch systems as "grid resources". I'm trying to understand if this
> makes sense to even try with Airflow, so I have a few questions.
> 1. Has anyone looked into or tried this before? I've searched for several
> hours and was unable to find much on this.
> 2. I have a rough idea how Airflow works but I haven't dug deep into the
> code. If I were to implement something like this, should this be done as an
> operator (i.e. extend BashOperator?) or executor (Mesos Executor) or maybe
> both?
> 3. I've done this thing in the past, and typically you end up with a
> daemon/microservice running for each batch system. That microservice may be
> local to the batch system (works best in the case of LSF/torque/etc), or it
> may be local to the workflow engine but using some sort of exported remote
> API (e.g. grid-connected resources, often using globus APIs and x509
> certs), or there may be another layer of abstraction involved (in the case
> of DIRAC). Then you have a wrapper/pilot script which will trap a few
> signals and communicate back to the microservice or to a message queue
> (usually through HTTP or email because some batch systems are behind
> restrictive firewalls) when a job actually starts or finishes.
> Thanks,
> Brian
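The wrapper/pilot pattern Brian describes in item 3 could be sketched roughly like this: trap the termination signals a batch scheduler typically sends, report job start and finish, and run the real payload in between. The `report` callable is a stand-in for whatever phone-home mechanism the site allows (HTTP POST, mail, etc.); nothing here is from an actual implementation.

```python
import signal
import subprocess
import sys


def run_pilot(cmd, report):
    """Wrap a batch job so the workflow engine hears about its lifecycle.

    `report` is any callable taking a status string; in real use it would
    POST to the workflow service or a message queue. We trap SIGTERM/SIGINT
    so that the batch system killing the job still produces a status report
    (SIGKILL cannot be trapped, so a watchdog is still needed for that case).
    """
    def on_signal(signum, frame):
        report("killed:%d" % signum)
        sys.exit(128 + signum)

    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, on_signal)

    report("started")
    rc = subprocess.call(cmd)  # run the actual batch payload
    report("finished" if rc == 0 else "failed:%d" % rc)
    return rc
```

In practice the pilot would also carry a job identifier in each report so the central service can match status messages back to the Airflow task that submitted them.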

Lance Norskog
Redwood City, CA
