airflow-dev mailing list archives

From Ash Berlin-Taylor <ash_airflowl...@firemirror.com>
Subject Re: Operators that poll vs Sensors
Date Tue, 05 Sep 2017 15:35:10 GMT
The primary difference between those cases and the other Sensors is that the sensors I've
seen (EMR Job Flow, S3 Key) don't do anything _other_ than the sensing task, whereas the
operators you linked to also perform some other action; they just wait until that operation
is complete before returning.
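To make the distinction concrete, a sensor in that style only implements poke() and performs
no action of its own. This is just a sketch, not code from contrib; the class name and the
check_fn callable are made up for illustration:

from airflow.operators.sensors import BaseSensorOperator
from airflow.utils.decorators import apply_defaults


class MyApiSensor(BaseSensorOperator):
    """Checks a condition on an external system; performs no action itself."""

    @apply_defaults
    def __init__(self, check_fn, *args, **kwargs):
        # check_fn: hypothetical zero-argument callable that returns True once
        # the external condition (job finished, key exists, ...) is met.
        super(MyApiSensor, self).__init__(*args, **kwargs)
        self.check_fn = check_fn

    def poke(self, context):
        # BaseSensorOperator.execute() keeps calling this until it returns True.
        return self.check_fn()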

Additionally, my understanding is that Sensors are just an API/Python class-level convention
that makes no difference to the scheduler, i.e. this is what the BaseSensorOperator class does:


# imports needed for this snippet
import logging
from datetime import datetime
from time import sleep

from airflow.exceptions import AirflowSensorTimeout, AirflowSkipException


def execute(self, context):
  started_at = datetime.now()
  # Keep calling poke() until it returns True, or time out.
  while not self.poke(context):
    if (datetime.now() - started_at).total_seconds() > self.timeout:
      if self.soft_fail:
        raise AirflowSkipException('Snap. Time is OUT.')
      else:
        raise AirflowSensorTimeout('Snap. Time is OUT.')
    sleep(self.poke_interval)
  logging.info("Success criteria met. Exiting.")

i.e. there's not much difference in effect between an operator that loops and sleeps itself
and one which is a Sensor.
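
So an operator that submits something and then loops and sleeps is doing the same thing
inline. A rough sketch of that pattern (again made up, not the contrib code; submit_fn and
status_fn are placeholder callables standing in for a real hook):

import logging
from time import sleep

from airflow.exceptions import AirflowException
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class SubmitAndWaitOperator(BaseOperator):
    """Submits work, then polls until it finishes (the ECS/Databricks style)."""

    @apply_defaults
    def __init__(self, submit_fn, status_fn, poll_interval=30, *args, **kwargs):
        # submit_fn: hypothetical callable that kicks off the job and returns an id.
        # status_fn: hypothetical callable mapping that id to a state string.
        super(SubmitAndWaitOperator, self).__init__(*args, **kwargs)
        self.submit_fn = submit_fn
        self.status_fn = status_fn
        self.poll_interval = poll_interval

    def execute(self, context):
        job_id = self.submit_fn()                 # the "action" part
        while True:                               # the "sensing" part, done inline
            state = self.status_fn(job_id)
            if state == 'SUCCEEDED':
                logging.info("Job %s finished.", job_id)
                return
            if state in ('FAILED', 'CANCELLED'):
                raise AirflowException("Job %s ended in state %s" % (job_id, state))
            sleep(self.poll_interval)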

-ash

> On 5 Sep 2017, at 16:14, Richard Baron Penman <richardbp@gmail.com> wrote:
> 
> Hello,
> 
> I noticed some operators in contrib (ECS, databricks, dataproc) submit
> their task and then poll until complete:
> https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/ecs_operator.py
> https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/databricks_operator.py
> https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/dataproc_operator.py
> 
> Would they be better designed as Sensors?
> 
> I ask because I wrote a Sensor for an API and was wondering whether there was
> an advantage to the Operator polling approach.
> 
> Richard

