airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: Sensors
Date Mon, 23 Oct 2017 18:25:26 GMT
You could of course secure the endpoint with Nginx and use some form of basic
auth or even OAuth.
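
For example, a minimal Nginx sketch along those lines (hostname, port and
htpasswd path are placeholders; you would likely add TLS on top):

    server {
        listen      80;
        server_name airflow.example.com;                       # placeholder hostname

        location /api/ {
            auth_basic           "Airflow API";
            auth_basic_user_file /etc/nginx/airflow.htpasswd;  # created with htpasswd
            proxy_pass           http://127.0.0.1:8080;        # Airflow webserver
            proxy_set_header     Host $host;
        }
    }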

B.

Sent from my iPad

> On 23 Oct 2017, at 19:20, Niels Zeilemaker <niels@zeilemaker.nl> wrote:
> 
> Unfortunately, we are not using Kerberos, hence we cannot use the REST
> API...
> 
> I'll have a look at whether I can implement HTTP basic auth. That's probably the
> best option. Like Grant, I'm not too happy with the very long-running
> sensor job.
> 
> Niels
> 
> 
> On 23 Oct 2017, 7:10 p.m., "Bolke de Bruin" <bdbruin@gmail.com> wrote:
> 
> I think you can do something like the Azure Functions blob storage binding and
> let that kick off a DAG by triggering it from the REST API:
> 
> https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob
> 
> I don’t use Azure so it might not fit your case.
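> 
> As a rough sketch, the function could do something like the following once the
> blob binding fires, assuming the experimental REST API endpoint is available in
> your Airflow version (host, credentials and dag id are placeholders):
> 
>     # Sketch only: trigger an Airflow dag run from the blob event via the
>     # experimental REST API. Adjust host, credentials and dag id to your setup.
>     import requests
> 
>     AIRFLOW_URL = "https://airflow.example.com"   # e.g. behind the Nginx proxy
>     DAG_ID = "process_new_blobs"                  # hypothetical target dag
> 
>     def trigger_dag_for_blob(blob_name):
>         resp = requests.post(
>             "%s/api/experimental/dags/%s/dag_runs" % (AIRFLOW_URL, DAG_ID),
>             json={"conf": {"blob_name": blob_name}},
>             auth=("api_user", "api_password"),    # whatever auth the proxy enforces
>         )
>         resp.raise_for_status()
>         return resp.json()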
> 
> Bolke
> 
>> On 23 Oct 2017, at 16:15, Grant Nicholas <grantnicholas2015@u.northwestern.edu> wrote:
>> 
>> It sounds like you want a background daemon that continuously monitors the
>> status of some external system and triggers things on a condition. This
>> does not sound like an ETL job, and thus Airflow is not a great fit for
>> this type of problem. That said, there are workarounds like you mentioned.
>> One easy workaround, if you can handle a delay between `condition happens ->
>> dag triggers`, is setting your controller dag to have a recurring schedule
>> (i.e., not None). Then when that controlling dag is triggered, you just
>> perform your sensor check once and then trigger/don't trigger another dag
>> depending on the condition. The thing I'd be worried about with your
>> `trigger dagrun` approach is that if the trigger dagrun operator fails for any
>> reason, you'll stop monitoring the external system, while with the scheduled
>> approach you don't have to worry about the failure modes of retrying failed
>> dags, etc.
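>> 
>> For what it's worth, a minimal sketch of that scheduled approach (dag ids, the
>> interval and the check itself are placeholders; import paths follow Airflow 1.x):
>> 
>>     # Sketch only: controller dag on a recurring schedule; each run checks the
>>     # condition once and only triggers the target dag when it holds.
>>     from datetime import datetime, timedelta
>> 
>>     from airflow import DAG
>>     from airflow.operators.dagrun_operator import TriggerDagRunOperator
>> 
>>     def new_files_found():
>>         # Placeholder for the real check against the Azure Blob store.
>>         return False
>> 
>>     def conditionally_trigger(context, dag_run_obj):
>>         if new_files_found():
>>             return dag_run_obj   # returning the object creates the target dag run
>>         return None              # returning None skips the trigger for this run
>> 
>>     dag = DAG(
>>         dag_id="blob_monitor",                   # hypothetical controller dag
>>         start_date=datetime(2017, 10, 1),
>>         schedule_interval=timedelta(minutes=5),  # recurring, i.e. not None
>>     )
>> 
>>     TriggerDagRunOperator(
>>         task_id="maybe_trigger_target",
>>         trigger_dag_id="process_new_blobs",      # hypothetical target dag
>>         python_callable=conditionally_trigger,
>>         dag=dag,
>>     )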
>> 
>> On Mon, Oct 23, 2017 at 2:30 AM, Niels Zeilemaker <niels@zeilemaker.nl>
>> wrote:
>> 
>>> Hi Guys,
>>> 
>>> I've created a Sensor which monitors the number of files in an
>>> Azure Blobstore. If the number of files increases, I would like
>>> to trigger another dag. This is more or less similar to the
>>> example_trigger_controller_dag.py and example_trigger_target_dag.py
>>> setup.
>>> 
>>> However, after triggering the target DAG I would want my controller
>>> DAG to start monitoring the Blobstore again. But since the schedule of
>>> the controller DAG is set to None, it doesn't continue monitoring. I
>>> "fixed" this by adding a TriggerDAG which schedules a new run of the
>>> Controller DAG. But this feels a bit like a hack.
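>>> 
>>> A stripped-down sketch of that setup (dag and task names are made up, the
>>> actual blob sensor is left out, and import paths follow Airflow 1.x):
>>> 
>>>     # Sketch only: controller dag with schedule None that triggers the target
>>>     # dag and then re-triggers itself so monitoring continues (the hack part).
>>>     from datetime import datetime
>>> 
>>>     from airflow import DAG
>>>     from airflow.operators.dagrun_operator import TriggerDagRunOperator
>>> 
>>>     dag = DAG(
>>>         dag_id="blob_controller",          # made-up controller dag id
>>>         start_date=datetime(2017, 10, 1),
>>>         schedule_interval=None,
>>>     )
>>> 
>>>     trigger_target = TriggerDagRunOperator(
>>>         task_id="trigger_target",
>>>         trigger_dag_id="blob_target",      # made-up target dag id
>>>         python_callable=lambda context, dag_run_obj: dag_run_obj,
>>>         dag=dag,
>>>     )
>>> 
>>>     retrigger_controller = TriggerDagRunOperator(
>>>         task_id="retrigger_controller",
>>>         trigger_dag_id="blob_controller",  # schedules the next controller run
>>>         python_callable=lambda context, dag_run_obj: dag_run_obj,
>>>         dag=dag,
>>>     )
>>> 
>>>     trigger_target.set_downstream(retrigger_controller)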
>>> 
>>> Does someone have any experience with such a continuous monitoring
>>> sensor? Or know of a better way to achieve this?
>>> 
>>> Thanks,
>>> Niels
>>> 
