Date: Sat, 25 Nov 2017 12:00:04 +0000 (UTC)
From: "Semet (JIRA)"
To: commits@airflow.incubator.apache.org
Reply-To: dev@airflow.incubator.apache.org
Subject: [jira] [Updated] (AIRFLOW-1847) Webhook Sensor

[ https://issues.apache.org/jira/browse/AIRFLOW-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Semet updated AIRFLOW-1847:
---------------------------
    Description:
h1. Webhook sensor

May require a hook in the experimental API.
Register an API endpoint and wait for input on each.

It differs from the {{dag_runs}} API in that the format is not Airflow specific: it is just a callback web URL called by an external system on some event, with its application-specific content. The content is really important and needs to be sent to the DAG (as XCom?).

Use Case:
- A DAG registers a WebHook sensor named {{}}
- A custom endpoint is exposed at {{http://myairflow.server/api/experimental/webhook/}}.
- I set this URL in the external system I wish to use the webhook from. Ex: github/gitlab project webhook
- When the external application performs a request to this URL, the request is automatically sent to the WebHook sensor. For simplicity, we can have a JsonWebHookSensor that would be able to carry any kind of JSON content.
- The sensor's only job would normally be to trigger the execution of a DAG, providing it with the JSON content as XCom.

If several requests arrive at the same time, the system should be scalable enough not to die or slow down the web UI. It is also possible to instantiate an independent flask/gunicorn server to split the load. That would mean it runs on another port, but this could be just an option in the configuration file, or even a completely independent application ({{airflow webhookserver}}). I saw that recent changes integrated gunicorn into the Airflow core; I guess that can help with this use case.

To support the load, I think it is good that the API part just posts each received request to an internal queue, so the sensor can handle them later without the risk of missing one.

Documentation would be updated to describe the classic scheme for implementing this use case, which would look like:

!airflow-webhook-proposal.png!

I think it is good to split it into 2 DAGs: one for linear handling of the messages and triggering a new DAG, and the processing DAG, which might be executed in parallel.

h2. Example usage in Sensor DAG: trigger a DAG on GitHub Push Event

{code}
sensor = JsonWebHookSensor(
    task_id='my_task_id',
    name="on_github_push"
)
# ... the user is responsible for triggering the processing DAG.
{code}

In my GitHub project, I register the following URL on the webhook page:

{code}
http://airflow.myserver.com/api/experimental/webhook/on_github_push
{code}

From now on, on each push, GitHub will send a [json with this format|https://developer.github.com/v3/activity/events/types/#pushevent] to the previous URL.

The {{JsonWebHookSensor}} receives the payload, and a new DAG is triggered from this sensing DAG.

h2. Possible evolutions:

- use an external queue (redis, amqp) to handle a large number of events
- allow batch processing (trigger the processing DAG on n events or after a timeout, gathering n messages together)
- Security, authentication and other related subjects might be addressed in another ticket.
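
The internal queue idea described above can be sketched in plain Python. This is only an illustration under stated assumptions: {{receive_webhook}}, {{poke}}, {{drain}} and the {{_WEBHOOK_QUEUES}} registry are hypothetical names, not existing Airflow APIs, and an in-memory queue stands in for whatever storage the real implementation would use.

```python
import json
import queue
from collections import defaultdict

# Hypothetical registry: one in-memory FIFO queue per webhook name.
# A real implementation would need storage shared between processes.
_WEBHOOK_QUEUES = defaultdict(queue.Queue)


def receive_webhook(name, raw_body):
    """What the API endpoint might do: parse the JSON body and enqueue it."""
    payload = json.loads(raw_body)
    _WEBHOOK_QUEUES[name].put(payload)


def poke(name):
    """What the sensor might do on each poke: return the next payload, or None.

    Note: a JSON `null` payload would be indistinguishable from an empty
    queue here; a real implementation would need a sentinel.
    """
    try:
        return _WEBHOOK_QUEUES[name].get_nowait()
    except queue.Empty:
        return None


def drain(name, max_events):
    """Batch variant: collect up to max_events queued payloads, in order."""
    batch = []
    while len(batch) < max_events:
        try:
            batch.append(_WEBHOOK_QUEUES[name].get_nowait())
        except queue.Empty:
            break
    return batch
```

Because the endpoint only enqueues, a burst of requests costs little on the API side, and the sensor consumes the payloads in arrival order at its own pace. The {{drain}} helper hints at the batch-processing evolution, without the timeout part.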

> Webhook Sensor
> --------------
>
>                 Key: AIRFLOW-1847
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1847
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: core, operators
>            Reporter: Semet
>            Assignee: Semet
>            Priority: Minor
>              Labels: api, sensors, webhook
>         Attachments: airflow-webhook-proposal.png
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)