From dev-return-11748-archive-asf-public=cust-asf.ponee.io@airflow.apache.org Sat Jun 20 07:26:35 2020 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id D20B9180638 for ; Sat, 20 Jun 2020 09:26:34 +0200 (CEST) Received: (qmail 45216 invoked by uid 500); 20 Jun 2020 07:26:33 -0000 Mailing-List: contact dev-help@airflow.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@airflow.apache.org Delivered-To: mailing list dev@airflow.apache.org Received: (qmail 45204 invoked by uid 99); 20 Jun 2020 07:26:33 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 20 Jun 2020 07:26:33 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 767981A33D0 for ; Sat, 20 Jun 2020 07:26:32 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.013 X-Spam-Level: X-Spam-Status: No, score=0.013 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=0.2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_KAM_HTML_FONT_INVALID=0.01, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-ec2-va.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id hMMTXzC61j4d for ; Sat, 20 Jun 2020 07:26:30 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=209.85.218.41; helo=mail-ej1-f41.google.com; envelope-from=ybwang@gmail.com; receiver= Received: from mail-ej1-f41.google.com (mail-ej1-f41.google.com [209.85.218.41]) by mx1-ec2-va.apache.org (ASF Mail Server at mx1-ec2-va.apache.org) with ESMTPS id ACD63BB802 for ; Sat, 20 Jun 2020 07:26:29 +0000 (UTC) Received: by mail-ej1-f41.google.com with SMTP id y13so12741209eju.2 for ; Sat, 20 Jun 2020 00:26:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=e8GjiOpZGvzQzB6cQ6s8BFDbwadVng5qwm6Vd4zxHv8=; b=OBy+2nPgVz79VGUT2iY24isL4/mtgrmu1UOFw9W8S/v2aJH37/djo9jCE/1EcYPMgs HUp8+ZH6AmTmdAd94cG+n8DBBUWMUyKTfg4vxk+sTuaG482/OmC3p/u6uS9VfHtdh6Cf Jct8AA+ioqweR3yWuyLE+gvyoIMWvQEywCSwRsqI8H8p7vv/Gnv/aP1lNXJiPiU7BGgA TROgtU4evnxe1K4g2fP0T1zD1Ko3/z90oG1DNvlH3K7dKBlvG/H9dB2vI1ibd/x9eNKk oQhb3vuxCGIXb1O8bwyLfKKT2L0kTcuFwEBgTg2z5gJaH5voDM5GJQTl6qgCJouXSADE noAg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=e8GjiOpZGvzQzB6cQ6s8BFDbwadVng5qwm6Vd4zxHv8=; b=Z38aFrj2OC2wwssBbDQp+K0OXl+iH8J/2nH5N+taXUH0GeRnmpO76GpeIdadQFr7x2 XnQX+dFOJAGJbw6Vus5z70l2NQJYUiVZPz8SoBfh+n1N/EinRGbC1ZIJPUZiH+wj9VLV 4lz7qOVi9mcrMih0zebpmZtZrSwdThWWhAo5nm/FrE70q0wSonAnYyln60WwJ0TAg/9H bKeyx5JIBzTc2ihnTf013KkUjoTSj3cB1yEFFKiQ8nPTM/N1HDOmH92YsTMvx+73MhXp dll0sYl6kiSEgMzZ3/WZY54tzGWWkwPXo8EarSe8Wm9ANduP/7lDQr84eigHckL9pRGF LIiA== X-Gm-Message-State: AOAM531koqr/t9yZ6ca+qZUs/jtghgXFKbFPJbFJNI2b7FUr3iQbuxPw FTS0rFGlJyb62oCBoPbUjT+fLLL5OsPmaEl/X2HL75xjy3M= X-Google-Smtp-Source: ABdhPJwrvtRSzjWUc+PWh4AQLRVZIVnxgQSO5xAgQ1Uzsw5r7m8A/fTvW34pn55LgpCsl/PF75YNu6E967blIfwRJt8= X-Received: by 2002:a17:906:6b92:: with SMTP id l18mr7169684ejr.145.1592637987855; Sat, 20 Jun 2020 00:26:27 -0700 (PDT) MIME-Version: 1.0 References: <4d1666549e044d98963622c74e18a56f@credit-suisse.com> In-Reply-To: <4d1666549e044d98963622c74e18a56f@credit-suisse.com> From: Yingbo Wang Date: Sat, 20 Jun 2020 00:26:16 -0700 Message-ID: Subject: Re: [VOTE] AIP-17: Consolidate and de-duplicate sensor tasks in airflow Smart Sensor To: dev@airflow.apache.org Content-Type: multipart/alternative; boundary="00000000000085a24605a87eed34" --00000000000085a24605a87eed34 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thanks everyone for the feedback. I will also add the details mentioned in this thread into the AIP Q: From an implementation perspective, my one area of concern is the "sharding" concept and the configuration / management overhead involved. I may have missed it in the AIP, but would it be possible to add auto-scaling to minimize this configuration? The =E2=80=9Csharding=E2=80=9D configuration is an integer which implies th= e number of concurrently running smart sensor jobs for the whole airflow cluster. A proper sharding setting mainly depends on the following issues: 1. Cluster load -- how many sensor tasks need to be executed at the same time. 2. How often should each sensor be poked at least once. 3. The response time for a sensor task in the current system. As these answers may vary for different systems we leave =E2=80=9Csharding=E2=80=9D as a configurable field for use= rs to satisfy different use cases. Also, a couple of clarifying questions: 1. Do you know if this is more suitable to certain kinds of sensors vs. Others? Most sensors should be suitable for the smart sensor. Except if the argument needed to initialize a sensor object is unserializable, e.g. a function. Serialize more complex types other than builtin and datetime is not supported right now but we are planning to add them in the future. 2. What do you think about leveraging this to enable "async" operations using Airflow i.e. submit a task and then use a "smart sensor" to check for Completion? This is a very good point. We do notice the relationship between these two ideas. Technically this logic should also work. The =E2=80=9Ctask submissio= n=E2=80=9D map to the pre_execute() in a sensor task logic and =E2=80=9Ccheck for completi= on=E2=80=9D map to the sensor=E2=80=99s poke() function. The current implementation of SubDagOperator actually follows this pattern. If the operator requires no unserializable argument to be instantiated, we should already be able to leverage the async operation in SmartSensor for it. Q: How would a user enable their own smart sensors? I don=E2=80=99t see any= added documentation for this. It looks like they need to manually add the name of the class to the airflow configuration and do *something* to their sensor class, including override the "is_smart_sensor" method (why a method and not an attribute?) Having to enable it in multiple places seems a little cumbersome, why not have a "BaseSmartSensor" that the user inherits from like most of the rest of Airflow? Sensors inherited from BaseSmartSensor would be "Smart" when smart sensors are enabled in the configuration and not smart when smart sensors are not enabled. Enabling/Disabling the smart sensor is a system level config which is transparent to the individual users. An example of smart sensor enabled cluster config is as follows: [smart_sensor] use_smart_sensor =3D true shard_code_upper_limit =3D 10000 shards =3D 5 sensor_enabled =3D NamedHivePartitionSensor, MetastorePartitionSensor The "use_smart_sensor" config indicates if the smart sensor is enabled. The "shards" config indicates the number of concurrently running smart sensor jobs for the airflow cluster. The "sensor_enabled" config is a list of sensor class names that will use the smart sensor. The users use the same class names (e.g. HivePartitionSensor) in their DAGs and they don=E2=80=99t= have the control to use smart sensors or not, unless they exclude their tasks explicits. Existing DAGs don't need to be changed for enabling/disabling the smart sensor. =E2=80=9CIs_smart_sensor_compatible=E2=80=9D is a class level configuration= (instead of instance-level) so that the system knows if a particular sensor operator can use the smart sensor. Currently only NamedHivePartitionSensor and MetastorePartitionSensor are enabled to use the smart sensor in the PR. To include other sensor operators for smart sensors that are not included in this PR: 1. Add a class attribute "poke_context_fields" to the operator. "poke_context_fields" include all key names used for initializing a sens= or object. 2. In airflow.cfg, add the operator=E2=80=99s classname to the session of =E2=80=9C[smart_sensor]=E2=80=9D with the field =E2=80=9Csensors_enabled= =E2=80=9D as follows. Yingbo On Fri, Jun 19, 2020 at 7:27 AM Shaw, Damian P. < damian.shaw.2@credit-suisse.com> wrote: > Also +1 (non-binding) on the AIP but questions on the implementation. > > How would a user enable their own smart sensors? I don=E2=80=99t see any = added > documentation for this. It looks like they need to manually add the name = of > the class to the airflow configuration and do *something* to their sensor > class, including override the "is_smart_sensor" method (why a method and > not an attribute?) > > Having to enable it in multiple places seems a little cumbersome, why not > have a "BaseSmartSensor" that the user inherits from like most of the res= t > of Airflow? Sensors inherited from BaseSmartSensor would be "Smart" when > smart sensors are enabled in the configuration and not smart when smart > sensors are not enaled. > > Damian > > -----Original Message----- > From: Vikram Koka > Sent: Friday, June 19, 2020 00:57 > To: dev@airflow.apache.org > Subject: Re: [VOTE] AIP-17: Consolidate and de-duplicate sensor tasks in > airflow Smart Sensor > > +1 (non-binding) for this AIP. > > I really like the concept and the efficiency improvements. The general > SmartSensor concept and the ability to add additional sensor classes is > elegant. > > From an implementation perspective, my one area of concern is the > "sharding" concept and the configuration / management overhead involved. = I > may have missed it in the AIP, but would it be possible to add auto-scali= ng > to minimize this configuration? > > Also, a couple of clarifying questions: > 1. Do you know if this is more suitable to certain kinds of sensors vs. > others? > 2. What do you think about leveraging this to enable "async" operations > using Airflow i.e. submit a task and then use a "smart sensor" to check f= or > completion? > > Best regards, > > Vikram > > > > > On Thu, Jun 18, 2020 at 3:38 PM Yingbo Wang wrote: > > > Hello everyone! > > > > This email calls for a vote to add the airflow smart sensor at > > https://github.com/apache/airflow/pull/5499 > > > > AIP: > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17%3A+Consolid > > ate+and+de-duplicate+sensor+tasks+in+airflow+Smart+Sensor > > > > Change summary: > > > > - Add a new mode called =E2=80=9Csmart sensor mode=E2=80=9D. In smar= t sensor mode, > > instead of holding a long running process for each sensor and poking > > periodically, a sensor will only store poke context at sensor_instan= ce > > table and then exits with a =E2=80=98sensing=E2=80=99 state. > > - When the smart sensor mode is enabled, a special set of builtin > smart > > sensor DAGs (named smart_sensor_group_shard_xxx) is created by the > > system; > > These DAGs contain SmartSensorOperator task and manage the smart > sensor > > jobs for the airflow cluster. The SmartSensorOperator task can fetch > > hundreds of =E2=80=98sensing=E2=80=99 instances from sensor_instance= table and poke on > > behalf of them in batches. Users don=E2=80=99t need to change their > > existing DAGs. > > - The smart sensor mode currently supports NamedHivePartitionSensor > and > > MetastorePartitionSensor however it can easily be extended to > > support more > > sensor classes. > > - Smart sensor mode on/off, the list of smart sensor enabled classes= , > > and the number of SmartSensorOperator tasks can be configured in > airflow > > config. > > - Sensor logs in smart sensors are populated to each task instance l= og > > UI. > > > > > > A PR https://github.com/apache/airflow/pull/5499 is ready for review > > from the committers and community. > > > > > > This email is formally calling for a vote to accept the AIP and PR. > > Please note that we will update the PR / feature to fix bugs if we find > any. > > > > > > Best > > > > Yingbo > > > > > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D > > Please access the attached hyperlink for an important electronic > communications disclaimer: > http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D > > --00000000000085a24605a87eed34--