Subject: Re: Fundamental change - Separate DAG name and id.
From: Brian Greene
Date: Fri, 21 Sep 2018 00:45:14 -0500
To: dev@airflow.incubator.apache.org
Cc: airflowuser@protonmail.com

Prior to using airflow for much, on first inspection, I think I may have agreed with you.

After a bit of use I’d agree with Fokko and others - this isn’t really a problem, and separating them seems to do more harm than good related to deployment.

I was gonna stop there, but why?

You can add a task to a dag that’s deployed and has run and still view history. The “new” task shows up as white squares in the old dag runs. Nobody said you’re required to also rename the dag when you do this. If your process or desire or design determines you need to rename it, well then by definition... isn’t it a new thing without a history? Airflow is implementing exactly that.

One could argue that renaming to reflect exact purpose is good practice. Yes, I’d agree, but again following that logic if it’s a small enough change to “slip in” then the name likely shouldn’t change. If it’s big enough that I want to change the name then it’s a big enough change that I’m functionally running something “new”, and I expect to need to account for that. Airflow is enforcing that logic by coupling the name to the deployment of what you said was a new process.

One might put forth that changing the name to be more descriptive in the UI makes it easier for support staff.
I think perhaps if that’s your challenge it’s not airflow that’s the problem. Dags are of course documented elsewhere besides their name, right? Yeah it’s self documenting (and the graphs are cool), but I have to assume there’s something besides the NAME to tell people what it does. Additionally, far more than the name is required for even an operator or monitor watcher to take action - you don’t expect them to know which tasks to rerun or how to troubleshoot failures just based on your “now most descriptive name in the UI” do you?

I spent time in an Informatica shop where all the jobs were numbered. Numbered. Let’s be more exact... their NAMES were NUMBERS like 56709. Terrible, but 100% worked, because while a descriptive name would have been useful, the name is the thing that’s supposed to NOT CHANGE (see code of Abibarshim), and all the other information can attach to that in places where you write... other information. People would curse a number “F’ing 6291 failed again” - everyone knew what they were talking about. I digress.

You might decide to document “dag ID 12” or just “12” on your wiki - I’m going to document “daily_sales_import”. And when things start failing at 3am it’s not my dag “56” that’s failing, it’s the sales_export dag. But if you document “12”, that’s still its name, and it’d better be 12 in all your environments and documents. This also means the actual db IDs from your proposal are almost certainly NOT the same across your environments, making the 12 an unchangeable name!

There are lots of languages (most of them) where the name of a thing is important and hard to change.
It’s not a bad thing, and I’d assume that deploying a thing by name has some significance in many systems. Go rename a class in... pick a language... tell me how that should be easier to do willy-nilly so it’s easier in the UI.

I suppose you could view it as a limitation, but I don’t think you’ve illuminated a single use case where it’s an actual technical constraint or limitation.

The BEST argument against the current implementation is db performance. It’s a hogwash argument. Basic key indexes on low cardinality string columns are plenty fast for the airflow workload, and if your task load is so high airflow can’t keep up or you’re seeing super-fast tasks and airflow db/tracking latency is too much... perhaps a messaging or queue processing solution is better suited to those workloads. We see scheduler bottlenecks long before the database for our “quick task” scenarios. Additionally, reading through this list you’ll find people running airflow at substantial scale - I’ve not seen anyone complaining of production performance issues based on this design decision. At first I hated it. String keys are dirty, we’re all taught that as good little programmers. Except when performance won’t be a huge consideration since it’s not OLTP and ease of queryability is more important because it’s a growing system... good decision - whoever made it.

How does filename matter? Frankly I wish the filename was REQUIRED to be the dag name so people would quit confusing themselves by mismatching them! We’ve renamed dag files with no issue as long as the content doesn’t change, so again, not a real use case. And really - name your stuff carefully before you get to prod, man.

I gotta ask - airflowuser - are you gonna use airflow for anything, or just poke it with a stick from a distance and ask semi-inane questions of these fine folks that wrote and spend time working on this cool piece of kit?

B

Sent from a device with less than stellar autocorrect

> On Sep 20, 2018, at 3:12 PM, Driesprong, Fokko wrote:
>
> I like the dag_id for both the name and as a unique identifier. If you
> change the dag in such a way that it deserves a new name, you probably
> want to create a new dag anyway. If you want to give some additional
> context, you can use the description field:
> https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L3131-L3132
>
> The name of the file of dag does not have any influence.
>
> My 2¢
>
> Cheers, Fokko
>
> On Thu, Sep 20, 2018 at 19:40, James Meickle wrote:
>
>> I'm personally against having some kind of auto-increment numeric ID for
>> DAGs. While this makes a lot of sense for systems where creation is a
>> database activity (like a POST request), in Airflow, DAG creation is
>> actually a code ship activity. There are all kinds of complex scenarios
>> around that:
>>
>> - I revert a commit and a DAG disappears or is renamed
>> - I run the same file, twice, with multiple parameters to create two DAGs
>> - I create the DAG in both staging and prod, but they wind up with
>> different IDs
>>
>> It's just too hard to automatically track these scenarios.
>>
>> If we really wanted to put something like this in place, it would first
>> make more sense to decouple DAG creation from code shipping, and instead
>> prefer creation of a DAG outside of code (but with a definition that
>> references which git repo/committish/file/arguments/etc. to use). Then if
>> you do something like rename a file, the DAG breaks, but at least still
>> exists in the db with that ID and history still makes sense once you update
>> the DAG definition with the new code location.
>>
>> On Thu, Sep 20, 2018 at 4:52 AM airflowuser
>> wrote:
>>
>>> Hi,
>>> though this could have been explained on Jira I think this should be
>>> discussed first.
>>>
>>> The problem:
>>> Airflow mixes DAG name with id. It uses the same field for both purposes.
>>>
>>> I assume that most of you use the dag_id to describe what the DAG
>>> actually does.
>>> For example:
>>>
>>> dag = DAG(
>>>     dag_id='cost_report_daily',
>>>     ...
>>> )
>>>
>>> This dag_id is reflected to the dag id column in the UI.
>>> Now, let's say that you want to add another task to this specific dag -
>>> You are to be extremely careful when you change the dag_id to represent
>>> the new functionality, for example: dag_id='cost_expenses_reports_daily'.
>>> This will break the history of the DAG.
>>>
>>> Or even with a simpler use case: the user just wants to change the name he
>>> sees on the UI.
>>>
>>> I suggest to have a discussion if the dag_id should be split into id (an
>>> actual id) and name to reflect what it does. When the "connection" is
>>> done by id's - names can change as much as you want without breaking
>>> anything. Essentially it becomes a field used for display purposes only.
>>>
>>> * I didn't mention also the issue of DAG file name which can also cause
>>> trouble if someone wants to change it.
>>>
>>> Sent with [ProtonMail](https://protonmail.com) Secure Email.
>>
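
The behaviour debated in this thread (run history follows the dag_id string, while a description can change freely) can be illustrated with a toy model. This is only a minimal sketch under stated assumptions: it uses an in-memory SQLite database rather than Airflow itself, and the `dag` and `dag_run` tables and the `deploy`, `record_run`, and `history` helpers are hypothetical names invented for illustration, not Airflow's actual schema or API.

```python
import sqlite3

# Toy schema (an assumption, NOT Airflow's real tables): run history points
# at a DAG through its string dag_id; description is display-only text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT);
    CREATE INDEX idx_dag_run_dag_id ON dag_run (dag_id);
""")


def deploy(dag_id, description):
    """Hypothetical helper: register a DAG, or overwrite its description."""
    conn.execute(
        "INSERT OR REPLACE INTO dag (dag_id, description) VALUES (?, ?)",
        (dag_id, description),
    )


def record_run(dag_id, execution_date, state="success"):
    """Hypothetical helper: store one run keyed by the string dag_id."""
    conn.execute(
        "INSERT INTO dag_run VALUES (?, ?, ?)", (dag_id, execution_date, state)
    )


def history(dag_id):
    """Hypothetical helper: execution dates recorded under this dag_id."""
    rows = conn.execute(
        "SELECT execution_date FROM dag_run WHERE dag_id = ? "
        "ORDER BY execution_date",
        (dag_id,),
    )
    return [row[0] for row in rows]


deploy("cost_report_daily", "Daily cost report")
record_run("cost_report_daily", "2018-09-19")
record_run("cost_report_daily", "2018-09-20")

# Changing only the description leaves the history attached to the dag_id.
deploy("cost_report_daily", "Daily cost and expenses report")
kept = history("cost_report_daily")

# Shipping the same code under a new dag_id looks like a brand-new DAG.
deploy("cost_expenses_reports_daily", "Daily cost and expenses report")
fresh = history("cost_expenses_reports_daily")

print(kept)   # ['2018-09-19', '2018-09-20']
print(fresh)  # []
```

In this toy model the rename is indistinguishable from deploying a new DAG, which is the coupling Brian and Fokko describe, while the description column plays the role of the display text airflowuser asks for.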