Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 111E0200CBD for ; Thu, 6 Jul 2017 23:23:34 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 0FD06167779; Thu, 6 Jul 2017 21:23:34 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 2FABD167778 for ; Thu, 6 Jul 2017 23:23:33 +0200 (CEST) Received: (qmail 60449 invoked by uid 500); 6 Jul 2017 21:23:32 -0000 Mailing-List: contact dev-help@airflow.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@airflow.incubator.apache.org Delivered-To: mailing list dev@airflow.incubator.apache.org Received: (qmail 60437 invoked by uid 99); 6 Jul 2017 21:23:32 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 06 Jul 2017 21:23:32 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 9B27A191507 for ; Thu, 6 Jul 2017 21:23:31 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 3.379 X-Spam-Level: *** X-Spam-Status: No, score=3.379 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, HTML_MESSAGE=2, KAM_INFOUSMEBIZ=0.75, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id PE3IgIMHYzFq for ; Thu, 6 Jul 2017 21:23:29 +0000 (UTC) Received: from mail-lf0-f45.google.com (mail-lf0-f45.google.com [209.85.215.45]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id E4CC55F30C for ; Thu, 6 Jul 2017 21:23:28 +0000 (UTC) Received: by mail-lf0-f45.google.com with SMTP id b207so11765541lfg.2 for ; Thu, 06 Jul 2017 14:23:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Hnl4j6/GuT/fkBtHhg5BG/haQ8xKhnKVBYxuPM0j0U0=; b=FpJ2zVdkuU4MOiHQJIFUWigOmZWwbLGTk2yTpg4IzosVNjXEti3MqZFJ08aMT7wJQq Vx/DYUMVQu2HiZ5Oo0qL7qQsG0E2kL8lPRlvpluPPVf1ZF7DjGCsEl+utddvbv/LSuDQ IRg0R3NQfNtOxV+7C8BsE/DQ+W4KiPuAQIIgrfRxRupHMrzX2XSSRLlcY459s6BZ/ZbM hjZHtfpEokJMBd7ctS7YBZfKeXXuOg9GKTGYjheHWBiWAf7AcEJQ//5y2L30WMtzMjzX vN0zoybfPNHeUzCdQgk2hfiDR7brQAhbNlRONUTm/BmIC9Cd6Wr0M9cKxAPXpqMXegv8 mXyw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Hnl4j6/GuT/fkBtHhg5BG/haQ8xKhnKVBYxuPM0j0U0=; b=HYS6qFL5oPZ5jHk1+7i03VYIO+9HA2g/ZNBxemvzWMnYI4FAZNvj7uRs5CRYYPnkvy nEXXy7uxROfIYKAOPDbMa61rNyBBp9Rya0MC4BAQ6bt/FCinEES2HKA3c5Hf+dNsglHp /sfMMnqYCc4Ljo/9tul1zpwYSnaQ/KA1q6Nf8J/fJNRaj8QZF98M94ZzWtUNXjIrUXsB PMf3dQkAomgoO9gba1jO6C7Wr+sSwLCQD13qTTm+bYtfyDp4l5OerECglvw46ALBodqS sL+1+8E3DliBQ4aOpehtm5OVcFTkMc5d+hbkOsY9QC7dDdJE8SOAAyyb280k+QLW7sNv k0qQ== X-Gm-Message-State: AKS2vOw22YXOmYUEJxdm+OxSvz013cS00wqk8nemn4C2GUE4vIr9oz/D RVtmWvoQqOsGl6nwDFcsv+dnSyQ1zQ== X-Received: by 10.46.22.72 with SMTP id 8mr16720705ljw.126.1499376207652; Thu, 06 Jul 2017 14:23:27 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Allison Wang Date: Thu, 06 Jul 2017 21:23:17 +0000 Message-ID: Subject: Re: Airflow Log Handler Abstractions To: Bolke de Bruin Cc: "dev@airflow.incubator.apache.org" Content-Type: multipart/alternative; boundary="f403045fbdd03dee950553acbaf7" archived-at: Thu, 06 Jul 2017 21:23:34 -0000 --f403045fbdd03dee950553acbaf7 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable cc dev@airflow.incubator.apache.org Hi Bolke, Thanks for the clarification. I was trying to separate the logging logic for task run out of cli.py and using custom configurations but I discovered it might need further abstractions and designs. We performed some sort of post task run logging operations to upload the local logs to S3/GCS. It's hard to abstract this logic now if we only return the logger. One way we can approach this problem is to add customized hooks abstractions for logging, or to trigger this upload to S3/GCS logic in other places. Dan and I discussed some of the possibilities today and we think this change is definitely desirable but requires much more time and efforts given the schedule of my internship. Another solution is to provide ability for customized implementation of task logging behaviors as in AIRFLOW-1385 (PR: https://github.com/apache/incubator-airflow/pull/2422). This is certainly not ideal but it won't make it harder for future logging refactoring work. What do you think about this approach? Thanks, Allison On Thu, Jul 6, 2017 at 1:39 PM Bolke de Bruin wrote: > Hi Allison, > > Nice to meet you and great that you put so much work into Airflow. The > logging is definitely in need of some love so I am very glad that you are > taking up this task. Thanks for reaching out. I=E2=80=99m not sure yet wh= ether we > are entirely on the same page yet, but let=E2=80=99s see if we can get th= ere. > > So the basic issue is, that we in certain circumstances would like to be > able to differentiate by originating source where logging should go. In > your case this is the =E2=80=9Ctry_number=E2=80=9D or =E2=80=9Cattempt=E2= =80=9D of a task instance. In > other words the logger should be context aware, i.e. it should look at th= e > =E2=80=9Ctry_number=E2=80=9D and some other data to determine what it nee= ds to do. > > From the perspective of the TaskInstance I would expect something like (a= s > explained in the big pr): > > from airflow import logging > > log =3D logging.getLogger(__name__) > log.info(=E2=80=9CMy Message=E2=80=9D, task_id=3DXX, dag_id=3DXX, executi= on_date=3DXXX) > > or > > Log =3D logging.getLogger(__name__, task_id=3DXX, dag_id=3DXX, > execution_date=3DXXX) > Log.info(=E2=80=9CMy Message=E2=80=9D) > > > Or you could do a LoggingMixin that picks up =E2=80=9Ctask_id, dag_id, > excution_date=E2=80=9D from the object. And just allows log.info(=E2=80= =9Cxxx=E2=80=9D) as well. > > > airflow.logging should then use handlers that you can make understand > these differences if required. So that could be a FileHandler with a > template as per your django link, or a DatabaseHandler or a SplunkHandler > etc. > > Does this make sense? > > Bolke > > N.B. It might be smart to cc the dev. Apache doesn=E2=80=99t really like = personal > conversations. > > > On 6 Jul 2017, at 09:58, Allison Wang wrote: > > Hi Bolke, > > My name is Allison and I am an intern on Airbnb's Data Platform team. > Sorry for not introducing myself earlier. As part of my internship projec= t, > I will improve Airflow logging both frontend (webserver) and backend. You= r > suggestion of abstracting the log handler for task instance is very > insightful and I am working on it now. Here is a general idea on how I am > going to implement it. Just want to make sure we are at the same page > before the actual coding part. > > The idea is very similar to logging.config, where you have a file > logging.yml defines basic configuration for setting up the log. Example > configuration file is like > > version: 1 > formatters: > log: > format: '[%%(asctime)s] {{%%(filename)s:%%(lineno)d}} %%(levelname)s = - > %%(message)s' > handlers: > file: > class: logging.FileHandler > formatter: log > level: INFO > filename: worker.log > console: > class : logging.StreamHandler > formatter: log > level: INFO > stream: ext://sys.stdout > loggers: > worker: > handlers: [file] > > However, the problem with using static logging.config is that we cannot > dynamically create filehandler with specific filename at runtime, or > anything requires runtime value when obtaining the logger. We want the > ability to instantiate log handlers with additional args like task_instan= ce > object. > > There are two solutions discusses here > http://codeinthehole.com/tips/a-deferred-logging-file-handler-for-django/= which > requires either loading config dynamically or changing settings > dynamically. I guess in this case the first solution is better and we can > call logging.config.dictConfig everytime we call run method in cli.py wit= h > format(ti). > > What do you think about this approach? Or is there any other way we can > customize those loggers? > > Thanks for taking the time doing code review and making Airflow awesome! > Feel free to comment and point out things I can do better. > > Thanks, > Allison > > > --f403045fbdd03dee950553acbaf7--