airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Allison Wang <allisonwang...@gmail.com>
Subject Re: Airflow Logging Improvements
Date Thu, 22 Jun 2017 18:00:53 GMT
Hi Bolke,

I agree that we should make logging configurable lbut I wouldn't think
using handlers like python-elasticsearch-logger is a good idea over
flushing logs into files. Here are some reasons:

   1. Such handlers do not have the built-in backpressure-sensitive
   protocol that can prevent overwhelm ElasticSearch.
   2. Logs will be lost if ElasitcSearch cluster is down for reasons like
   upgrading.

In general, it's not a good practice for python logger talk directly to
ElasticSearch. Flushing logs into files give us more flexibilities to use
tools like Filebeat and Logstash to collect and index logs into
ElasticSearch.

Thanks,
Allison

On Thu, Jun 22, 2017 at 12:05 AM Bolke de Bruin <bdbruin@gmail.com> wrote:

> In the light of fixing logging, I would definitely appreciate written
> design. Especially, as there have been multiple attempts to fix some issues
> but these have been more like stop gap fixes.
>
> In my opinion Airflow should not stipulate in a hard coded fashion where
> and how logging takes place. It should behave more like ‘log4j’
> configurations. So it should not just use “dag_id + task+id +
> execution_date” and write this to an arbitrary location on the filesystem.
> I could imagine a settings file “logging.conf” that setups something like
> this:
>
> [logger_scheduler]
> level = INFO
> handler = stderr
> qualname = airflow.scheduler
> formatter=scheduler_formatter
>
> In airflow.cfg it should allow setting something like this:
>
> [scheduler]
> use_syslog = True
> syslog_log_facility = LOG_LOCAL0
>
> To allow logging to syslog so it can be moved to a centralised location if
> required (syslog being a special case afaik).
>
> Elasticsearch and any other backend can then just be a handler and we can
> remove the custom stuff that is proposed in PR
> https://github.com/apache/incubator-airflow/pull/2380 <
> https://github.com/apache/incubator-airflow/pull/2380> by
> https://github.com/cmanaha/python-elasticsearch-logger <
> https://github.com/cmanaha/python-elasticsearch-logger> for example.
>
> I then can be convinced to add something like “attempt”, but probably
> there are more friendly ways to solve it at that time. In addition
> ‘attempts' should then imho not be managed by the task or cli, but rather
> by the executor as that is the process which “attempts” a task.
>
> Bolke.
>
>
> > On 22 Jun 2017, at 01:21, Dan Davydov <dan.davydov@airbnb.com> wrote:
> >
> > Responding to some of Bolke's concerns in the github PR for this change:
> >
> > > Mmm still not convinced. Especially on elastic search it is just
> easier to use the start_date to shard on.
> > sharding on start_date isn't great because there is still some risk of
> collisions and it means that we are coupling the primary key with
> start_date unnecessarily (e.g. hypothetically you could allow two tasks to
> run at the same in Airflow and in this case start_date would no longer be a
> valid primary key), using monotonically increasing IDs for DB entries like
> this is pretty standard practice.
> >
> > > In addition I'm very against the managing of log files this way. Log
> files are already a mess and should be refactored to be consistent and to
> be managed from one place.
> > I agree about the logging mess, and there seem to have been efforts
> attempting to fix this but they have all been abandoned so we decided to
> move ahead with this change. I need to take a look at the PR first, but
> this change should actually make logging less messy, since it should add an
> abstraction for logging modules, and because you know exactly which try
> numbers (and how many) ran on which workers from the file path. The log
> folder structure already kind of mimicked the primary key of the
> task_instance table (dag_id + task_id + execution_date), but really
> try_number logically belongs in this key as well (at least for the key for
> log files).
> >
> > > The docker packagers can already not package airflow correctly without
> jumping through hoops. Arbitrarily naming it certainly does not help here.
> > If this is referring to the /<ATTEMPT #>/ in the path, I don't think
> this is arbitrarily naming it. A log "unit" really should be a single task
> run (not an arbitrary grouping of a variable number of multiple runs), and
> each unit should have a unique key or location. One of the reasons we are
> working on this effort is to actually make Airflow play nicer with
> Kubernetes/Docker (since airflow workers should ideally be ephemeral), and
> allowing a separate service to read and ship the logs is necessary in this
> case since the logs will be destroyed along with the worker instance. I
> think in the future we should also allow custom logging modules (e.g.
> directly writing logs to some service).
> >
> >
> > On Wed, Jun 21, 2017 at 3:11 PM, Allison Wang <allisonwang520@gmail.com
> <mailto:allisonwang520@gmail.com>> wrote:
> > Hi,
> >
> > I am in the process of making airflow logging backed by Elasticsearch
> (more detail please check AIRFLOW-1325 <
> https://issues.apache.org/jira/browse/AIRFLOW-1325>). Here are several
> more logging improvements we are considering:
> >
> > 1. Log streaming. Auto-refresh the logs if tasks are running.
> >
> > 2. Separate logs by attempts.
> >
> > Instead of logging everything into one file, logs can be separated by
> attempt number and displayed using tabs. Attempt number here is a
> monotonically increasing number that represents each task instance run
> (unlike try_number, clear task instance won't reset attempt number).
> > try_number: n^th retry by the task instance. try_number should not be
> greater than retries. Clear task will set try_number to 0.
> > attempt: number of times current task instance got executed.
> >
> > 3. Collapsable logs. Collapse logs that are mainly for debugging airflow
> internal and aren't really related to users' tasks (for example, logs
> showed before "starting attempt 1 of 1")
> >
> > All suggestions are welcome.
> >
> > Thanks,
> > Allison
> >
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message