airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Boris Tyukin <bo...@boristyukin.com>
Subject parsing task instance log files
Date Thu, 09 Feb 2017 13:35:20 GMT
Hello,

I am using HiveCliHook called from PythonOperator to run a series of
queries and want to capture record counts for auditing and validation
purposes.

*I am thinking to use on_success_callback to invoke python function that
will read the log file, produced by airflow and then parse it out using
regex. *

*I am going to use this method from models to get to the file log:*

*def log_filepath(self): iso = self.execution_date.isoformat() log =
os.path.expanduser(configuration.get('core', 'BASE_LOG_FOLDER')) return (
"{log}/{self.dag_id}/{self.task_id}/{iso}.log".format(**locals()))*
Is this a good strategy or there is an easier way? I wondering if someone
did something similar.

Another challenge is that the same log file contains multiple attempts and
reruns of the same task so I guess I need to parse the file backwards.

thanks,
Boris

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message