airflow-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ash Berlin-Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-3058) Airflow log & multi-threading
Date Wed, 05 Dec 2018 13:28:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710064#comment-16710064
] 

Ash Berlin-Taylor commented on AIRFLOW-3058:
--------------------------------------------

Was this fixed by disabling buffered IO from your python script? (for anyone else finding
this ticket in the future)

> Airflow log & multi-threading
> -----------------------------
>
>                 Key: AIRFLOW-3058
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3058
>             Project: Apache Airflow
>          Issue Type: Task
>            Reporter: jack
>            Priority: Major
>         Attachments: 456.PNG, Sni.PNG
>
>
> The airflow log does not show messages in real time when executing scripts with Multi-threading.
>  
> for example:
>  
> The left is the Airflow log time. the right is the actual time of the print in my code.
If I would execute the script without airflow the console will show the times on the right.
> !Sni.PNG!
> {code:java}
> 2018-09-13 14:19:17,325] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,289]
{bash_operator.py:101} INFO - 2018-09-13 14:14:55.230044 Thread: Thread-1 Generate page: #0
run #0 with URL: http://...&cultureid=2&offset=0&limit=1000
> [2018-09-13 14:19:17,325] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,289]
{bash_operator.py:101} INFO - 2018-09-13 14:14:55.231635 Thread: Thread-2 Generate page: #1
run #0 with URL: http://...&cultureid=2&offset=1000&limit=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,289]
{bash_operator.py:101} INFO - 2018-09-13 14:14:55.233226 Thread: Thread-3 Generate page: #2
run #0 with URL: http://...&cultureid=2&offset=2000&limit=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,289]
{bash_operator.py:101} INFO - 2018-09-13 14:14:55.234020 Thread: Thread-4 Generate page: #3
run #0 with URL: http://...&cultureid=2&offset=3000&limit=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,290]
{bash_operator.py:101} INFO - 2018-09-13 14:15:43.100122 Thread: Thread-1 page 0 finished.
Length is 1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,290]
{bash_operator.py:101} INFO - 2018-09-13 14:15:43.100877 Thread: Thread-1 Generate page: #4
run #0 with URL: http://...&cultureid=2&offset=4000&limit=1000
> [2018-09-13 14:19:17,326] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,290]
{bash_operator.py:101} INFO - 2018-09-13 14:15:46.254536 Thread: Thread-3 page 2 finished.
Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,290]
{bash_operator.py:101} INFO - 2018-09-13 14:15:46.255508 Thread: Thread-3 Generate page: #5
run #0 with URL: http://...&cultureid=2&offset=5000&limit=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,290]
{bash_operator.py:101} INFO - 2018-09-13 14:15:51.096360 Thread: Thread-2 page 1 finished.
Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,290]
{bash_operator.py:101} INFO - 2018-09-13 14:15:51.097269 Thread: Thread-2 Generate page: #6
run #0 with URL: http://...&cultureid=2&offset=6000&limit=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,290]
{bash_operator.py:101} INFO - 2018-09-13 14:15:53.112621 Thread: Thread-4 page 3 finished.
Length is 1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,290]
{bash_operator.py:101} INFO - 2018-09-13 14:15:53.113455 Thread: Thread-4 Generate page: #7
run #0 with URL: http://...&cultureid=2&offset=7000&limit=1000
> [2018-09-13 14:19:17,327] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,290]
{bash_operator.py:101} INFO - 2018-09-13 14:16:37.345343 Thread: Thread-3 Generate page: #8
run #0 with URL: http://...&cultureid=2&offset=8000&limit=1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,290]
{bash_operator.py:101} INFO - 2018-09-13 14:16:37.701201 Thread: Thread-2 Generate page: #9
run #0 with URL: http://...&cultureid=2&offset=9000&limit=1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,291]
{bash_operator.py:101} INFO - 2018-09-13 14:16:47.283796 Thread: Thread-1 page 4 finished.
Length is 1000
> [2018-09-13 14:19:17,328] {base_task_runner.py:98} INFO - Subtask: [2018-09-13 14:19:17,291]
{bash_operator.py:101} INFO - 2018-09-13 14:17:27.169359 Thread: Thread-2 page 9 finished.
Length is 1000
>  
> {code}
> This never happens when executing regular code.. Happens only with multi-threading. I
have some other scripts that the airflow print appears after more than 30 minutes.
>  
>  Check this one:
> hours of delay and then printing everything together. These are not real time. the prints
in the log has no correlation to the actual time the command was executed.
>  
> !456.PNG!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message