airflow-commits mailing list archives

From "chasuner (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AIRFLOW-1113) spark_submit_operator run again after an hour and log is not realtime
Date Sat, 15 Apr 2017 15:02:41 GMT
chasuner created AIRFLOW-1113:
---------------------------------

             Summary: spark_submit_operator run again after an hour and log is not realtime
                 Key: AIRFLOW-1113
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1113
             Project: Apache Airflow
          Issue Type: Bug
          Components: contrib
    Affects Versions: Airflow 1.8
            Reporter: chasuner


I have a DAG that contains a spark_submit_operator task. What is really weird and confusing
is that the spark_submit task is run again one hour after it starts, even though the Spark
job itself finishes successfully in about 47 minutes; meanwhile the subprocess of the first
run is still in the Sl state on the worker machine. Another DAG whose spark_submit task
completes in about 5 minutes works normally.
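
For reference, a minimal sketch of the setup described above, written against the Airflow 1.8
contrib import path (the DAG id and task id come from the log below; the schedule, start date,
and application path are assumptions for illustration):

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

    default_args = {
        'owner': 'airflow',
        'start_date': datetime(2017, 4, 14),  # assumed
        'retries': 1,  # matches "Starting attempt 1 of 2" in the log below
    }

    dag = DAG('dag_etl', default_args=default_args,
              schedule_interval=timedelta(days=1))  # assumed daily schedule

    spark_etl = SparkSubmitOperator(
        task_id='spark_etl',
        conn_id='spark_default',        # a connection pointing at YARN, per the log
        application='/path/to/etl.py',  # assumed; the actual job is not in the report
        dag=dag,
    )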

The log of the abnormal spark task:
[2017-04-15 07:01:06,048] {models.py:167} INFO - Filling up the DagBag from /data/airflow/dags/dag_etl.py
[2017-04-15 07:01:06,742] {base_task_runner.py:112} INFO - Running: ['bash', '-c', u'airflow run dag_etl spark_etl 2017-04-14T07:00:00 --job_id 1239 --raw -sd DAGS_FOLDER/dag_etl.py']
[2017-04-15 07:01:07,844] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:07,843] {__init__.py:57} INFO - Using executor CeleryExecutor
[2017-04-15 07:01:08,504] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:08,504] {models.py:167} INFO - Filling up the DagBag from /data/airflow/dags/dag_etl.py
[2017-04-15 07:01:09,423] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:09,423] {models.py:1126} INFO - Dependencies all met for <TaskInstance: dag_etl.spark_etl 2017-04-14 07:00:00 [queued]>
[2017-04-15 07:01:09,440] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:09,440] {models.py:1126} INFO - Dependencies all met for <TaskInstance: dag_etl.spark_etl 2017-04-14 07:00:00 [queued]>
[2017-04-15 07:01:09,441] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:09,440] {models.py:1318} INFO - 
[2017-04-15 07:01:09,441] {base_task_runner.py:95} INFO - Subtask: --------------------------------------------------------------------------------
[2017-04-15 07:01:09,441] {base_task_runner.py:95} INFO - Subtask: Starting attempt 1 of 2
[2017-04-15 07:01:09,441] {base_task_runner.py:95} INFO - Subtask: --------------------------------------------------------------------------------
[2017-04-15 07:01:09,441] {base_task_runner.py:95} INFO - Subtask: 
[2017-04-15 07:01:09,453] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:09,453] {models.py:1342} INFO - Executing <Task(SparkSubmitOperator): spark_etl> on 2017-04-14 07:00:00
[2017-04-15 07:01:09,501] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 07:01:09,500] {base_hook.py:67} INFO - Using connection to: yarn
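
Note the gap here: after "Using connection to: yarn" at 07:01:09 nothing more is written
until 08:01:54, even though the Spark job is producing output the whole time. That is the
"log is not realtime" half of the summary. A generic sketch of what realtime behaviour would
look like, streaming the spark-submit output line by line instead of buffering it
(illustrative only; the command and paths are assumptions, and this is not the actual
SparkSubmitHook code):

    import subprocess

    # Assumed command line; the report does not include the actual spark-submit invocation.
    cmd = ['spark-submit', '--master', 'yarn', '/path/to/etl.py']

    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # Emit each line as it arrives instead of collecting everything until exit,
    # so the task log keeps up with the running job.
    for line in iter(proc.stdout.readline, b''):
        print(line.rstrip().decode('utf-8', errors='replace'))
    returncode = proc.wait()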
[2017-04-15 08:01:54,030] {models.py:167} INFO - Filling up the DagBag from /data/airflow/dags/dag_etl.py
[2017-04-15 08:01:54,713] {base_task_runner.py:112} INFO - Running: ['bash', '-c', u'airflow run dag_etl spark_etl 2017-04-14T07:00:00 --job_id 1243 --raw -sd DAGS_FOLDER/dag_etl.py']
[2017-04-15 08:01:56,131] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 08:01:56,130] {__init__.py:57} INFO - Using executor CeleryExecutor
[2017-04-15 08:01:56,867] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 08:01:56,867] {models.py:167} INFO - Filling up the DagBag from /data/airflow/dags/dag_etl.py
[2017-04-15 08:01:57,503] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 08:01:57,502] {models.py:1120} INFO - Dependencies not met for <TaskInstance: dag_etl.spark_etl 2017-04-14 07:00:00 [running]>, dependency 'Task Instance Not Already Running' FAILED: Task is already running, it started on 2017-04-15 07:01:09.
[2017-04-15 08:01:57,512] {base_task_runner.py:95} INFO - Subtask: [2017-04-15 08:01:57,512] {models.py:1120} INFO - Dependencies not met for <TaskInstance: dag_etl.spark_etl 2017-04-14 07:00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.
[2017-04-15 08:01:59,104] {jobs.py:2148} WARNING - Recorded pid 26249 is not a descendant of the current pid 9046
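
The final WARNING is the interesting line: a second "airflow run" (job_id 1243) is launched
an hour after the first (job_id 1239) while the first attempt's spark-submit subprocess
(pid 26249, the process stuck in the Sl state) is still alive, the dependency checks
correctly refuse to start the task a second time, and the runner then complains that the
recorded pid is not a descendant of its own pid 9046. A rough illustration of what such a
pid-descendancy check looks like, using psutil (hypothetical; function name and logic are
assumptions, not the actual jobs.py code):

    import psutil

    def is_descendant(recorded_pid, current_pid):
        """Return True if recorded_pid is current_pid or one of its children."""
        try:
            current = psutil.Process(current_pid)
        except psutil.NoSuchProcess:
            return False
        if recorded_pid == current_pid:
            return True
        return recorded_pid in (child.pid for child in current.children(recursive=True))

    # With the values from the log: pid 26249 belongs to the first attempt's
    # process tree, not the new runner's (9046), so a check like this fails
    # and the warning is emitted.
    print(is_descendant(26249, 9046))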

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
