airflow-commits mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-6229) SparkSubmitOperator polls forever if status json can't find driverState tag
Date Fri, 27 Dec 2019 10:13:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17004065#comment-17004065 ]

ASF GitHub Bot commented on AIRFLOW-6229:
-----------------------------------------

potiuk commented on pull request #6918: [AIRFLOW-6229] SparkSubmitOperator polls forever if
status json can't…
URL: https://github.com/apache/airflow/pull/6918
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> SparkSubmitOperator polls forever if status json can't find driverState tag
> ---------------------------------------------------------------------------
>
>                 Key: AIRFLOW-6229
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6229
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: scheduler
>    Affects Versions: 1.10.6
>            Reporter: t oo
>            Assignee: t oo
>            Priority: Major
>
> Suppose you click ‘release’ on a new Spark cluster while the prior Spark cluster is still
> processing spark-submit jobs from Airflow. Airflow can then never finish the SparkSubmit
> task: it polls for status on the new cluster build, which has no status for the job because
> the submit happened on the earlier build, so the status loop runs forever.
>  
> [https://github.com/apache/airflow/blob/1.10.6/airflow/contrib/hooks/spark_submit_hook.py#L446]
> [https://github.com/apache/airflow/blob/1.10.6/airflow/contrib/hooks/spark_submit_hook.py#L489]
> The hook loops forever if it cannot find the driverState tag in the JSON response. The new
> build (pointed to by the released DNS name) does not know about the driver submitted to the
> previously released build, so the second response below does not contain the driverState tag.
>   
> #response before clicking release on new build
> [ec2-user@reda ~]$
> curl http://dns:6066/v1/submissions/status/driver-20191202142207-0000
> {
>   "action" : "SubmissionStatusResponse",
>   "driverState" : "RUNNING",
>   "serverSparkVersion" : "2.3.4",
>   "submissionId" : "driver-20191202142207-0000",
>   "success" : true,
>   "workerHostPort" : "reda:31489",
>   "workerId" : "worker-20191202133526-reda-31489"
> }
>  
> #response after clicking release on new build
> [ec2-user@reda ~]$
> curl http://dns:6066/v1/submissions/status/driver-20191202142207-0000
> {
>   "action" : "SubmissionStatusResponse",
>   "serverSparkVersion" : "2.3.4",
>   "submissionId" : "driver-20191202142207-0000",
>   "success" : false
> }
>                
>  
> This is definitely a defect in the current code. It can be fixed by modifying the
> _process_spark_status_log function to set the driver status to UNKNOWN if driverState is
> not found in the response after iterating over all lines.
>  
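
The fix proposed in the description can be sketched as a standalone function. This is only an illustration under stated assumptions, not the code from the hook or from pull request #6918: the name parse_driver_status and the regular expression are hypothetical, and it assumes the polling loop treats UNKNOWN as a terminal state so that the task stops waiting.

```python
import re

# Hypothetical helper, not part of Airflow: matches '"driverState" : "RUNNING"'
# whether the JSON arrives on one line or one key per line.
_DRIVER_STATE_RE = re.compile(r'"driverState"\s*:\s*"([^"]+)"')


def parse_driver_status(lines):
    """Return the last driverState seen in the status output.

    Falls back to "UNKNOWN" when no line contains a driverState tag,
    which is the case the issue describes after the cluster release.
    """
    status = "UNKNOWN"
    for line in lines:
        match = _DRIVER_STATE_RE.search(line)
        if match:
            status = match.group(1)
    return status


# Responses abbreviated from the two curl outputs quoted in the issue.
before_release = [
    '{  "action" : "SubmissionStatusResponse",  "driverState" : "RUNNING",  "success" : true }'
]
after_release = [
    '{  "action" : "SubmissionStatusResponse",  "success" : false }'
]

print(parse_driver_status(before_release))
print(parse_driver_status(after_release))
```

With this fallback, a response that lacks the tag yields an explicit status instead of leaving the previous value in place, so a caller can decide to stop polling or fail the task.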



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
