airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: Tasks not retrying on latest master commit
Date Sun, 18 Dec 2016 20:25:06 GMT
Hi Harvey,

I created https://github.com/apache/incubator-airflow/pull/1948, which should fix the issue
for you. I’m not sure this is the right approach, but I tested it locally and it does
work. Please report back!

Bolke

> On 18 Dec 2016, at 21:09, Bolke de Bruin <bdbruin@gmail.com> wrote:
> 
> (also reported this on the Jira issue)
> 
> OK, I figured out the issue. In short: the scheduler checks the task instances without
> taking into account whether the executor has already reported back. In this case the executor
> reports back several iterations later, but the task is queued nevertheless. Because tasks
> will not enter the queue while the task is considered running, the task state will be "queued"
> indefinitely, in limbo between the scheduler and the executor.
> 
> The SequentialExecutor does not have this issue, as it waits for every task to finish
> before returning. For Celery I’m not quite sure yet.
> 
> Fixing this will take a bit more time, as I’m unfamiliar with the code in this area
> (the calling code, that is). @max @dan @paul I could really use your help here.
> 
> - Bolke
> 
>> On 15 Dec 2016, at 22:33, Bolke de Bruin <bdbruin@gmail.com> wrote:
>> 
>> I’m having a look now but haven’t found the cause yet. The line that reports
>> the issue is just a facade in the UI and might not even report the real cause. I.e., the task
>> is being sent to the executor but already seems to be part of queued_tasks, and then the executor
>> reports success without actually running the task itself.
>> 
>> Paul and Dan were involved with this code and it was heavily changed, so I have to
>> familiarize myself with it.
>> 
>> - Bolke
>> 
>>> On 13 Dec 2016, at 19:23, Harvey Xia <harveyxia@spotify.com.INVALID> wrote:
>>> 
>>> Hi Bolke,
>>> 
>>> I have tried it on the latest release (1.7.1.3) and can confirm that
>>> retries *do* work. We are forced to use a later commit because we require a
>>> working GCP (Google Cloud Platform) hook, which did not seem to work on the
>>> latest release (upon glancing at the commit history, I think it's due to
>>> the fact that the latest release does not use the latest version of a
>>> Google client). Another colleague of ours is using a version of Airflow
>>> that works with GCP and also does not suffer from this retry issue, so we
>>> could always use that one. But I wanted to raise this issue and try to
>>> understand why it's occurring. Let me know your thoughts, thanks!
>>> 
>>> 
>>> Harvey Xia | Software Engineer
>>> harveyxia@spotify.com
>>> +1 (339) 225 1875
>>> 
>>> On Tue, Dec 13, 2016 at 1:17 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:
>>> 
>>>> Hey Harvey,
>>>> 
>>>> I don’t have the time to dive in right now, but is this bound to the
>>>> particular commit or did you just grab master at a specific point in time?
>>>> 
>>>> Did you try it on 1.7.1.3? Are you forced to use master?
>>>> 
>>>> - Bolke
>>>> 
>>>>> On 13 Dec 2016, at 16:43, Harvey Xia <harveyxia@spotify.com.INVALID> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I'm an engineer at Spotify, and our team has recently started using
>>>>> Airflow. I have posted the following issue,
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-695, but was hoping to get in contact with
>>>>> someone about this question. It is currently blocking us, so any response would be
>>>>> greatly appreciated. Thanks so much!
>>>>> 
>>>>> Harvey Xia | Software Engineer
>>>>> harveyxia@spotify.com
>>>>> +1 (339) 225 1875
>>>> 
>>>> 
>> 
> 
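
For readers following the thread, the race described in the quoted message of 18 Dec can be sketched as a toy simulation. This is only an illustration under strong simplifying assumptions: `ToyExecutor`, `has_task`, `report_delay`, and `simulate` are hypothetical names, not Airflow's API, and the "wait for the executor" branch is just one conceivable remedy, not necessarily what the pull request does.

```python
class ToyExecutor:
    """Hypothetical stand-in for an executor; not Airflow's real API."""

    def __init__(self, report_delay):
        self.report_delay = report_delay  # loop iterations before a result is reported
        self.running = {}                 # task key -> iterations left until reported
        self.started = []                 # every task key that was actually started

    def has_task(self, key):
        return key in self.running

    def queue(self, key):
        # A task the executor still considers running does not enter the queue.
        if key in self.running:
            return False
        self.running[key] = self.report_delay
        self.started.append(key)
        return True

    def heartbeat(self):
        # Count down; forget tasks whose result has now been reported back.
        for key in list(self.running):
            self.running[key] -= 1
            if self.running[key] <= 0:
                del self.running[key]


def simulate(check_executor, iterations=6):
    """Run a toy scheduler loop; return (final state, number of attempts started)."""
    executor = ToyExecutor(report_delay=3)
    key = "my_task"
    executor.queue(key)     # the first attempt is started...
    state = "up_for_retry"  # ...and fails, but the executor has not reported back yet
    for _ in range(iterations):
        waiting = check_executor and executor.has_task(key)
        if state == "up_for_retry" and not waiting:
            state = "queued"            # the scheduler marks the task as queued
            if executor.queue(key):     # refused while the executor thinks it runs
                state = "running"
        executor.heartbeat()
    return state, len(executor.started)


print(simulate(check_executor=False))  # scheduler ignores the executor's view
print(simulate(check_executor=True))   # scheduler waits for the executor to report
```

Without the check, the task is marked "queued" while the toy executor still holds the first attempt, so the retry is refused and never started. With the check, the scheduler waits until the executor has reported back and the second attempt is started.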

