airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ash Berlin-Taylor <...@apache.org>
Subject Re: Need more pairs of eyes for the flaky LocalExecutorTest fix
Date Mon, 29 Apr 2019 21:52:57 GMT
We use multiprocessing queues in both the Celery and Kube executors too :D

-a

> On 29 Apr 2019, at 22:51, Driesprong, Fokko <fokko@driesprong.frl> wrote:
> 
> Thanks for the explanation. I'd go for the manager as well. In the end, if
> you really want to scale out, you should go for Celery/Kubernetes.
> 
> Cheers, Fokko
> 
> Op ma 29 apr. 2019 om 23:09 schreef Jarek Potiuk <Jarek.Potiuk@polidea.com>:
> 
>> Yep. Manager() creates a SyncManager instance and runs start().
>> This in turn creates a Manager process that all processes communicate with
>> via Proxies:
>> 
>> https://docs.python.org/3.4/library/multiprocessing.html?highlight=process#sharing-state-between-processes
>> 
>> It's reliable, but slower than Shared Memory used by direct Queue
>> instantiation.
>> 
>> J.
>> 
>> On Mon, Apr 29, 2019 at 10:26 PM Bas Harenslak <
>> basharenslak@godatadriven.com> wrote:
>> 
>>> @Fokko I believe that’s implicitly created by multiprocessing.Manager()<
>>> 
>> https://docs.python.org/3.4/library/multiprocessing.html?highlight=process#multiprocessing.sharedctypes.multiprocessing.Manager
>>>> :
>>> 
>>> ….. The returned manager object corresponds to a spawned child process
>> and
>>> ….
>>> 
>>> I’m in favour of the Manager route, since this doesn’t introduce
>>> additional complex multiprocessing code in Airflow. I’m reviewing right
>> now
>>> although it’s a bit out of my comfort zone...
>>> 
>>> Bas
>>> 
>>> On 29 Apr 2019, at 21:46, Driesprong, Fokko <fokko@driesprong.frl
>> <mailto:
>>> fokko@driesprong.frl>> wrote:
>>> 
>>> I'm missing the part of another process? This is within the Scheduler
>>> process if I understand correctly.
>>> 
>>> Cheers, Fokko
>>> 
>>> Op ma 29 apr. 2019 om 21:33 schreef Jarek Potiuk <
>> Jarek.Potiuk@polidea.com
>>> <mailto:Jarek.Potiuk@polidea.com>>:
>>> 
>>> I am also leaning towards the manager. I updated the
>>> https://github.com/apache/airflow/pull/5200 PR now after review and once
>>> it
>>> passes CI I think we can merge it.
>>> If anyone wants to have a look as well, happy to hear it.
>>> 
>>> J.
>>> 
>>> On Mon, Apr 29, 2019 at 2:14 PM Ash Berlin-Taylor <ash@apache.org
>> <mailto:
>>> ash@apache.org>> wrote:
>>> 
>>> I think I lean towards the built-in/manager approach as it is less
>>> concurrency code we have to manage/maintain in Airflow, though I'm not
>>> hugely happy about another process :(
>>> 
>>> -ash
>>> 
>>> On 29 Apr 2019, at 07:33, Jarek Potiuk <Jarek.Potiuk@polidea.com<mailto:
>>> Jarek.Potiuk@polidea.com>>
>>> wrote:
>>> 
>>> Hello Everyone,
>>> 
>>> I think we need some more pairs of eyes to take a look at potential
>>> fixes
>>> we have for the pesky LocalExecutorTest that we are all experiencing
>>> with
>>> our Travis builds. Once we solve it I think we should be much closer to
>>> have stable builds - including some other flaky test fixes merged
>>> recently.
>>> 
>>> It turned out that the problem relates to quite deep internals of how
>>> data
>>> is passed between processes using multiprocessing queues. It's really
>>> deep
>>> in the core processing of Airflow so I think it would be great if also
>>> other experienced Airflowers review and comment it and help to select
>>> the
>>> best solution as we could have missed something.
>>> 
>>> I was looking at it together with Ash and Bas and I (a bit too fast)
>>> merged
>>> a preliminary version of the fix last week. We reverted it later as it
>>> turned out to have some side effects, so we know we have to be careful
>>> with
>>> this one.
>>> 
>>> After more detailed analysis and discussions with Omar, we have now two
>>> potential candidates to fix it. Both are green and from local testing -
>>> both are solving the problem in a different way.
>>> 
>>> - https://github.com/apache/airflow/pull/5199
>>> - https://github.com/apache/airflow/pull/5200
>>> 
>>> I tried to describe the problem, solution candidates with Pros and Cons
>>> in
>>> the JIRA ticket :
>>> https://issues.apache.org/jira/browse/AIRFLOW-4401
>>> 
>>> I'd love if we can get reviews in the PRs and input to discussion on
>>> which
>>> solution to choose.
>>> 
>>> J.
>>> 
>>> 
>>> --
>>> 
>>> Jarek Potiuk
>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>> 
>>> M: +48 660 796 129 <+48660796129>
>>> E: jarek.potiuk@polidea.com<mailto:jarek.potiuk@polidea.com>
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Jarek Potiuk
>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>> 
>>> M: +48 660 796 129 <+48660796129>
>>> E: jarek.potiuk@polidea.com<mailto:jarek.potiuk@polidea.com>
>>> 
>>> 
>>> 
>> 
>> --
>> 
>> Jarek Potiuk
>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>> 
>> M: +48 660 796 129 <+48660796129>
>> E: jarek.potiuk@polidea.com
>> 


Mime
View raw message