airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Driesprong, Fokko" <fo...@driesprong.frl>
Subject Re: Need more pairs of eyes for the flaky LocalExecutorTest fix
Date Tue, 30 Apr 2019 07:57:22 GMT
Looking at the PR I only see them being used in the LocalExecutor and the
KubernetesExecutor, which makes sense. For CeleryExecutor we have Celery in
place which handles the callbacks for us, therefore we don't need to poll.

Apart from that, I think this is an improvement over the current situation.

Cheers, Fokko

Op ma 29 apr. 2019 om 23:53 schreef Ash Berlin-Taylor <ash@apache.org>:

> We use multiprocessing queues in both the Celery and Kube executors too :D
>
> -a
>
> > On 29 Apr 2019, at 22:51, Driesprong, Fokko <fokko@driesprong.frl>
> wrote:
> >
> > Thanks for the explanation. I'd go for the manager as well. In the end,
> if
> > you really want to scale out, you should go for Celery/Kubernetes.
> >
> > Cheers, Fokko
> >
> > Op ma 29 apr. 2019 om 23:09 schreef Jarek Potiuk <
> Jarek.Potiuk@polidea.com>:
> >
> >> Yep. Manager() creates a SyncManager instance and runs start().
> >> This in turn creates a Manager process that all processes communicate
> with
> >> via Proxies:
> >>
> >>
> https://docs.python.org/3.4/library/multiprocessing.html?highlight=process#sharing-state-between-processes
> >>
> >> It's reliable, but slower than Shared Memory used by direct Queue
> >> instantiation.
> >>
> >> J.
> >>
> >> On Mon, Apr 29, 2019 at 10:26 PM Bas Harenslak <
> >> basharenslak@godatadriven.com> wrote:
> >>
> >>> @Fokko I believe that’s implicitly created by
> multiprocessing.Manager()<
> >>>
> >>
> https://docs.python.org/3.4/library/multiprocessing.html?highlight=process#multiprocessing.sharedctypes.multiprocessing.Manager
> >>>> :
> >>>
> >>> ….. The returned manager object corresponds to a spawned child process
> >> and
> >>> ….
> >>>
> >>> I’m in favour of the Manager route, since this doesn’t introduce
> >>> additional complex multiprocessing code in Airflow. I’m reviewing right
> >> now
> >>> although it’s a bit out of my comfort zone...
> >>>
> >>> Bas
> >>>
> >>> On 29 Apr 2019, at 21:46, Driesprong, Fokko <fokko@driesprong.frl
> >> <mailto:
> >>> fokko@driesprong.frl>> wrote:
> >>>
> >>> I'm missing the part of another process? This is within the Scheduler
> >>> process if I understand correctly.
> >>>
> >>> Cheers, Fokko
> >>>
> >>> Op ma 29 apr. 2019 om 21:33 schreef Jarek Potiuk <
> >> Jarek.Potiuk@polidea.com
> >>> <mailto:Jarek.Potiuk@polidea.com>>:
> >>>
> >>> I am also leaning towards the manager. I updated the
> >>> https://github.com/apache/airflow/pull/5200 PR now after review and
> once
> >>> it
> >>> passes CI I think we can merge it.
> >>> If anyone wants to have a look as well, happy to hear it.
> >>>
> >>> J.
> >>>
> >>> On Mon, Apr 29, 2019 at 2:14 PM Ash Berlin-Taylor <ash@apache.org
> >> <mailto:
> >>> ash@apache.org>> wrote:
> >>>
> >>> I think I lean towards the built-in/manager approach as it is less
> >>> concurrency code we have to manage/maintain in Airflow, though I'm not
> >>> hugely happy about another process :(
> >>>
> >>> -ash
> >>>
> >>> On 29 Apr 2019, at 07:33, Jarek Potiuk <Jarek.Potiuk@polidea.com
> <mailto:
> >>> Jarek.Potiuk@polidea.com>>
> >>> wrote:
> >>>
> >>> Hello Everyone,
> >>>
> >>> I think we need some more pairs of eyes to take a look at potential
> >>> fixes
> >>> we have for the pesky LocalExecutorTest that we are all experiencing
> >>> with
> >>> our Travis builds. Once we solve it I think we should be much closer to
> >>> have stable builds - including some other flaky test fixes merged
> >>> recently.
> >>>
> >>> It turned out that the problem relates to quite deep internals of how
> >>> data
> >>> is passed between processes using multiprocessing queues. It's really
> >>> deep
> >>> in the core processing of Airflow so I think it would be great if also
> >>> other experienced Airflowers review and comment it and help to select
> >>> the
> >>> best solution as we could have missed something.
> >>>
> >>> I was looking at it together with Ash and Bas and I (a bit too fast)
> >>> merged
> >>> a preliminary version of the fix last week. We reverted it later as it
> >>> turned out to have some side effects, so we know we have to be careful
> >>> with
> >>> this one.
> >>>
> >>> After more detailed analysis and discussions with Omar, we have now two
> >>> potential candidates to fix it. Both are green and from local testing -
> >>> both are solving the problem in a different way.
> >>>
> >>> - https://github.com/apache/airflow/pull/5199
> >>> - https://github.com/apache/airflow/pull/5200
> >>>
> >>> I tried to describe the problem, solution candidates with Pros and Cons
> >>> in
> >>> the JIRA ticket :
> >>> https://issues.apache.org/jira/browse/AIRFLOW-4401
> >>>
> >>> I'd love if we can get reviews in the PRs and input to discussion on
> >>> which
> >>> solution to choose.
> >>>
> >>> J.
> >>>
> >>>
> >>> --
> >>>
> >>> Jarek Potiuk
> >>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>>
> >>> M: +48 660 796 129 <+48660796129>
> >>> E: jarek.potiuk@polidea.com<mailto:jarek.potiuk@polidea.com>
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> Jarek Potiuk
> >>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>>
> >>> M: +48 660 796 129 <+48660796129>
> >>> E: jarek.potiuk@polidea.com<mailto:jarek.potiuk@polidea.com>
> >>>
> >>>
> >>>
> >>
> >> --
> >>
> >> Jarek Potiuk
> >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>
> >> M: +48 660 796 129 <+48660796129>
> >> E: jarek.potiuk@polidea.com
> >>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message