beam-dev mailing list archives

From Thomas Weise <...@apache.org>
Subject Re: Possible Python SDK performance regression
Date Wed, 25 Sep 2019 13:57:56 GMT
After running through the entire bisect based on the 2.16 release branch I
found that the regression was caused by our own Cython setup. So green
light for the 2.16.0 release.

Thomas

On Tue, Sep 17, 2019 at 1:21 PM Thomas Weise <thw@apache.org> wrote:

> Hi Valentyn,
>
> Thanks for the reminder. The bisect is on my TODO list.
>
> Hopefully this week.
>
> I saw the discussion about declaring 2.16 LTS. We probably need to sort
> these performance concerns out prior to doing so.
>
> Thomas
>
>
> On Tue, Sep 17, 2019 at 12:02 PM Valentyn Tymofieiev <valentyn@google.com>
> wrote:
>
>> Hi Thomas,
>>
>> Just a reminder that 2.16.0 was cut and the voting may start soon, so to
>> avoid the regression that you reported blocking the vote, it would be great
>> to start investigating it, if it is reproducible.
>>
>> Thanks,
>> Valentyn
>>
>> On Tue, Sep 10, 2019 at 1:53 PM Valentyn Tymofieiev <valentyn@google.com>
>> wrote:
>>
>>> Thomas, did you have a chance to open a Jira for the streaming
>>> regression you observe? If not, could you please do so and cc +Ankur
>>> Goenka <goenka@google.com> ? I talked with Ankur offline and he is also
>>> interested in this regression.
>>>
>>> I opened:
>>> - https://issues.apache.org/jira/browse/BEAM-8198 for batch regression.
>>> - https://issues.apache.org/jira/browse/BEAM-8199 to improve tooling
>>> around performance monitoring.
>>> - https://issues.apache.org/jira/browse/BEAM-8200 to add benchmarks for
>>> streaming.
>>>
>>> I cc'ed some folks, however not everyone. Manisha, I could not find your
>>> username in Jira, feel free to cc or assign BEAM-8199
>>> <https://issues.apache.org/jira/browse/BEAM-8199>  to yourself if that
>>> is something you are actively working on.
>>>
>>> Thanks,
>>> Valentyn
>>>
>>> On Mon, Sep 9, 2019 at 9:59 AM Mark Liu <markliu@google.com> wrote:
>>>
>>>> +Alan Myrvold <amyrvold@google.com> +Yifan Zou <yifanzou@google.com>
>>>>> It would be good to have alerts on benchmarks. Do we have such an ability
>>>>> today?
>>>>>
>>>>
>>>> As for regression detection, we have a Jenkins job,
>>>> beam_PerformanceTests_Analysis
>>>> <https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PerformanceTests_Analysis/>,
>>>> which analyzes metrics in BigQuery and reports a summary to the job
>>>> console output. However, not all jobs are registered with this analyzer,
>>>> and currently no further alerts (e.g. email / Slack) are integrated with it.
>>>>
>>>> There is ongoing work to add alerting to benchmarks. Kasia and Kamil
>>>> are investigating Prometheus + Grafana, and Manisha and I are looking
>>>> into mako.dev.
>>>>
>>>> Mark
>>>>
>>>> On Fri, Sep 6, 2019 at 7:21 PM Ahmet Altay <altay@google.com> wrote:
>>>>
>>>>> I agree, let's investigate. Thomas, could you file JIRAs once you have
>>>>> additional information?
>>>>>
>>>>> Valentyn, I think the performance regression could be investigated
>>>>> now, by running whatever benchmarks are available against 2.14, 2.15
>>>>> and head to see if the same regression can be reproduced.
>>>>>
>>>>> On Fri, Sep 6, 2019 at 7:11 PM Valentyn Tymofieiev <
>>>>> valentyn@google.com> wrote:
>>>>>
>>>>>> Sounds like these regressions need to be investigated ahead of 2.16.0
>>>>>> release.
>>>>>>
>>>>>> On Fri, Sep 6, 2019 at 6:44 PM Thomas Weise <thw@apache.org> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 6, 2019 at 6:23 PM Ahmet Altay <altay@google.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 6, 2019 at 6:17 PM Thomas Weise <thw@apache.org> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Sep 6, 2019 at 2:24 PM Valentyn Tymofieiev
>>>>>>>>> <valentyn@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> +Mark Liu <markliu@google.com> has added some benchmarks running
>>>>>>>>>> across multiple Python versions. Specifically, we run a 1 GB
>>>>>>>>>> wordcount job on the Dataflow runner on Python 2.7 and 3.5-3.7. The
>>>>>>>>>> benchmarks do not have alerting configured and, to my knowledge,
>>>>>>>>>> are not actively monitored yet.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Are there any benchmarks for streaming? Streaming and batch are
>>>>>>>>> quite different runtime paths. And some of the issues can only be
>>>>>>>>> identified with longer running processes through metrics. It would
>>>>>>>>> be good to verify utilization of memory, cpu etc.
>>>>>>>>>
>>>>>>>>> I additionally discovered that our 2.16 upgrade exhibits a memory
>>>>>>>>> leak in the Python worker (Py 2.7).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Do you have more details on this one?
>>>>>>>>
>>>>>>>
>>>>>>> Unfortunately only that at the moment. The workers eat up all memory
>>>>>>> and eventually crash. Reverted back to 2.14 / Py 3.6 and the issue is
>>>>>>> gone.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thomas, is it possible for you to do the bisection using SDK code
>>>>>>>>>> from master at various commits to narrow down the regression on
>>>>>>>>>> your end?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't know how soon I will get to it. It's of course possible,
>>>>>>>>> but expensive due to having to rebase the fork, build and deploy
>>>>>>>>> an entire stack of stuff for each iteration. The pipeline itself is
>>>>>>>>> super simple. We need this testbed as part of Beam. It would be nice
>>>>>>>>> to be able to pick an update and have more confidence that the
>>>>>>>>> baseline has not slipped.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5691127080419328
>>>>>>>>>> [2]
>>>>>>>>>> https://drive.google.com/file/d/1ERlnN8bA2fKCUPBHTnid1l__81qpQe2W/view
>>>>>>>>>> [3]
>>>>>>>>>> https://github.com/apache/beam/commit/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Sep 6, 2019 at 8:38 AM Ahmet Altay <altay@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> +Valentyn Tymofieiev <valentyn@google.com> do we have
>>>>>>>>>>> benchmarks in different python versions? Was there a recent change
>>>>>>>>>>> that is specific to python 3.x?
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 6, 2019 at 8:36 AM Thomas Weise <thw@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The issue is only visible with Python 3.6, not 2.7.
>>>>>>>>>>>>
>>>>>>>>>>>> If there is a framework in place to add a streaming test, that
>>>>>>>>>>>> would be great. We would use what we have internally as starting
>>>>>>>>>>>> point.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 5, 2019 at 5:00 PM Ahmet Altay <altay@google.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 4:15 PM Thomas Weise <thw@apache.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The workload is quite different. What I have is streaming
>>>>>>>>>>>>>> with state and timers.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:47 PM Pablo Estrada
>>>>>>>>>>>>>> <pabloem@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We only recently started running Chicago Taxi Example. +Michał
>>>>>>>>>>>>>>> Walenia <michal.walenia@polidea.com> I don't see it in the
>>>>>>>>>>>>>>> dashboards. Do you know if it's possible to see any trends in
>>>>>>>>>>>>>>> the data?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We have a few tests running now:
>>>>>>>>>>>>>>> - Combine tests:
>>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>>>>> - GBK tests:
>>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/explore?dashboard=5763764733345792&widget=201943890&container=1334074373
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> They don't seem to show a very drastic jump either, but they
>>>>>>>>>>>>>>> aren't very old.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There is also work ongoing to add alerting for this sort of
>>>>>>>>>>>>>>> regressions by Kasia and Kamil (added). The work is not there
>>>>>>>>>>>>>>> yet (it's in progress).
>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>> -P.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:35 PM Thomas Weise <thw@apache.org>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It probably won't be practical to do a bisect due to the
>>>>>>>>>>>>>>>> high cost of each iteration with our fork/deploy setup.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Perhaps it is time to setup something with the synthetic
>>>>>>>>>>>>>>>> source that works just with Beam as dependency.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree with this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pablo, Kasia, Kamil, do the new benchmarks give us an
>>>>>>>>>>>>> easy-to-use framework for using synthetic source in benchmarks?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:23 PM Ahmet Altay
>>>>>>>>>>>>>>>> <altay@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There are a few in this dashboard [1], but they are not very
>>>>>>>>>>>>>>>>> useful in this case because they do not go back more than a
>>>>>>>>>>>>>>>>> month and are not very comprehensive. I do not see a jump
>>>>>>>>>>>>>>>>> there. Thomas, would it be possible to bisect to find what
>>>>>>>>>>>>>>>>> commit caused the regression?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +Pablo Estrada <pabloem@google.com> do we have any python
>>>>>>>>>>>>>>>>> on flink benchmarks for chicago example?
>>>>>>>>>>>>>>>>> +Alan Myrvold <amyrvold@google.com> +Yifan Zou
>>>>>>>>>>>>>>>>> <yifanzou@google.com> It would be good to have alerts on
>>>>>>>>>>>>>>>>> benchmarks. Do we have such an ability today?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>> https://apache-beam-testing.appspot.com/dashboard-admin
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Sep 5, 2019 at 3:15 PM Thomas Weise
>>>>>>>>>>>>>>>>> <thw@apache.org> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Are there any performance tests run for the Python SDK as
>>>>>>>>>>>>>>>>>> part of release verification (or otherwise as well)?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I see what appears to be a regression in master (compared
>>>>>>>>>>>>>>>>>> to 2.14) with our in-house application (~ 25% jump in cpu
>>>>>>>>>>>>>>>>>> utilization and corresponding drop in throughput).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I wanted to see if there is anything available to verify
>>>>>>>>>>>>>>>>>> that within Beam.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>

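The kind of check discussed in this thread (comparing a benchmark metric from a candidate build against a baseline release and flagging a jump such as the ~25% cpu increase reported) could be sketched roughly as below. This is only an illustration and not part of Beam's tooling: the function names, the 10% threshold and the sample numbers are all hypothetical.

```python
from statistics import median


def relative_change(baseline, candidate):
    """Relative change of the candidate median versus the baseline median."""
    base = median(baseline)
    if base == 0:
        raise ValueError("baseline median is zero; relative change is undefined")
    return (median(candidate) - base) / base


def is_regression(baseline, candidate, threshold=0.10):
    """True if the metric grew by more than `threshold` (hypothetical default).

    Assumes a metric where higher is worse, e.g. cpu utilization.
    """
    return relative_change(baseline, candidate) > threshold


# Hypothetical cpu utilization samples (percent), not real measurements.
cpu_baseline = [40.0, 41.0, 39.0, 40.0, 42.0]
cpu_candidate = [50.0, 51.0, 49.0, 50.0, 52.0]

print(relative_change(cpu_baseline, cpu_candidate))
print(is_regression(cpu_baseline, cpu_candidate))
```

Medians are used rather than means so that a single noisy run has less influence; for a metric where lower is worse (such as throughput) the comparison would need to be inverted.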