mxnet-dev mailing list archives

From "Skalicky, Sam" <sska...@amazon.com.INVALID>
Subject Re: Stopping nightly releases to Pypi
Date Sun, 05 Jan 2020 18:09:15 GMT
Hi Haibin,

You typed the correct URLs. The cu100 build has been failing since December 30th, but other
builds have succeeded. The wheels are delivered to a public bucket that anyone with
an AWS account can access and poke around in. Here’s the URL for web access:

https://s3.console.aws.amazon.com/s3/buckets/apache-mxnet/dist/2020-01-01/dist/?region=us-west-2&tab=overview

You will have to log into your AWS account to access it however (which means you’ll need
an AWS account).

It looks like only the following flavors are available for 2020-01-01:
mxnet
mxnet-cu92
mxnet-cu92mkl
mxnet-mkl

Sam

On Jan 4, 2020, at 9:06 PM, Haibin Lin <haibin.lin.aws@gmail.com> wrote:

I was trying the nightly builds, but none of them is available:

pip3 install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-01/dist/mxnet_cu100-1.6.0b20200101-py2.py3-none-manylinux1_x86_64.whl --user
pip3 install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-02/dist/mxnet_cu100-1.6.0b20200102-py2.py3-none-manylinux1_x86_64.whl --user
pip3 install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-03/dist/mxnet_cu100-1.6.0b20200103-py2.py3-none-manylinux1_x86_64.whl --user
pip3 install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-04/dist/mxnet_cu100-1.6.0b20200104-py2.py3-none-manylinux1_x86_64.whl --user

ERROR: Could not install requirement mxnet-cu100==1.6.0b20200103 from
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-03/dist/mxnet_cu100-1.6.0b20200103-py2.py3-none-manylinux1_x86_64.whl
because of HTTP error 404 Client Error: Not Found for url:
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-03/dist/mxnet_cu100-1.6.0b20200103-py2.py3-none-manylinux1_x86_64.whl
for URL
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-03/dist/mxnet_cu100-1.6.0b20200103-py2.py3-none-manylinux1_x86_64.whl

Please let me know if I typed the wrong URLs.

1. The discoverability of available nightly builds needs improvement. If
someone can help write a script to list all links that exist, that would be
very helpful.
2. If a nightly build does not build successfully, how does the community
learn the reason for the failure, and potentially offer help? Currently I
don't have much visibility into the nightly build status.
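
A script along the lines requested in point 1 could look like the minimal
sketch below. It assumes the URL layout and flavor names from the
nightly-build announcement quoted later in this thread, and it assumes the
bucket answers unauthenticated HEAD requests. The function names are
hypothetical and not part of any MXNet tooling.

```python
from datetime import date
from urllib.request import Request, urlopen

# URL layout as given in the nightly-build announcement in this thread.
BASE = "https://apache-mxnet.s3-us-west-2.amazonaws.com/dist"

# Flavor names as they appear in the wheel file names (underscores, not hyphens).
FLAVORS = [
    "mxnet", "mxnet_mkl",
    "mxnet_cu90", "mxnet_cu92", "mxnet_cu100", "mxnet_cu101",
    "mxnet_cu90mkl", "mxnet_cu92mkl", "mxnet_cu100mkl", "mxnet_cu101mkl",
]


def wheel_url(flavor, day, version="1.6.0"):
    """Build the expected wheel URL for one flavor and one build date."""
    return (
        f"{BASE}/{day:%Y-%m-%d}/dist/"
        f"{flavor}-{version}b{day:%Y%m%d}-py2.py3-none-manylinux1_x86_64.whl"
    )


def wheel_exists(url, timeout=10):
    """Return True if a HEAD request for the URL answers with status 200."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers HTTP errors (404, 403), connection failures and timeouts.
        return False


def available_wheels(days, flavors=FLAVORS):
    """Return the URLs that exist for the given dates and flavors."""
    found = []
    for day in days:
        for flavor in flavors:
            url = wheel_url(flavor, day)
            if wheel_exists(url):
                found.append(url)
    return found
```

Calling available_wheels([date(2020, 1, 1)]) would then return only the
links that resolve for that day, which also shows which flavors failed to
build.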

Best,
Haibin


On Fri, Jan 3, 2020 at 5:47 PM Pedro Larroy <pedro.larroy.lists@gmail.com>
wrote:

Just to clarify, the current CI is quite an overhead to maintain for
several reasons, and this complexity is overkill for CD. Jenkins also has
constant plugin upgrades and security vulnerabilities, and has to be
restarted from time to time as it stops working. Making binary builds from
an environment which runs unsafe code is also not good practice, I think.
So for that, having a separate Jenkins, CodeBuild, Drone or using a separate
Jenkins node is the right solution. I agree with you that it is just a
scheduler, but somebody is making efforts to keep it running. If you have
the appetite and resources to duplicate it for CD, please go ahead.

On Fri, Jan 3, 2020 at 3:25 PM Marco de Abreu <marco.g.abreu@gmail.com>
wrote:

Regarding your point of finding somebody to maintain the solution: At
Apache we usually retire things if there's no maintainer, since that
indicates that the feature/system is not of enough interest to warrant
maintenance - otherwise, someone would step up.

While assistance in the form of a fix is always appreciated, the fix still
has to conform with the way this project and Apache operate. Next time I'd
recommend contributing time to improving the existing community solution
instead of developing an internal system.

-Marco

Marco de Abreu <marco.g.abreu@gmail.com> wrote on Sat, Jan 4, 2020, 00:21:

Sam, while I understand that this solution was developed out of necessity,
my question is why a new system has been developed instead of fixing the
existing one or adapting the solution. CodeBuild is a scheduler in the same
fashion as Jenkins is. It runs code. So you can adapt it to Jenkins without
much hassle.

I'm not volunteering for this - why should I? The role of a PMC member is
to steer the direction of the project. Just because a manager points
towards a certain direction, it doesn't mean that they're going to do it.

Apparently there was enough time at some point to develop a new solution
from scratch. It might have been a solution for your internal team and
that's fine, but upgrading it "temporarily" to be the advertised way on the
official website is something different.

I won't argue about how the veto can be enforced. I think it's in the best
interest of the project if we try working on a solution instead of spending
time on trying to figure out the power of the PMC.

Pedro, that's certainly a step in the right direction. But committers
would also need access to the control plane of the system - to trigger,
stop and audit builds. We could go down that road, but I think the fewer
systems, the better - also for the sake of maintainability.

Best regards,
Marco



Pedro Larroy <pedro.larroy.lists@gmail.com> wrote on Fri, Jan 3, 2020, 20:55:

I'm not involved in such efforts, but one possibility is to have the yaml
files that describe the pipelines for CD in the Apache repositories. Would
that be acceptable from the Apache POV? In the end they should be very thin
and just call the scripts that are part of the CD packages.

On Fri, Jan 3, 2020 at 6:56 AM Marco de Abreu <marco.g.abreu@gmail.com> wrote:

Agree, but the question of how a non-Amazonian is able to maintain and
access the system is still open. As it stands right now, the community has
taken a step back and loses some control if we continue down that road.

I personally disapprove of that approach since committers are no longer in
control of that process. So far it seems like my questions were skipped and
further actions have been taken. As openness and the community having
control are part of our graduation criteria, I'm putting in my veto with a
grace period until 15th of January. Please bring the system into a state
that aligns with Apache values or revert the changes.

-Marco

Pedro Larroy <pedro.larroy.lists@gmail.com> wrote on Fri, Jan 3, 2020, 03:33:

CD should be separate from CI for security reasons in any case.


On Sat, Dec 7, 2019 at 10:04 AM Marco de Abreu <marco.g.abreu@gmail.com> wrote:

Could you elaborate how a non-Amazonian is able to access, maintain and
review the CodeBuild pipeline? How come we've diverged from the
community-agreed standard where the public Jenkins serves the purpose of
testing and releasing MXNet? I'd be curious about the issues you're
encountering with Jenkins CI that led to a non-standard solution.

-Marco


Skalicky, Sam <sskalic@amazon.com.invalid> wrote on Sat, Dec 7, 2019, 18:39:

Hi MXNet Community,

We have been working on getting nightly builds fixed and made available
again. We’ve made another system using AWS CodeBuild & S3 to work around
the problems with Jenkins CI, PyPI, etc. It is currently building all the
flavors and publishing to an S3 bucket here:

https://us-west-2.console.aws.amazon.com/s3/buckets/apache-mxnet/dist/?region=us-west-2

There are folders for each set of nightly builds; try out the wheels
starting today, 2019-12-07. Builds start at 1:30am PT (9:30am GMT) and
arrive in the bucket 30min-2hours later. Inside each folder are the wheels
for each flavor of MXNet. Currently we’re only building for Linux; builds
for Windows/Mac will come later.

If you want to download the wheels easily you can use a URL in the form of:
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/<YYYY-MM-DD>/dist/<mxnet_build>-1.6.0b<YYYYMMDD>-py2.py3-none-manylinux1_x86_64.whl

Here’s a set of links for today’s builds:

(Plain mxnet, no mkl no cuda)
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl

(mxnet-mkl)
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl

(mxnet-cuXXX)
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl

(mxnet-cuXXXmkl)
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl

You can easily install these pip wheels on your system either by
downloading them to your machine first and then installing by doing:

pip install /path/to/downloaded/wheel.whl

Or you can install directly by just giving the link to pip like this:

pip install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl

Credit goes to everyone involved (in no particular order)
Rakesh Vasudevan
Zach Kimberg
Manu Seth
Sheng Zha
Jun Wu
Pedro Larroy
Chaitanya Bapat

Thanks!
Sam


On Dec 5, 2019, at 1:16 AM, Lausen, Leonard <lausen@amazon.com.INVALID> wrote:

We don't lose pip by hosting on S3. We just stop hosting nightly releases
on Pypi servers, where each build is mirrored to several hundred mirrors
immediately after it is published, which is very expensive for the Pypi
project. People can still install the nightly builds with pip by specifying
the -f option.

Uploading weekly releases to Pypi will reduce the cost for Pypi by ~75%
[1]. It may be acceptable to Pypi, but does it make sense for us? I'm not
convinced that a weekly release on Pypi is a good idea. If one release is
buggy, users will need to wait 7 days for a fix. That doesn't provide a
good user experience. If someone has a stronger conviction about the value
of weekly releases on Pypi, that person should please go ahead and propose
it in a separate discussion thread.

Currently we don't have generally working nightly builds to Pypi, and as a
matter of fact we know that we can't have them due to Pypi's policy and our
apparent need for large binaries. Given this fact and that no objection was
raised by 2019-12-05 at 05:42 UTC, I conclude we have lazy consensus on
stopping upload attempts of nightly builds to Pypi.

With consensus established, we can change the CI job to stop trying to
upload the nightly builds and then request Pypi to increase the limit. Then
we have one less blocker for the 1.6 release.

Best regards
Leonard

[1]: Lower cost due to fewer releases, but higher cost due to the
500MB -> 800MB limit increase, assuming that the limit increase translates
into actually larger binaries.
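
The message does not show how the ~75% in footnote [1] was derived. One way
to land in that range, as a rough sketch, is to assume that mirroring cost
scales with the number of uploads per week times the size of each upload,
and that binaries grow to fill the new limit. Both are assumptions made here
for illustration.

```python
# Assumption: cost is proportional to (uploads per week) x (size per upload).
nightly_cost = 7 * 500  # seven uploads a week at the current 500MB limit
weekly_cost = 1 * 800   # one upload a week at the requested 800MB limit

# Fraction of the nightly cost that a weekly cadence would save.
reduction = 1 - weekly_cost / nightly_cost
print(f"estimated reduction: {reduction:.0%}")
```

Under these assumptions the saving comes out at roughly 77%, in the same
range as the ~75% quoted above.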


On Wed, 2019-12-04 at 22:20 +0100, Marco de Abreu wrote:
Are weekly releases an option? It was brought up as a concern that we might
lose pip as a pretty common distribution channel where people consume
nightly builds. I don't feel like that concern has been properly addressed
so far.

-Marco

Lausen, Leonard <lausen@amazon.com.invalid> wrote on Wed, Dec 4, 2019, 04:09:

As a simple POC to test distribution, you can try installing MXNet based on
these 3 URLs:

pip install --no-cache-dir https://mxnet-dev.s3.amazonaws.com/mxnet_cu101-1.5.1.post0-py2.py3-none-manylinux1_x86_64.whl
pip install --no-cache-dir https://mxnet-dev.s3-accelerate.amazonaws.com/mxnet_cu101-1.5.1.post0-py2.py3-none-manylinux1_x86_64.whl
pip install --no-cache-dir https://d19zq12jzu4w95.cloudfront.net/mxnet_cu101-1.5.1.post0-py2.py3-none-manylinux1_x86_64.whl

where --no-cache-dir prevents caching the downloaded file, for the purpose
of testing. (cu101 chosen based on large size)

The first URL uses a standard S3 bucket in the US. The second uses S3
Accelerate, which is based on the CloudFront CDN. And the third uses the
CloudFront CDN directly. I'm adding the third URL, as S3 Accelerate may or
may not use all new CloudFront endpoints yet.

Regarding voting: Uploading to Pypi is currently impossible, which is a
reality (so there is no option to continue as we do currently). Pypi folks
indicated they will unblock our uploads to Pypi once we stop uploading
nightly releases and taking up 20% of their resources [1].

If there are any shortcomings or problems identified with uploading to S3,
we can work to address them. But for now, the status quo is broken and this
seems the only solution addressing Pypi's problem.

I don't mind if you state that you object to lazy consensus and start a
vote. If your "maybe [...] start a proper vote" was supposed to be an
objection to lazy consensus, please state so clearly (I'm not sure if
"maybe" qualifies as an objection). Though I think it only makes sense with
at least 2 options to vote on. Status quo is not a meaningful option, as it
is already broken.

Best regards
Leonard

[1]: https://github.com/pypa/pypi-support/issues/50#issuecomment-560479706

On Tue, 2019-12-03 at 19:28 +0100, Marco de Abreu wrote:
Excellent! Could we maybe come up with a POC and a quick writeup and then
start a proper vote after everyone has verified that it covers their
use-cases?
-Marco

Sheng Zha <zhasheng@apache.org> wrote on Tue, Dec 3, 2019, 19:24:

Yes, there is. We can also make it easier to access by using a
geo-location based DNS server so that China users are directed to that
local mirror. The rest of the world is already covered by the global
cloudfront.

-sz

On 2019/12/03 18:22:22, Marco de Abreu <marco.g.abreu@gmail.com> wrote:
Isn't there an s3 endpoint in Beijing?

It seems like this topic still warrants some discussion and thus I'd prefer
if we don't move forward with lazy consensus.

-Marco

Tao Lv <mutouorz@gmail.com> wrote on Tue, Dec 3, 2019, 14:31:

* For pypi, we can use mirrors.

On Tue, Dec 3, 2019 at 9:28 PM Tao Lv <mutouorz@gmail.com>
wrote:

As we have many users in China, I'm considering the accessibility of S3.
For pip, we can mirrors.

On Tue, Dec 3, 2019 at 3:24 PM Lausen, Leonard <lausen@amazon.com.invalid> wrote:

I would like to remind everyone that lazy consensus is assumed if no
objections are raised before 2019-12-05 at 05:42 UTC. There has been some
discussion about the proposal, but to my understanding no objections were
raised.

If the proposal is accepted, MXNet releases would be installed via

pip install mxnet

And release candidates via

pip install --pre mxnet

(or with the respective cuda version specifier appended etc.)

To obtain releases built automatically from the master branch, users would
need to specify something like the "-f
http://mxnet.s3.amazonaws.com/mxnet-X/nightly.html" option to pip.

Best regards
Leonard

On Mon, 2019-12-02 at 05:42 +0000, Lausen, Leonard wrote:
Hi MXNet Community,

For more than 2 months, our binary Python nightly releases published on
Pypi have been broken. The problem is that our binaries exceed Pypi's size
limit. Decreasing the binary size by adding compression breaks third-party
libraries loading libmxnet.so:
https://github.com/apache/incubator-mxnet/issues/16193

Sheng requested Pypi to increase their size limit:
https://github.com/pypa/pypi-support/issues/50

Currently "the biggest cost for PyPI from [the many MXNet binaries with
nightly release to Pypi] is the bandwidth consumed when several hundred
mirrors attempt to mirror each release immediately after it's published". So
Pypi is not inclined to allow us to upload even larger binaries on a nightly
schedule. Their compromise is to allow it on a weekly cadence.

However, I would like the community to revisit the necessity of releasing
pre-release binaries to Pypi on a nightly (or weekly) cadence. Instead, we
can release nightly binaries ONLY to a public S3 bucket and instruct users
to install from there. On our side, we only need to prepare an HTML document
that contains links to all released nightly binaries.
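
The HTML document mentioned above only needs to be a page of plain links,
since pip's -f (--find-links) option parses an HTML file for links to
archives. A minimal sketch of generating such a page is shown here. The
function name and page title are hypothetical, and how the list of wheel
URLs is collected is left out.

```python
from html import escape


def render_index(wheel_urls, title="MXNet nightly builds"):
    """Return an HTML page with one link per wheel, for use with pip -f."""
    links = "\n".join(
        # Link text is the wheel file name; the href is the full URL.
        f'<a href="{escape(url, quote=True)}">{escape(url.rsplit("/", 1)[-1])}</a><br>'
        for url in wheel_urls
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n{links}\n</body></html>\n"
    )
```

The returned string would be written to a file such as nightly.html and
uploaded next to the wheels after each build.
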
Finally, users will install the nightly releases via

pip install --pre mxnet-cu101 -f http://mxnet.s3.amazonaws.com/mxnet-cu101/nightly.html

Instead of

pip install --pre mxnet-cu101

Of course proper releases and release candidates should still be made
available via Pypi. Thus releases would be installed via

pip install mxnet-cu101

And release candidates via

pip install --pre mxnet-cu101

This will substantially reduce the costs of the Pypi project and in fact
matches the installation experience provided by PyTorch. I don't think the
benefit of not including "-f
http://mxnet.s3.amazonaws.com/mxnet-cu101/nightly.html" matches the costs we
currently externalize to the Pypi team.

This suggestion seems uncontroversial to me. Thus I would like to start lazy
consensus. If there are no objections, I will assume lazy consensus on
stopping nightly releases to Pypi in 72hrs.

Best regards
Leonard









