From dev-return-5395-archive-asf-public=cust-asf.ponee.io@mxnet.incubator.apache.org  Mon Jan 28 10:58:49 2019
Return-Path: <dev-return-5395-archive-asf-public=cust-asf.ponee.io@mxnet.incubator.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 5F40518060E
	for <archive-asf-public@cust-asf.ponee.io>; Mon, 28 Jan 2019 10:58:49 +0100 (CET)
Received: (qmail 47609 invoked by uid 500); 28 Jan 2019 09:58:48 -0000
Mailing-List: contact dev-help@mxnet.incubator.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:dev-help@mxnet.incubator.apache.org>
List-Unsubscribe: <mailto:dev-unsubscribe@mxnet.incubator.apache.org>
List-Post: <mailto:dev@mxnet.incubator.apache.org>
List-Id: <dev.mxnet.incubator.apache.org>
Reply-To: dev@mxnet.incubator.apache.org
Delivered-To: mailing list dev@mxnet.incubator.apache.org
Received: (qmail 47568 invoked by uid 99); 28 Jan 2019 09:58:47 -0000
Received: from ui-eu-01.ponee.io (HELO localhost) (176.9.59.70)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 28 Jan 2019 09:58:47 +0000
x-ponymail-agent: PonyMail Composer/0.3
From: edisongustavo@gmail.com <edisongustavo@gmail.com>
To: <dev@mxnet.apache.org>
MIME-Version: 1.0
Date: Mon, 28 Jan 2019 09:58:46 -0000
x-ponymail-sender: 6978d5cf1f1a2a99545a2701ac92ad005c72afce
Subject: Re: [DISCUSS] Current Publish problems
X-Mailer: LuaSocket 3.0-rc1
Message-ID: <pony-6978d5cf1f1a2a99545a2701ac92ad005c72afce-221beb56b9b296264d8e5eba75253619e0ea3c62@dev.mxnet.apache.org>
Content-Type: text/plain; charset=utf-8
In-Reply-To: <CAHb99TzB2YUByZh-hBu828SCAPAVNmg=kgjU1V4ASM+wpskgfQ@mail.gmail.com>
References: <CAHb99TzB2YUByZh-hBu828SCAPAVNmg=kgjU1V4ASM+wpskgfQ@mail.gmail.com> <9B84AF41-F184-4170-ABCB-7612BCC7B684@live.com>

Hello all,

First let me introduce myself:

My name is Edison Gustavo Muenz. I have worked most of my career with C++, Windows and Linux. I am a big fan of machine learning and have now joined Amazon in Berlin to work on MXNet.

I would like to give some comments on the document posted:

# change publish OS (Severe)

As a rule of thumb, when providing our own binaries on Linux, we should always try to compile against the oldest glibc possible. Using CentOS 7 for this (if the CUDA issues allow it) is the way to go.

# Using Cent OS 7

> However, all of the current GPU build scripts would be unavailable since nvidia does not provide the corresponding packages for rpm. In this case, we may need to go with NVIDIA Docker for Cent OS 7 and that only provide a limited versions of CUDA.

> List of CUDA that NVIDIA supporting for Cent OS 7:
> CUDA 10, 9.2, 9.1, 9.0, 8.0, 7.5

From what I saw in the link provided (https://hub.docker.com/r/nvidia/cuda/), this list of versions is even longer than the list of versions supported on Ubuntu 16.04.

What am I missing?

> Another problem we may see is the performance and stability difference on the backend we built since we downgrade libc from 2.19 to 2.17

I would like to first give a brief intro so that we're all on the same page. If you already know how libc versioning works, you can skip this part.

## Brief intro on how libc versioning works

Each symbol provided by libc has 2 components:
- symbol name
- version

This can be seen with:

```
$ objdump -T /lib/x86_64-linux-gnu/libc.so.6 | grep memcpy
00000000000bd4a0  w   DF .text  0000000000000009  GLIBC_2.2.5 wmemcpy
00000000001332f0 g    DF .text  0000000000000019  GLIBC_2.4   __wmemcpy_chk
000000000009f0e0 g   iD  .text  00000000000000ca  GLIBC_2.14  memcpy
00000000000bb460 g    DF .text  0000000000000028 (GLIBC_2.2.5) memcpy
00000000001318a0 g   iD  .text  00000000000000ca  GLIBC_2.3.4 __memcpy_chk
```

So it can be seen that there are different memory addresses for each version of memcpy.
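
To make the name/version pairing concrete, here is a minimal Python sketch that parses the objdump lines shown above. The helper name `parse_versions` is made up for illustration, and the sketch assumes that a version printed in parentheses is a non-default one kept for older binaries, which is my understanding of the `objdump -T` output format.

```python
import re

# Lines copied from the objdump output above.
SAMPLE = """\
00000000000bd4a0  w   DF .text  0000000000000009  GLIBC_2.2.5 wmemcpy
00000000001332f0 g    DF .text  0000000000000019  GLIBC_2.4   __wmemcpy_chk
000000000009f0e0 g   iD  .text  00000000000000ca  GLIBC_2.14  memcpy
00000000000bb460 g    DF .text  0000000000000028 (GLIBC_2.2.5) memcpy
00000000001318a0 g   iD  .text  00000000000000ca  GLIBC_2.3.4 __memcpy_chk
"""

# Matches the trailing "GLIBC_x.y name" or "(GLIBC_x.y) name" part of a line.
LINE_RE = re.compile(r"\s(\(?)GLIBC_([0-9.]+)\)?\s+(\S+)$")


def parse_versions(text):
    """Return {symbol: [(version_tuple, is_default), ...]} (hypothetical helper)."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.search(line.rstrip())
        if not match:
            continue
        paren, version, name = match.groups()
        key = tuple(int(part) for part in version.split("."))
        # Assumption: no parentheses means this is the default version.
        result.setdefault(name, []).append((key, paren == ""))
    return result


versions = parse_versions(SAMPLE)
for key, is_default in sorted(versions["memcpy"]):
    print("memcpy", ".".join(map(str, key)), "default" if is_default else "compat")
```

Versions are compared as integer tuples so that 2.14 sorts after 2.2.5, which a plain string comparison would get wrong.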

When linking a binary, the linker always chooses the default version of the libc symbol, which is normally the most recent one in the libc you link against.

An example:
    - your program uses the `memcpy` symbol
    - when linking, the linker will choose `memcpy` at version 2.14 (latest)

When executing the binary, the libc provided on your system must have a memcpy at version 2.14. If a required version is missing, you get an error like the following:

    /lib/x86_64-linux-gnu/libm.so.6: version `libc_2.23' not found (required by /tmp/mxnet6145590735071079280/libmxnet.so)
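
One way to catch this before shipping is to check which glibc version a binary needs. The following is a rough Python sketch, not an established tool: the helpers `required_glibc` and `runs_on` are hypothetical names, and it assumes that undefined symbols appear in `objdump -T` output on lines containing `*UND*` together with their `GLIBC_x.y` version.

```python
import re
import subprocess
import sys

# GLIBC_PRIVATE and similar non-numeric tags are deliberately not matched.
GLIBC_RE = re.compile(r"GLIBC_(\d+(?:\.\d+)+)")


def required_glibc(objdump_text):
    """Highest GLIBC_x.y among undefined (*UND*) symbols, or None if there is none."""
    needed = []
    for line in objdump_text.splitlines():
        if "*UND*" not in line:
            continue  # symbol is defined by the binary itself
        for version in GLIBC_RE.findall(line):
            needed.append(tuple(int(part) for part in version.split(".")))
    return max(needed, default=None)


def runs_on(objdump_text, system_glibc):
    """True if a system with glibc `system_glibc` (e.g. (2, 17)) provides every version."""
    required = required_glibc(objdump_text)
    return required is None or required <= system_glibc


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_glibc.py path/to/libmxnet.so
    output = subprocess.run(
        ["objdump", "-T", sys.argv[1]],
        check=True, capture_output=True, text=True,
    ).stdout
    print("requires glibc >=", required_glibc(output))
    print("runs on glibc 2.17:", runs_on(output, (2, 17)))
```

If the reported version is higher than the glibc of the oldest distribution we want to support, users on that distribution will hit the error above.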

Also, a symbol gets a new version when there are breaking changes. So libc only increases the version of a symbol if any of its inputs/outputs changed in a non-compatible way (e.g. changing the type of a field to a non-compatible type, like int -> short).

## Performance difference between versions 2.17 and 2.19

This website is really handy for this: https://abi-laboratory.pro/?view=timeline&l=glibc

If we look at the links:

- https://abi-laboratory.pro/index.php?view=objects_report&l=glibc&v1=2.18&v2=2.19
- https://abi-laboratory.pro/index.php?view=objects_report&l=glibc&v1=2.17&v2=2.18

You can see that their binary compatibility is fine, since no significant changes that could compromise performance were made between these versions.

Finally, I want to thank everyone for letting me be part of this community.

On 2019/01/23 21:48:48, kellen sunderland <kellen.sunderland@gmail.com> wrote: 
> Hey Qing, thanks for the summary and to everyone for automating the
> deployment process.  I've left a few comments on the doc.
> 
> On Wed, Jan 23, 2019 at 11:46 AM Qing Lan <lanking520@live.com> wrote:
> 
> > Hi all,
> >
> > Recently Zach announced the availability for MXNet Maven publishing
> > pipeline and general static-build instructions. In order to make it better,
> > I drafted a document that includes the problems we have for this pipeline:
> > https://cwiki.apache.org/confluence/display/MXNET/Outstanding+problems+with+publishing.
> > Some of them may need to be addressed very soon.
> >
> > Please kindly review and leave any comments you may have in this thread or
> > in the document.
> >
> > thanks,
> > Qing
> >
> >
>