arrow-dev mailing list archives

From Krisztián Szűcs <szucs.kriszt...@gmail.com>
Subject Re: C++ and Python size problems with Arrow 0.13.0
Date Wed, 03 Apr 2019 10:05:38 GMT
This is what the wheel contains before running auditwheel:

-rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so
-rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so.1.66.0
-rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so
-rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so.1.66.0
-rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so
-rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so.1.66.0
-rwxr-xr-x  1 root root 1.4M Apr  3 09:02 libarrow_python.so
-rwxr-xr-x  1 root root 1.4M Apr  3 09:02 libarrow_python.so.14
-rwxr-xr-x  1 root root  12M Apr  3 09:02 libarrow.so
-rwxr-xr-x  1 root root  12M Apr  3 09:02 libarrow.so.14
-rw-r--r--  1 root root 6.1M Apr  3 09:02 lib.cpp
-rwxr-xr-x  1 root root 2.4M Apr  3 09:02 lib.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x  1 root root  55M Apr  3 09:02 libgandiva.so
-rwxr-xr-x  1 root root  55M Apr  3 09:02 libgandiva.so.14
-rwxr-xr-x  1 root root 2.9M Apr  3 09:02 libparquet.so
-rwxr-xr-x  1 root root 2.9M Apr  3 09:02 libparquet.so.14
-rwxr-xr-x  1 root root 309K Apr  3 09:02 libplasma.so
-rwxr-xr-x  1 root root 309K Apr  3 09:02 libplasma.so.14

After running auditwheel, the repaired wheel contains:

-rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so
-rwxr-xr-x  1 root root 128K Apr  3 09:02 libarrow_boost_filesystem.so.1.66.0
-rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so
-rwxr-xr-x  1 root root 1.2M Apr  3 09:02 libarrow_boost_regex.so.1.66.0
-rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so
-rwxr-xr-x  1 root root  30K Apr  3 09:02 libarrow_boost_system.so.1.66.0
-rwxr-xr-x  1 root root 1.6M Apr  3 09:55 libarrow_python.so
-rwxr-xr-x  1 root root 1.4M Apr  3 09:02 libarrow_python.so.14
-rwxr-xr-x  1 root root  12M Apr  3 09:55 libarrow.so
-rwxr-xr-x  1 root root  12M Apr  3 09:02 libarrow.so.14
-rw-r--r--  1 root root 6.1M Apr  3 09:02 lib.cpp
-rwxr-xr-x  1 root root 2.5M Apr  3 09:55 lib.cpython-36m-x86_64-linux-gnu.so
-rwxr-xr-x  1 root root  59M Apr  3 09:55 libgandiva.so
-rwxr-xr-x  1 root root  55M Apr  3 09:02 libgandiva.so.14
-rwxr-xr-x  1 root root 3.5M Apr  3 09:55 libparquet.so
-rwxr-xr-x  1 root root 2.9M Apr  3 09:02 libparquet.so.14
-rwxr-xr-x  1 root root 345K Apr  3 09:55 libplasma.so
-rwxr-xr-x  1 root root 309K Apr  3 09:02 libplasma.so.14

Here is the output of auditwheel:
https://travis-ci.org/kszucs/crossbow/builds/514605723#L3340
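To check which of the pairs listed above really differ on disk, rather than inferring it from sizes and timestamps, something along these lines could be run against the unpacked wheel. This is only a rough sketch: the function name `compare_versioned_pairs` and the directory passed to it are assumptions made for illustration, not part of the Arrow build scripts or of auditwheel.

```python
import hashlib
import os
import re

# Matches versioned names such as libarrow.so.14 or
# libarrow_boost_regex.so.1.66.0 and captures the unversioned base name.
_VERSIONED = re.compile(r"^(?P<base>.+\.so)\.[0-9.]+$")


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large libraries are not read in one go."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_versioned_pairs(lib_dir):
    """Classify each lib*.so / lib*.so.N pair found in lib_dir.

    Returns a dict mapping the unversioned name to one of
    "symlink", "identical copy" or "distinct".
    """
    results = {}
    for name in sorted(os.listdir(lib_dir)):
        match = _VERSIONED.match(name)
        if match is None:
            continue
        base = match.group("base")
        base_path = os.path.join(lib_dir, base)
        versioned_path = os.path.join(lib_dir, name)
        if not os.path.lexists(base_path):
            continue  # no unversioned counterpart to compare against
        if os.path.islink(base_path) or os.path.islink(versioned_path):
            results[base] = "symlink"
        elif sha256_of(base_path) == sha256_of(versioned_path):
            results[base] = "identical copy"
        else:
            results[base] = "distinct"
    return results


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the directory holding the bundled libraries.
    for base, status in compare_versioned_pairs(sys.argv[1]).items():
        print(f"{base}: {status}")
```

Any pair reported as "identical copy" is stored twice in the wheel, and any pair reported as "distinct" is worth a closer look, since the two files were presumably meant to be the same library.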

On Wed, Apr 3, 2019 at 10:36 AM Antoine Pitrou <antoine@python.org> wrote:

>
> On 03/04/2019 at 02:23, Wes McKinney wrote:
> >
> > $ ll Library/lib/
> > total 741796
> > -rw-r--r-- 1 wesm wesm   1507048 Mar 27 23:34 arrow.lib
> > -rw-r--r-- 1 wesm wesm     76184 Mar 27 23:35 arrow_python.lib
> > -rw-r--r-- 1 wesm wesm  61322082 Mar 27 23:36 arrow_python_static.lib
> > -rw-r--r-- 1 wesm wesm 328090044 Mar 27 23:37 arrow_static.lib
> > drwxr-xr-x 3 wesm wesm      4096 Apr  2 19:12 cmake/
> > -rw-r--r-- 1 wesm wesm    302496 Mar 27 23:38 gandiva.lib
> > -rw-r--r-- 1 wesm wesm 239314018 Mar 27 23:40 gandiva_static.lib
> > -rw-r--r-- 1 wesm wesm    491292 Mar 27 23:41 parquet.lib
> > -rw-r--r-- 1 wesm wesm 128473780 Mar 27 23:42 parquet_static.lib
> > drwxr-xr-x 2 wesm wesm      4096 Apr  2 19:12 pkgconfig/
> >
> > As a mitigating measure in the meantime, I would suggest that we stop
> > bundling the static libraries in the arrow-cpp conda package, since
> > we're just hurting release managers and users with a large package
> > download when they `conda install pyarrow`.
>
> Agreed.
>
> > Can someone open a JIRA
> > issue about this?
>
> See https://issues.apache.org/jira/browse/ARROW-5101
>
> > There's something very odd here, though, which is that libgandiva.so
> > and libgandiva.so.13 appear to be distinct.
>
> Not only that: libparquet.so, libplasma.so and libarrow.so are distinct as
> well.  This means that we may be building those libraries twice instead
> of copying the files.
>
> By the way, I don't understand why those are not symlinks.
>
Me neither, but I guess setup.py bdist_wheel doesn't support symlinks.

>
> > That seems buggy to me. We might also investigate if there's a way to
> > trim the binary sizes in some way.
>
> Well, there's always "strip -s", but it doesn't seem to remove much
> (libgandiva.so shrinks from 60 to 50 MB, and you lose all debug
> information).
>
> One issue seems to be that libgandiva.so links LLVM statically, but
> doesn't hide LLVM symbols.  That said, libllvmlite.so (which hides LLVM
> symbols) has grown quite large recently as well (around 40 MB).
>
> Perhaps Gandiva needs to be packaged separately...
>
> Regards
>
> Antoine.
>
