arrow-dev mailing list archives

From Antoine Pitrou <anto...@python.org>
Subject Re: [DISCUSS][C++] Rethinking our current C++ shared library (.so / .dll) approach
Date Tue, 17 Sep 2019 08:12:20 GMT

For the record, the concrete issue which sparked this discussion
received an elegant fix from Benjamin:
https://github.com/apache/arrow/pull/5391

Regards

Antoine.


On 17/09/2019 at 04:34, Sutou Kouhei wrote:
> Hi,
> 
> If this is circular, it's a problem. But this isn't circular
> for now.
> 
> I think that we can use libarrow as the fundamental shared
> library that provides common implementations, like [1], if we
> need to provide a common implementation for templates. (I think
> that we don't provide common implementations for templates.)
> 
> [1] https://github.com/apache/arrow/pull/5221/commits/e88b2579f04451d741eeddcb6697914bcc1019a6
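One common way for a base library to "provide a common implementation" of a template is explicit instantiation: the header declares the instantiations `extern`, and exactly one source file of the base library defines them. The sketch below is only an illustration of that technique. `DemoStats` is a hypothetical name, it is not claimed to be what the commit linked in [1] does, and the Windows export annotations that a real DLL would also need are omitted.

```cpp
#include <cassert>
#include <vector>

// ---- would live in a public header of the base library (hypothetical) ----
template <typename T>
struct DemoStats {
  static T Sum(const std::vector<T>& values);
  static T Max(const std::vector<T>& values);  // requires a non-empty input
};

// Out-of-class (non-inline) definitions, so that the extern declarations
// below can actually suppress implicit instantiation in other libraries.
template <typename T>
T DemoStats<T>::Sum(const std::vector<T>& values) {
  T total{};
  for (const T& v : values) total += v;
  return total;
}

template <typename T>
T DemoStats<T>::Max(const std::vector<T>& values) {
  assert(!values.empty());
  T best = values.front();
  for (const T& v : values) {
    if (best < v) best = v;
  }
  return best;
}

// Explicit instantiation declarations: every includer is told that these
// instantiations exist elsewhere and should not be generated again.
extern template struct DemoStats<int>;
extern template struct DemoStats<double>;

// ---- would live in exactly one .cc file of the base library ----
// Explicit instantiation definitions: the single place the code is emitted.
template struct DemoStats<int>;
template struct DemoStats<double>;
```

With this layout, a dependent library that calls `DemoStats<int>::Sum(...)` links against the one definition in the base library instead of emitting its own copy.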
> 
> Anyway, I'm not strongly opposed to this idea. If we choose
> the single shared library approach, Linux packages, GLib bindings
> and Ruby bindings can follow the change.
> 
> 
> Thanks,
> --
> kou
> 
> In <CAJPUwMDWENCjPBw+HrSWAOJFez7e_yci-Fg2D3LwgVNCf45iWQ@mail.gmail.com>
>   "Re: [DISCUSS][C++] Rethinking our current C++ shared library (.so / .dll) approach"
>   on Thu, 12 Sep 2019 13:23:01 -0500,
>   Wes McKinney <wesmckinn@gmail.com> wrote:
> 
>> One thing I forgot to mention:
>>
>> One of the things driving the creation of new shared libraries is
>> interdependencies. For example:
>>
>> libarrow -> libparquet
>> libarrow -> libarrow_dataset
>> libparquet -> libarrow_dataset
>>
>> With the modular LLVM-like approach this issue goes away.
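The remark earlier in the thread that these interdependencies are not circular "for now" can be checked mechanically. The following is a minimal sketch using Kahn's algorithm on the three edges quoted above. `HasCycle` and `QuotedEdges` are hypothetical helpers, not part of Arrow, and the result does not depend on which way the arrows are read, since reversing every edge preserves acyclicity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;

// Returns true if the directed graph described by `edges` contains a cycle.
// Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
// Any node that can never be removed lies on, or downstream of, a cycle.
bool HasCycle(const std::vector<Edge>& edges) {
  std::map<std::string, std::vector<std::string>> out;
  std::map<std::string, int> indegree;
  for (const Edge& e : edges) {
    out[e.first].push_back(e.second);
    indegree[e.first] += 0;  // register the source node even if nothing points to it
    indegree[e.second] += 1;
  }

  std::vector<std::string> ready;
  for (const auto& kv : indegree) {
    if (kv.second == 0) ready.push_back(kv.first);
  }

  std::size_t removed = 0;
  while (!ready.empty()) {
    const std::string node = ready.back();
    ready.pop_back();
    ++removed;
    for (const std::string& next : out[node]) {
      if (--indegree[next] == 0) ready.push_back(next);
    }
  }
  assert(removed <= indegree.size());
  return removed != indegree.size();
}

// The three edges exactly as written in the message above.
std::vector<Edge> QuotedEdges() {
  return {{"libarrow", "libparquet"},
          {"libarrow", "libarrow_dataset"},
          {"libparquet", "libarrow_dataset"}};
}
```

Adding a hypothetical back edge such as `{"libarrow_dataset", "libarrow"}` to the list makes `HasCycle` return true, which is the situation described as "a problem".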
>>
>> On Thu, Sep 12, 2019 at 1:16 PM Wes McKinney <wesmckinn@gmail.com> wrote:
>>>
>>> I forgot to add the link to the LLVM library listing
>>>
>>> https://gist.github.com/wesm/d13c2844db0c19477e8ee5c95e36a0dc
>>>
>>> On Thu, Sep 12, 2019 at 1:14 PM Wes McKinney <wesmckinn@gmail.com> wrote:
>>>>
>>>> hi folks,
>>>>
>>>> I wanted to share some concerns that I have about our current
>>>> trajectory with regards to producing shared libraries from the Arrow
>>>> build system.
>>>>
>>>> Currently, a comprehensive build produces many shared libraries:
>>>>
>>>> * libarrow
>>>> * libarrow_dataset
>>>> * libarrow_flight
>>>> * libarrow_python
>>>> * libgandiva
>>>> * libparquet
>>>> * libplasma
>>>>
>>>> There are some others. There are a number of problems with the current approach:
>>>>
>>>> * Each DLL needs its own set of "visibility" macros to control the use
>>>> of __declspec(dllimport/dllexport) on Windows, which is necessary to
>>>> instruct the import or export of symbols between DLLs on Windows. See
>>>> e.g. https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/visibility.h
>>>>
>>>> * Templates instantiated in one DLL may cause a violation of the One
>>>> Definition Rule during linking (we lost at least a day of work time
>>>> collectively to issues around this in ARROW-6244). It is good to be
>>>> able to share common template interfaces in general.
>>>>
>>>> * Statically-linked dependencies in one shared lib may need to be
>>>> statically linked into another library. For example, libgandiva
>>>> statically links parts of LLVM, but we will likely have some other
>>>> code that makes use of LLVM for other purposes (it has been discussed
>>>> in the context of Avro parsing).
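To make the first point in the list above concrete, here is a minimal sketch of the kind of per-library visibility header that each DLL needs. The `DEMO_FLIGHT_*` macros, `DemoClient`, and `demo_flight_version` are hypothetical names chosen for illustration and are not copied from Arrow's visibility.h. The non-Windows branch assumes a GCC- or Clang-compatible compiler.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Assumption: this translation unit plays the role of the library being
// built, so the "exporting" switch is defined here. A real build system
// would pass it as a compile definition for that one library target only.
#define DEMO_FLIGHT_EXPORTING

#if defined(_WIN32) || defined(__CYGWIN__)
#  ifdef DEMO_FLIGHT_EXPORTING
#    define DEMO_FLIGHT_EXPORT __declspec(dllexport)  // building the DLL
#  else
#    define DEMO_FLIGHT_EXPORT __declspec(dllimport)  // consuming the DLL
#  endif
#else
// Assumes GCC/Clang: mark the symbol visible even under -fvisibility=hidden.
#  define DEMO_FLIGHT_EXPORT __attribute__((visibility("default")))
#endif

// Every public class and free function of the library carries the macro.
class DEMO_FLIGHT_EXPORT DemoClient {
 public:
  explicit DemoClient(std::string location) : location_(std::move(location)) {
    assert(!location_.empty());
  }
  const std::string& location() const { return location_; }

 private:
  std::string location_;
};

DEMO_FLIGHT_EXPORT int demo_flight_version() { return 1; }
```

Each additional shared library repeats this block with its own macro pair (for example a hypothetical `DEMO_DATASET_EXPORT`), because a symbol exported from one DLL must be seen as imported by every other DLL that uses it.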
>>>>
>>>> Overall, my preferred solution to these issues is to move to a similar
>>>> approach to what the LLVM project does. To help understand, let me
>>>> have you first look at the libraries that come from the llvm-7-dev
>>>> package on Ubuntu
>>>>
>>>> Here we have a collection of static "module" libraries that implement
>>>> different parts of the LLVM platform. Finally, a _single_ shared
>>>> library libLLVM-7.so is produced.
>>>>
>>>> I think we should do the same thing in Apache Arrow, so that we only
>>>> ever produce a single shared library from the build. We can
>>>> additionally make the "name" of this shared library configurable to
>>>> suit different needs. For example, the default name could be simply
>>>> "libarrow.so" or something. But if someone wants to produce a
>>>> barebones Parquet shared library they can override the name to create
>>>> a "libparquet.so" that contains only the "libarrow_core.a" and
>>>> "libarrow_io.a" symbols needed for reading Parquet files.
>>>>
>>>> This would have additional benefits:
>>>>
>>>> * Use the same visibility macros for all exported C++ symbols, rather
>>>> than having to define DLL-specific visibility
>>>>
>>>> * Improved modularization of builds and linking for third party users,
>>>> similar to the way that LLVM's modular linking works, see the way that
>>>> Gandiva requests specific components from LLVM to use for static
>>>> linking https://github.com/apache/arrow/blob/master/cpp/cmake_modules/FindLLVM.cmake#L53
>>>>
>>>> * Net simpler linking and deployment. Only one shared library to deal with.
>>>>
>>>> There are some drawbacks, however:
>>>>
>>>> * Our C++ Linux packaging approach would need to be changed to be more
>>>> LLVM-like (a single .deb/.rpm package containing the C++ platform
>>>> rather than many packages as now).
>>>>
>>>> Interested to hear from other C++ developers.
>>>>
>>>> Thanks
>>>> Wes
