From: Antoine Pitrou
To: dev@arrow.apache.org
Subject: Re: [DISCUSS][C++] Rethinking our current C++ shared library (.so / .dll) approach
Date: Tue, 17 Sep 2019 10:12:20 +0200
Message-ID: <4877f904-a96e-f5bc-89df-1bb4fed140d3@python.org>
References: <20190917.113446.1393729608527228924.kou@clear-code.com>
In-Reply-To:
 <20190917.113446.1393729608527228924.kou@clear-code.com>

For the record, the concrete issue which sparked this discussion
received an elegant fix from Benjamin:
https://github.com/apache/arrow/pull/5391

Regards

Antoine.


On 17/09/2019 at 04:34, Sutou Kouhei wrote:
> Hi,
>
> If this is circular, it's a problem. But this isn't circular
> for now.
>
> I think that we can use libarrow as the fundamental shared
> library to provide a common implementation like [1] if we need
> to provide a common implementation for templates. (I think that
> we don't provide a common implementation for templates.)
>
> [1] https://github.com/apache/arrow/pull/5221/commits/e88b2579f04451d741eeddcb6697914bcc1019a6
>
> Anyway, I'm not strongly opposed to this idea. If we choose
> the one-shared-library approach, Linux packages, GLib bindings
> and Ruby bindings can follow the change.
>
>
> Thanks,
> --
> kou
>
> In
>   "Re: [DISCUSS][C++] Rethinking our current C++ shared library (.so / .dll) approach" on Thu, 12 Sep 2019 13:23:01 -0500,
>   Wes McKinney wrote:
>
>> One thing I forgot to mention:
>>
>> One of the things driving the creation of new shared libraries is
>> interdependencies. For example:
>>
>> libarrow -> libparquet
>> libarrow -> libarrow_dataset
>> libparquet -> libarrow_dataset
>>
>> With the modular LLVM-like approach this issue goes away.
>>
>> On Thu, Sep 12, 2019 at 1:16 PM Wes McKinney wrote:
>>>
>>> I forgot to add the link to the LLVM library listing
>>>
>>> https://gist.github.com/wesm/d13c2844db0c19477e8ee5c95e36a0dc
>>>
>>> On Thu, Sep 12, 2019 at 1:14 PM Wes McKinney wrote:
>>>>
>>>> hi folks,
>>>>
>>>> I wanted to share some concerns that I have about our current
>>>> trajectory with regard to producing shared libraries from the Arrow
>>>> build system.
>>>>
>>>> Currently, a comprehensive build produces many shared libraries:
>>>>
>>>> * libarrow
>>>> * libarrow_dataset
>>>> * libarrow_flight
>>>> * libarrow_python
>>>> * libgandiva
>>>> * libparquet
>>>> * libplasma
>>>>
>>>> There are some others. There are a number of problems with the current approach:
>>>>
>>>> * Each DLL needs its own set of "visibility" macros to control the use
>>>> of __declspec(dllimport/dllexport) on Windows, which is necessary to
>>>> instruct the import or export of symbols between DLLs on Windows. See
>>>> e.g. https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/visibility.h
>>>>
>>>> * Templates instantiated in one DLL may cause a violation of the One
>>>> Definition Rule during linking (we lost at least a day of work time
>>>> collectively to issues around this in ARROW-6244). It is good to be
>>>> able to share common template interfaces in general.
>>>>
>>>> * Statically-linked dependencies in one shared lib may need to be
>>>> statically linked into another library. For example, libgandiva
>>>> statically links parts of LLVM, but we will likely have some other
>>>> code that makes use of LLVM for other purposes (it has been discussed
>>>> in the context of Avro parsing)
>>>>
>>>> Overall, my preferred solution to these issues is to move to a similar
>>>> approach to what the LLVM project does. To help understand, let me
>>>> have you first look at the libraries that come from the llvm-7-dev
>>>> package on Ubuntu
>>>>
>>>> Here we have a collection of static "module" libraries that implement
>>>> different parts of the LLVM platform. Finally, a _single_ shared
>>>> library libLLVM-7.so is produced.
>>>>
>>>> I think we should do the same thing in Apache Arrow. So we only ever
>>>> will produce a single shared library from the build. We can
>>>> additionally make the "name" of this shared library configurable to
>>>> suit different needs. For example, the default name could be simply
>>>> "libarrow.so" or something.
>>>> But if someone wants to produce a barebones Parquet shared library
>>>> they can override the name to create a "libparquet.so" that contains
>>>> only the "libarrow_core.a" and "libarrow_io.a" symbols needed for
>>>> reading Parquet files.
>>>>
>>>> This would have additional benefits:
>>>>
>>>> * Use the same visibility macros for all exported C++ symbols, rather
>>>> than having to define DLL-specific visibility
>>>>
>>>> * Improved modularization of builds and linking for third party users,
>>>> similar to the way that LLVM's modular linking works, see the way that
>>>> Gandiva requests specific components from LLVM to use for static
>>>> linking https://github.com/apache/arrow/blob/master/cpp/cmake_modules/FindLLVM.cmake#L53
>>>>
>>>> * Net simpler linking and deployment. Only one shared library to deal with
>>>>
>>>> There are some drawbacks, however:
>>>>
>>>> * Our C++ Linux packaging approach would need to be changed to be more
>>>> LLVM-like (a single .deb/.yum package containing the C++ platform
>>>> rather than many packages as now)
>>>>
>>>> Interested to hear from other C++ developers.
>>>>
>>>> Thanks
>>>> Wes
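
Illustration (not part of the original thread): the "same visibility
macros for all exported C++ symbols" point in the quoted proposal could
look roughly like the sketch below. Everything named MYLIB_* or mylib::
is hypothetical and chosen only for this example; it is not Arrow's
actual header. The sketch assumes GCC/Clang on non-Windows platforms,
and it assumes the build system defines MYLIB_EXPORTING when compiling
the library itself (it is defined inline here only so the snippet
compiles as a single translation unit).

```cpp
#include <string>

// Normally passed by the build system (e.g. -DMYLIB_EXPORTING) while
// building the shared library; defined here only to keep the sketch
// self-contained. Hypothetical name.
#define MYLIB_EXPORTING 1

// One visibility macro shared by every module that ends up in the
// single shared library (hypothetical, for illustration).
#if defined(_WIN32) || defined(__CYGWIN__)
#  if defined(MYLIB_STATIC)
#    define MYLIB_EXPORT  // static build: no import/export decoration
#  elif defined(MYLIB_EXPORTING)
#    define MYLIB_EXPORT __declspec(dllexport)
#  else
#    define MYLIB_EXPORT __declspec(dllimport)
#  endif
#else
#  define MYLIB_EXPORT __attribute__((visibility("default")))
#endif

namespace mylib {
namespace core {
// Symbol from a hypothetical "core" module.
MYLIB_EXPORT int Add(int a, int b);
}  // namespace core
namespace io {
// Symbol from a hypothetical "io" module, using the same macro.
MYLIB_EXPORT std::string Describe(int value);
}  // namespace io
}  // namespace mylib

int mylib::core::Add(int a, int b) { return a + b; }

std::string mylib::io::Describe(int value) {
  return "value=" + std::to_string(value);
}
```

With several DLLs, each library would need its own copy of this macro
block under a different name, because a symbol exported from one DLL
must be declared as imported when it is used from another. A single
shared library needs only one such block.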