arrow-dev mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: R arrow dependency on Rcpp?
Date Fri, 11 Aug 2017 20:03:25 GMT
hi Felix,

Thanks for this context.

From

> If the goal is to make such component a package released to CRAN however,
> then my take is this becomes a release by itself and what is required for
> the package to function becomes the area for discussion, as per my
> understanding.

My take is that releasing an Arrow source release or binary release to
CRAN, because of the extent of the R world's copyleft leanings, is
probably not the business of the Arrow PMC, as it is too fraught with
licensing concerns. It would be better to leave CRAN deployment to
downstream members of the R community. So the R community would be
free to release ArrowR to CRAN, but we the PMC would not vote on the
package artifacts.

This seems consistent with the way that software vendors are already
releasing Apache Spark and components of the Hadoop ecosystem in
downstream software distributions. I see this as preferable (compared
with avoiding all copyleft R software) as it enables a community of R
developers to thrive within the Apache Arrow project while avoiding
the redistribution questions.

The other side of this coin is that, IMHO, to eschew R's GPL ecosystem
(which, AFAICT, makes up the vast majority of the R ecosystem; it would
be nice to see some more detailed data) would amount to developing
R-Arrow integration with one hand tied behind our back. I would argue
that SparkR may be worse off because it has (by appearances, at
least) isolated itself from the rest of the R ecosystem. Contrast
SparkR's import list [1] with sparklyr's [2], for example.

This is not an ideological viewpoint, merely a practical one. sparklyr
uses the Apache 2.0 license but depends on many GPLv2/3 packages at
runtime, and there is no conflict with this. The conflict is with the
ASF's position on releasing official software artifacts on behalf of
the project PMC, which we should absolutely respect.

I have no particular horse in this race as I do not do much R work
myself, except that I want to do what is best to grow the Arrow
community and enable the R world to benefit from our collective
efforts to the maximum extent possible.

Thanks,
Wes

[1]: https://github.com/apache/spark/blob/master/R/pkg/DESCRIPTION
[2]: https://github.com/rstudio/sparklyr/blob/master/DESCRIPTION

On Fri, Aug 11, 2017 at 11:57 AM, Felix Cheung
<felixcheung_m@hotmail.com> wrote:
> Thanks Wes. I think the discussion is in line with my understanding of the
> release of optional components as well.
>
> Spark, which is often used as an example in various discussions, actually
> does not have GPL dependencies that are required to function at runtime. We
> have build and test dependencies (that's very hard to avoid) but they are
> not needed to install and run the package (besides R itself). We have some
> native C code in the source, but it is not required and not built (and
> not tested with, and to be honest, after more than 2 years, not likely
> to work at all).
>
> So, going back to the context of Arrow and optional components: if the goal
> is to have R sources that are released with Arrow as source, so that a
> user will need to make a choice to manually extract the R pieces and
> build/install them manually, my interpretation is that this will be OK.
>
> If the goal is to make such component a package released to CRAN however,
> then my take is this becomes a release by itself and what is required for
> the package to function becomes the area for discussion, as per my
> understanding.
>
>
> ________________________________
> From: Wes McKinney <wesmckinn@gmail.com>
> Sent: Friday, August 11, 2017 7:29:29 AM
> To: dev@arrow.apache.org
> Cc: Felix Cheung
> Subject: Re: R arrow dependency on Rcpp?
>
> It seems that using Rcpp is fine because an R library for Arrow is an
> optional component of the project, but will await more opinions on
> LEGAL-324.
>
> + Felix Cheung -- I wonder if you could comment further on your
> concerns about licensing of R build dependencies, which were mentioned
> elsewhere.
>
> Thanks
>
> On Thu, Aug 10, 2017 at 2:12 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
>> I started a discussion explaining the issue here:
>>
>> https://issues.apache.org/jira/browse/LEGAL-324
>>
>> On Thu, Aug 3, 2017 at 5:50 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
>>> Thanks for weighing in on this, Hadley.
>>>
>>> To your point
>>>
>>>> You can distribute the package code according to its
>>>> license, but whenever you bundle it with R (i.e. to actually use it)
>>>> the GPL will apply to the whole conglomerate.
>>>
>>> If someone wanted to create an all-GPLv2 software distribution
>>> containing R and a bunch of libraries, then including the R Arrow
>>> library would be problematic as Apache 2.0 is not compatible
>>> (https://www.apache.org/licenses/GPL-compatibility.html). I don't
>>> think this is really a problem since R users generally just install
>>> things from CRAN.
>>>
>>> My understanding is that ASF legal has taken issue when an Apache
>>> project _cannot be used at all_ without a hard GPL dependency (outside
>>> certain exceptions, e.g. generated build files by GPL tools). This
>>> makes it impossible to create a self-contained software distribution
>>> of the project whose code and all dependencies are Apache 2.0
>>> compatible. There was the recent BSD+Patents discussion on LEGAL where
>>> projects were disallowed from using projects under that license as a
>>> hard dependency.
>>>
>>> I will open a LEGAL issue on the JIRA to discuss, but since the R
>>> portion of Arrow is an _optional_ part of the project, I am hopeful
>>> this will be deemed OK.
>>>
>>> - Wes
>>>
>>> On Thu, Aug 3, 2017 at 5:39 PM, Hadley Wickham <h.wickham@gmail.com>
>>> wrote:
>>>> On Thu, Aug 3, 2017 at 8:15 AM, Wes McKinney <wesmckinn@gmail.com>
>>>> wrote:
>>>>> I can open a ticket to get a definitive answer to these questions.
>>>>>
>>>>> From http://www.apache.org/legal/resolved.html#platform and the
>>>>> subsequent questions there, I view the R language and build tools like
>>>>> Rcpp as part of the "R platform", which is, for the most part, all
>>>>> GPL. SparkR depends on R, but only has testthat (MIT) as a dependency
>>>>> beyond the R runtime. I think it is challenging to build high quality
>>>>> software for the R platform relying only on the main R runtime and the
>>>>> limited third party components which happen to be released under
>>>>> non-CategoryX licenses.
>>>>
>>>> Some legal advice is probably needed, but do also see this statement
>>>> from the R Foundation about package licenses:
>>>> https://stat.ethz.ch/pipermail/r-devel/2009-May/053248.html
>>>>
>>>> In general, the R community has taken the opinion that it is ok to
>>>> license code that links to R with non-GPL (but GPL-compatible)
>>>> licenses. You can distribute the package code according to its
>>>> license, but whenever you bundle it with R (i.e. to actually use it)
>>>> the GPL will apply to the whole conglomerate.
>>>>
>>>> So including an R arrow package would be fine according to the general
>>>> standards of the R community. The Apache legal counsel may of course
>>>> disagree.
>>>>
>>>> Hadley
>>>>
>>>> --
>>>> http://hadley.nz
