arrow-dev mailing list archives

From Javier Luraschi <jav...@rstudio.com>
Subject Re: [R] Improving documentation and transparency for Arrow build and packaging work for R
Date Mon, 01 Apr 2019 18:51:34 GMT
Added entry for "Updating CRAN packages" here:

https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-UpdatingCRANpackages
<https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide>

I'm sure we will have to update it with more details as
the process changes, but it should be a good start if anyone
has to update the CRAN packages.

However, I also created a page with the same content that
now needs to be deleted (I don't have delete permissions
and probably shouldn't):

https://cwiki.apache.org/confluence/display/ARROW/Updating+CRAN+packages.

If someone could please delete the "Updating CRAN
packages" page, that would be great, thank you!

On Thu, Mar 28, 2019 at 6:54 PM Wes McKinney <wesmckinn@gmail.com> wrote:

> thanks Javier, I just gave you edit permissions on the wiki
>
> On Mon, Mar 25, 2019 at 4:55 PM Javier Luraschi <javier@rstudio.com>
> wrote:
> >
> > I signed up as "Javier Luraschi" with this email, if you could please
> > give me access that would be great. Thanks!
> >
> > I'm assuming the CRAN documentation would go under:
> > https://cwiki.apache.org/confluence/display/ARROW/Distribution+Packages
> > I'll start adding it when I get access.
>
> I think this page is a bit different, it shows "where to find the
> packages". Might be a good idea to create an "R developer guide" or
> similar under
>
>
> https://cwiki.apache.org/confluence/display/ARROW#ApacheArrowHome-RLibraries
>
> >
> > Yes, I mean https://github.com/apache/arrow/pull/3932.
> >
> > Regarding "The challenge I see is that the development procedure is being
> > commingled with packaging issues.". Yes, I agree! Let me send a PR to fix
> > that as well. If a developer properly sets up the RTools development
> > environment, they should not need to rely on rwinlibs.
> >
> > Regarding "How would you suggest testing release", this would be
> > addressed with the previous comment. As in, there needs to be support
> > for building the RTools binaries locally. I'll work on this and follow
> > up with the PR/JIRA issue once it's ready.
> >
> > Regarding "Seems like this should be turned into a Crossbow task", right;
> > however, I'm limited in time here. I'll open a Jira issue to get some
> > help from the community. I see this as a nice-to-have and less of a
> > must-have, but I'll certainly add this to the Confluence docs.
> >
> > Regarding "Is there a way to simulate this environment locally?", yes,
> > this is called "R CMD check --as-cran". I'll add it to the Confluence
> > docs as well.
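
As a rough illustration of the local check mentioned above, the following
Python sketch assembles the two standard commands, `R CMD build` and
`R CMD check --as-cran`, for a package directory. The helper names, the `r`
directory and the tarball name are assumptions made for illustration and are
not taken from the Arrow repository. Note that `--as-cran` approximates CRAN's
incoming checks; it does not reproduce the CRAN Windows build machines.

```python
import subprocess
from pathlib import Path


def cran_check_commands(pkg_dir, tarball):
    """Return the argv lists for building and checking an R package.

    pkg_dir and tarball are supplied by the caller; the values used in the
    usage note are hypothetical.
    """
    return [
        # Build the source tarball from the package directory.
        ["R", "CMD", "build", str(Path(pkg_dir))],
        # Run the checks with CRAN-like settings on the built tarball.
        ["R", "CMD", "check", "--as-cran", tarball],
    ]


def run_cran_check(pkg_dir, tarball, runner=subprocess.run):
    """Run each command in order; check=True stops at the first failure."""
    for argv in cran_check_commands(pkg_dir, tarball):
        runner(argv, check=True)
```

With R on the PATH, a call such as `run_cran_check("r", "arrow_0.13.0.tar.gz")`
would build and then check the package; both arguments here are hypothetical.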
> >
> > Regarding "let's definitely copy this information into a page on the
> > wiki", for sure.
> >
> > Regarding "Given how manual the process is right now it seems like
> > there's a solid chance that something will be broken after the 0.13", we
> > need more automation and maintainers who are used to building RTools
> > binaries, etc. So yeah, the 0.13 release will probably be rough, but we
> > will have to go through this and get better over time; I'm not sure we
> > can automate everything on a first release.
> >
> > Yes, I'll reply to "can you reply on the 'Timeline for 0.13 release'".
> >
> > I think pending docs and PR to decouple builds from release, this would
> > address most of these concerns, correct? Otherwise, let me know.
> >
> > Regarding "can you reply on the Timeline for 0.13 release": replied, and
> > yes, I just marked the remaining JIRA issue as required for 0.13.
> >
>
> Yes, I think so. Given the diversity of the community, I think we
> should strive to create a humane, well-documented developer experience
> that does not rely on deep institutional knowledge (which can be hard
> to come by) to undertake basic workflows. That way it will be easier
> for folks less well-versed in nitty-gritty CRAN stuff to be able to at
> least build the project from source and test things out
>
> Thanks
> Wes
>
> > Best, Javier
> >
> >
> > On Mon, Mar 25, 2019 at 1:33 PM Wes McKinney <wesmckinn@gmail.com>
> > wrote:
> >
> > > hi Javier,
> > >
> > > Thank you for writing back.
> > >
> > > On Mon, Mar 25, 2019 at 12:41 PM Javier Luraschi <javier@rstudio.com>
> > > wrote:
> > > >
> > > > Hi Wes, sorry for the delay, I haven't been monitoring this DL
> > > > proactively.
> > >
> > > Yes, I highly recommend setting up some e-mail filters so anything
> > > with "[R]" in the subject title lands in your inbox. You can also
> > > separate "[jira]" messages with a separate filter; there isn't very
> > > much list traffic if you split off the new issue notifications.
> > >
> > > >
> > > > Please note that I'm not the expert on this topic, so I'll share as
> > > > much information as I can, but others with more expertise should feel
> > > > free to comment as well. Please also note that some of the
> > > > restrictions we have are common practices in R packages that are out
> > > > of our control, at least without significant investment.
> > > >
> > > > I'll document what I know in this email, but please let me know if
> > > > there is a wiki or a better place to move this documentation into.
> > > >
> > >
> > > Yes, let's definitely stash all of the build and packaging information
> > > on our wiki at
> > >
> > > https://cwiki.apache.org/confluence/display/ARROW
> > >
> > > If you let me know your ASF Confluence username I will give you edit
> > > permissions
> > >
> > > > ## Background
> > > >
> > > > CRAN, the Comprehensive R Archive Network, is the primary package
> > > > repository for the R community. You can think of CRAN as similar to
> > > > Homebrew or PyPI. CRAN encourages cross-platform package submissions
> > > > and, to ease compilation and testing, provides support for
> > > > precompiling binaries for OS X and Windows. We will focus on Windows
> > > > specifics from now on.
> > > >
> > > > CRAN and R rely on a set of tools based on Mingw to compile packages
> > > > on Windows; this toolset is known as RTools. Originally, Prof. Brian
> > > > Ripley and Duncan Murdoch put this toolset together; however, Jeroen
> > > > Ooms is its current maintainer. RTools is based on Mingw but, from
> > > > past experience, is not completely interchangeable with the standard
> > > > Mingw distribution. I'm afraid I don't have the details, but this is
> > > > mostly related to the specific packages, versions and compilers
> > > > included in RTools. It's possible to match a Mingw environment with
> > > > RTools but this is, in general, not a straightforward task.
> > >
> > > It would be good to have some links (on a wiki page) to any additional
> > > information about this.
> > >
> > > >
> > > > A few months ago, I naively tried to accomplish this work myself. As
> > > > in, get RTools to compile Apache Arrow, how hard can it be? It's hard
> > > > to explain all the caveats in a single mail, but if you are
> > > > interested, you can read my own exploration of possible solutions to
> > > > this problem in this gist writeup [1].
> > > >
> > > > The outcome of this investigation, at least for me and my limited
> > > > knowledge, was to not try to do this on my own by reinventing the
> > > > wheel; otherwise, this would have taken months of my own time. The
> > > > solution was then to find out how other R packages have solved this
> > > > problem in the past.
> > > >
> > > > Given the specifics of the RTools toolchain, for complex projects
> > > > with a significant number of components and dependencies, the best
> > > > (and maybe only!) way to get R packages into CRAN on Windows is to
> > > > precompile the binaries outside of the CRAN build process. The repo
> > > > of precompiled packages is called rwinlibs [2] and has 75 packages
> > > > and growing. When compiling in CRAN, rather than building the
> > > > library, it simply gets downloaded from the rwinlibs repo.
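
The download-instead-of-build idea described above can be sketched as follows.
This is a hedged Python illustration, not the script used by any actual
package; the URL pattern, the function names and the local file layout are all
assumptions made for the example.

```python
import urllib.request
from pathlib import Path


def prebuilt_url(package, version):
    """Build the archive URL for a prebuilt library (assumed URL pattern)."""
    return f"https://github.com/rwinlib/{package}/archive/v{version}.zip"


def ensure_prebuilt(package, version, dest_dir, fetch=urllib.request.urlretrieve):
    """Download the prebuilt archive unless it is already present.

    Returns the local archive path. Nothing is compiled here: the package
    build would only link against whatever the archive contains.
    """
    dest = Path(dest_dir) / f"{package}-{version}.zip"
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        fetch(prebuilt_url(package, version), str(dest))
    return dest
```

The `fetch` parameter exists only so the download can be replaced in tests;
a second call with the same arguments finds the archive on disk and skips
the download.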
> > > >
> > > > How, then, are the rwinlibs libraries built? All the packages are
> > > > built through an automated build system available under the
> > > > rtools-packages [3] repo, where an appveyor script detects changes
> > > > and builds the appropriate libraries. This repo runs with the latest
> > > > RTools toolchain. To support previous versions of R/RTools, the
> > > > rtools-backports [4] repo provides backward compatibility in an
> > > > automated way.
> > > >
> > > > So now we can get back to discussing how we want to make this work
> > > > in the arrow project. One way, which this PR encourages, is to say
> > > > "Let's not worry about what the R/CRAN publishing process is, they
> > > > have their own processes and tools to build binaries for Windows.
> > > > This is similar to brew formulae, the formula that builds arrow for
> > > > OS X using homebrew is in a different repo [5]".
> > >
> > > When you say "this PR" you mean
> > >
> > > https://github.com/apache/arrow/pull/4011
> > >
> > > or
> > >
> > > https://github.com/apache/arrow/pull/3932
> > >
> > > The challenge I see is that the development procedure is being
> > > commingled with packaging issues. I would like to see a write-up to
> > > provide instructions for an Arrow developer to create a build of Arrow
> > > on the master branch using mingw/Rtools for the purposes of
> > > development. If we don't have this written down, this is putting us in
> > > a potentially very bad situation where developers cannot debug issues.
> > > I think it's fine if all of the other C++ dependencies are snapshotted
> > > in rwinlibs
> > >
> > > >
> > > > While splitting the release processes into multiple repos has some
> > > > advantages, it certainly has some caveats. For instance, when
> > > > publishing a new release of arrow in Homebrew, one needs to manually
> > > > go and update the Homebrew formulae.
> > > >
> > > > That said, I would hope that the Homebrew release process is
> > > > documented in the Arrow project in the same way that we should
> > > > document the R release process in the Arrow project. Hopefully this
> > > > mail helps build a first iteration on this.
> > > >
> > > > ## Releasing
> > > >
> > > > These instructions are a bit more pragmatic as to what needs to be
> > > > done to release the R package in CRAN:
> > > >
> > > > (1) Send a PR to rtools-packages [3] that increments the version; the
> > > >      repo already downloads the binaries from the Arrow GitHub
> > > >      project. Ensure that the appveyor build succeeds. If the build
> > > >      or tests fail, send the appropriate PR to the official Arrow
> > > >      repo.
> > >
> > > How would you suggest testing release candidates or otherwise doing
> > > some form of continuous integration / integration testing to ensure we
> > > haven't broken this step?
> > >
> > > > (2) Send a PR to rtools-backports [4]; similar to (1) but a different
> > > >      repo.
> > >
> > > Seems like this should be turned into a Crossbow task in this project
> > > (see https://github.com/apache/arrow/tree/master/dev/tasks) so that it
> > > can be maintained by the Arrow community. This is how we are handling
> > > package automation for Linux packages, Python wheels, Gandiva JARs,
> > > etc. This also may enable the integration testing I described above to
> > > take place (though having an Appveyor build would be superior)
> > >
> > > > (3) Copy the output produced by (1) and (2) as a PR to the
> > > >      rwinlib/arrow [6] repo.
> > > > (4) Before merging (3), validate that CRAN can build and test using
> > > >      the new library using the winbuilder service [7]. This service
> > > >      is maintained by CRAN and allows you to pre-check that a package
> > > >      builds properly on a CRAN-like build machine for Windows.
> > >
> > > Is there a way to simulate this environment locally?
> > >
> > > > (5) Submit the package to CRAN, making sure their practices and
> > > >      processes are followed [8].
> > > >
> > > > While I did my best to document the steps, there are certainly more
> > > > details that can be added over time. Regardless, feel free to reach
> > > > out to me with questions and support requests, and I'll try my best
> > > > to address them.
> > > >
> > >
> > > OK, let's definitely copy this information into a page on the wiki so
> > > that these steps can be maintained as time goes on. The goal would be
> > > to have sufficient detail to increase the bus factor involved with
> > > post-release tasks.
> > >
> > > Given how manual the process is right now it seems like there's a
> > > solid chance that something will be broken after the 0.13 release is
> > > out. Speaking of which, can you reply on the "Timeline for 0.13
> > > release" thread about any PRs that need to get merged? Please set the
> > > "Fix Version" so they show up in the list of 0.13 issues
> > >
> > > Thanks,
> > > Wes
> > >
> > > > Best, Javier
> > > >
> > > > [1]: https://gist.github.com/javierluraschi/2ade2204364a7c20e9c3d95504d12ce5
> > > > [2]: https://github.com/rwinlib/
> > > > [3]: https://github.com/r-windows/rtools-packages
> > > > [4]: https://github.com/r-windows/rtools-backports
> > > > [5]: https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-arrow.rb
> > > > [6]: https://github.com/rwinlib/arrow
> > > > [7]: https://win-builder.r-project.org/
> > > > [8]: https://cran.r-project.org/submit.html
> > > >
> > > >
> > > >
> > > >
> > > > On Sat, Mar 16, 2019 at 1:10 PM Wes McKinney <wesmckinn@gmail.com>
> > > > wrote:
> > > >
> > > > > hi folks,
> > > > >
> > > > > I have noticed there is work under way to prepare Apache Arrow for
> > > > > submission to the CRAN package manager for R users. I'm slightly
> > > > > concerned about the lack of information and documentation in the
> > > > > project regarding what is involved with this effort. This patch in
> > > > > particular raised some eyebrows
> > > > >
> > > > > https://github.com/apache/arrow/pull/3932
> > > > >
> > > > > This introduces a dependency into the project on pre-built static
> > > > > libraries based on processes that aren't documented in the
> > > > > project. I see this repository containing these static libraries
> > > > > for the R Windows toolchain, but if I needed to produce them myself
> > > > > I would not know what to do
> > > > >
> > > > > https://github.com/rwinlib/arrow
> > > > >
> > > > > Additionally, in general, if I wanted to build and test Arrow and
> > > > > R from source on Windows, I also would not know what to do.
> > > > >
> > > > > In the Python world, this would be akin to depending on e.g.
> > > > > conda-forge packages for Windows development, but not having any
> > > > > information in the repository about how to build Arrow C++ and
> > > > > Python from source on Windows.
> > > > >
> > > > > So I would like to see some transparency / documentation around the
> > > > > scripts and processes involved with this so that we don't end up
> > > > > with a "bus factor" problem where Arrow PMC members are unable to
> > > > > undertake basic maintenance and release management activities.
> > > > > Currently the work that is going on seems opaque to me and as such
> > > > > feels contrary to the Apache Way.
> > > > >
> > > > > I understand that there is some urgency to make the Arrow libraries
> > > > > available to R users, but I want to make sure we are working in a
> > > > > sustainable manner to grow a community of developers who are able
> > > > > to do work on each part of the project.
> > > > >
> > > > > Thanks,
> > > > > Wes
> > > > >
> > >
>
