commons-dev mailing list archives

From Gilles
Subject Re: [Math] What's in a release
Date Mon, 29 Dec 2014 10:36:34 GMT
On Sun, 28 Dec 2014 20:21:32 -0700, Phil Steitz wrote:
> On 12/28/14 11:46 AM, Gilles wrote:
>> Hi.
>> On Sun, 28 Dec 2014 09:43:34 +0100, Luc Maisonobe wrote:
>>> Le 28/12/2014 00:22, sebb a écrit :
>>>> On 27 December 2014 at 22:19, Gilles wrote:
>>>>> On Sat, 27 Dec 2014 17:48:05 +0000, sebb wrote:
>>>>>> On 24 December 2014 at 15:11, Gilles wrote:
>>>>>>> On Wed, 24 Dec 2014 15:52:12 +0100, Luc Maisonobe wrote:
>>>>>>>> Le 24/12/2014 15:04, Gilles a écrit :
>>>>>>>>> On Wed, 24 Dec 2014 09:31:46 +0100, Luc Maisonobe wrote:
>>>>>>>>>> Le 24/12/2014 03:36, Gilles a écrit :
>>>>>>>>>>> On Tue, 23 Dec 2014 14:02:40 +0100, luc wrote:
>>>>>>>>>>>> This is a [VOTE] for releasing Apache Commons Math 3.4
>>>>>>>>>>>> from release candidate 3.
>>>>>>>>>>>> Tag name:
>>>>>>>>>>>>   MATH_3_4_RC3 (signature can be checked from git using
>>>>>>>>>>>>   'git tag -v')
>>>>>>>>>>>> Tag URL:
>>>>>>>>>>>> <;a=commit;h=befd8ebd96b8ef5a06b59dccb22bd55064e31c34>
>>>>>>>>>>> Is there a way to check that the source code referred to
>>>>>>>>>>> above was the one used to create the JAR of the ".class"
>>>>>>>>>>> files?
>>>>>>>>>>> [Out of curiosity, not suspicion, of course...]
>>>>>>>>>> Yes, you can look at the end of the META-INF/MANIFEST.MF
>>>>>>>>>> file embedded in the jar. The second-to-last entry is called
>>>>>>>>>> Implementation-Build. It is automatically created by
>>>>>>>>>> maven-jgit-buildnumber-plugin and contains the SHA1
>>>>>>>>>> identifier of the last commit used for the build. Here, it is
>>>>>>>>>> befd8ebd96b8ef5a06b59dccb22bd55064e31c34, so we can check
>>>>>>>>>> it really corresponds to the expected status of the git
>>>>>>>>>> repository.
>>>>>>>>> Can this be considered "secure", i.e. can't this entry in
>>>>>>>>> the MANIFEST file be modified to be the checksum of the
>>>>>>>>> repository, with the .class files being substituted with
>>>>>>>>> those coming from another compilation?
>>>>>>>> Modifying anything in the jar (either this entry within the
>>>>>>>> manifest or
>>>>>>>> any class) will modify the jar signature. So as long as
>>>>>>>> people do check
>>>>>>>> the global MD5, SHA1 or gpg signature we provide with our
>>>>>>>> build, they
>>>>>>>> are safe to assume the artifacts are Apache artifacts.
>>>>>>>> This is not different from how releases are done with
>>>>>>>> subversion as the
>>>>>>>> source code control system, or even in C or C++ as the
>>>>>>>> language. At one
>>>>>>>> time, the release manager does perform a compilation and
>>>>>>>> fellow
>>>>>>>> reviewers check the result. There is no foolproof process
>>>>>>>> here, as
>>>>>>>> always when security is involved. Even using an automated
>>>>>>>> build and
>>>>>>>> automatic signing on an Apache server would involve trust
>>>>>>>> (i.e. one
>>>>>>>> should assume that the server has not been tampered with,
>>>>>>>> that the build
>>>>>>>> process really does what it is expected to do, that the
>>>>>>>> artifacts put to
>>>>>>>> review are really the ones created by the automatic process
>>>>>>>> ...).
>>>>>>>> Another point is that what we officially release is the
>>>>>>>> source, which
>>>>>>>> can be reviewed by external users. The binary parts are
>>>>>>>> merely a
>>>>>>>> convenience.
>>>>>>> That's an interesting point to come back to since it looks
>>>>>>> like the
>>>>>>> most time-consuming part of a release is not related to the
>>>>>>> sources!
>>>>>>> Isn't it conceivable that a release could just be a commit
>>>>>>> identifier
>>>>>>> and a checksum of the repository?
>>>>>>> If the binaries are just a convenience, why put so much
>>>>>>> effort in it?
>>>>>>> As a convenience, the artefacts could be produced after the
>>>>>>> release,
>>>>>>> accompanied with all the "caveat" notes which you mentioned.
>>>>>>> That would certainly increase the release rate.
>>>>>> Binary releases still need to be reviewed to ensure that the
>>>>>> correct N
>>>>>> & L files are present, and that the archives don't contain
>>>>>> material
>>>>>> with disallowed licenses.
>>>>>> It's not unknown for automated build processes to include
>>>>>> files that
>>>>>> should not be present.
>>>>> I fail to see the difference of principle between the "release"
>>>>> context
>>>>> and, say, the daily snapshot context.
>>>> Snapshots are not (and should not be) promoted to the general
>>>> public as releases of the ASF.
>>>>> What I mean is that there seems to be a contradiction between
>>>>> saying that
>>>>> a "release" is only about _source_ and the obligation to check
>>>>> _binaries_.
>>>> There is no contradiction here.
>>>> The ASF releases source; it is required in a release.
>>>> Binaries are optional.
>>>> That does not mean that the ASF mirror system can be used to
>>>> distribute arbitrary binaries.
>>>>> It can occur that disallowed material is, at some point in
>>>>> time, part of
>>>>> the repository and/or the snapshot binaries.
>>>>> However, what is forbidden is... forbidden, at all times.
>>>> As with most things, this is not a strict dichotomy.
>>>>> If it is indeed a problem to distribute forbidden material,
>>>>> shouldn't
>>>>> this be corrected in the repository? [That's indeed what you
>>>>> did with
>>>>> the blocking of the release.]
>>>> If the repo is discovered to contain disallowed material, it
>>>> needs to
>>>> be removed.
>>>>> Then again, once the repository is "clean", it can be tagged
>>>>> and that
>>>>> tagged _source_ is the release.
>>>> Not quite.
>>>> A release is a source archive that is voted on and distributed
>>>> via the
>>>> ASF mirror system.
>>>> The contents must agree with the source tag, but the source tag
>>>> is not
>>>> the release.
>>>>> Non-compliant binaries would thus only be the result of a
>>>>> "mistake"
>>>>> (if the build system is flawed, it's another problem, unrelated
>>>>> to the released contents, which is _source_) to be corrected
>>>>> per se.
>>>> Not so. There are other failure modes.
>>>> An automated build obviously reduces the chances of mistakes,
>>>> but it
>>>> can still create an archive containing files that should not be
>>>> there.
>>>> [Or indeed, omit files that should be present]
>>>> For example, the workspace contains spurious files which are
>>>> implicitly included by the assembly instructions.
>>>> Or the build process creates spurious files that are incorrectly
>>>> added
>>>> to the archive.
>>>> Or the build incorrectly includes jars that are supposed to be
>>>> provided by the end user
>>>> etc.
>>>> I have seen all the above in RC votes.
>>>> There are probably other failure modes.
>>>>> My proposition is that it's an independent step: once the build
>>>>> system is adjusted to the expectations, "correct" binaries can be
>>>>> generated from the same tagged release.
>>>> It does not matter when the binary is built.
>>>> If it is distributed by the PMC as a formal release, it must not
>>>> contain any surprises, e.g. it must be licensed under the AL.
>>>> It is therefore vital that the contents are as expected from the
>>>> build.
>>>> Note also that a formal release becomes an act of the PMC by the
>>>> voting process.
>>>> The ASF can then assume responsibility for any legal issues that
>>>> may arise.
>>>> Otherwise it is entirely the personal responsibility of the
>>>> person who
>>>> releases it.
>>> I think the last two points are really important: binaries must be
>>> checked and the foundation provides a legal protection for the
>>> project
>>> if something weird occurs.
>>> I also think another point is important: many if not most users do
>>> really expect binaries and not source. From our internal Apache
>>> point
>>> of view, these are a by-product. For many others it is the
>>> important
>>> thing. It is mostly true in maven land as dependencies are
>>> automatically retrieved in binary form, not source form. So the
>>> maven
>>> central repository as a distribution system is important.
>>> Even if for some security reason it sounds at first thought
>>> logical to
>>> rely on source only and compile oneself, in an industrial context
>>> project teams do not have enough time to do it for all their
>>> dependencies, so they use binaries provided by trusted third
>>> parties. A
>>> long time ago, I compiled a lot of free software tools for the
>>> department I worked for at that time. I do not do this anymore, and
>>> trust the binaries provided by the packaging team for a
>>> distribution (typically Debian). They do rely on source and compile
>>> themselves. Hey,
>>> I even think Emmanuel here belongs to the Debian java team ;-) I
>>> guess
>>> such teams that do rely on source are rather the exception than the
>>> rule. The other examples I can think of are packaging teams,
>>> development teams that need bleeding edge (and will also directly
>>> depend on the repository, not even the release), projects that
>>> need to
>>> introduce their own patches and people who have critical needs (for
>>> example when safety of people is concerned or when they need full
>>> control for legal or contractual reasons). Many other people
>>> download
>>> binaries directly and would simply not consider using a project
>>> if it
>>> is not readily available: they don't have time for this and don't
>>> want
>>> to learn how to build tens or hundreds of different projects they
>>> simply
>>> use.
>> I do not disagree with anything said on this thread. [In
>> particular, I
>> did not at all imply that any one committer could take
>> responsibility for releasing unchecked items.]
>> I'm simply suggesting that what is called the release
>> process/management
>> could be made simpler (and _consequently_ could lead to more
>> regularly
>> releasing the CM code), by separating the concerns.
>> The concerns are
>>  1. "code" (the contents), and
>>  2. "artefacts" (the result of the build system acting on the
>> "code").
>> Checking of one of these is largely independent from checking the
>> other.
> Unfortunately, not really.  One principle that we have (maybe not
> crystal clear in the release doco) is that when we do distribute
> binaries, they should really be "convenience binaries" which means
> that everything needed to create them is in the source or its
> documented dependencies.  What that means is that what we tag as the
> source release needs to be able to generate any binaries that we
> subsequently release.  The only way to really test that is to
> generate the binaries and inspect them as part of verifying the
> release.

Only way?  That's certainly not obvious to me: Since a tag/branch
uniquely identifies a set of files, that is, the "source release [that
is] able to generate any binaries that we subsequently release", if a
RM can do it at (source) release time, he (or someone else!) can do it
later, too (by running the build from a clone of the repository in its
tagged state).
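
[To tie a JAR built later back to the tag, the "Implementation-Build"
manifest entry that Luc mentioned earlier in the thread could be read
programmatically. What follows is only a minimal sketch: it assumes
the entry contains the commit identifier as plain text, and the class
and method names are invented for illustration, they are not part of
CM or of any plugin.]

```java
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper; the class and method names are illustrative only. */
public class BuildCommitCheck {

    /** Returns the Implementation-Build main attribute, or null if absent. */
    public static String implementationBuild(Manifest manifest) {
        return manifest.getMainAttributes().getValue("Implementation-Build");
    }

    /** True if the recorded build entry mentions the expected commit id. */
    public static boolean builtFrom(Manifest manifest, String expectedCommit) {
        // Assumption: the entry holds the commit id, possibly with extra text.
        String recorded = implementationBuild(manifest);
        return recorded != null && recorded.contains(expectedCommit);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: BuildCommitCheck <jar> <commit-id>");
            System.exit(2);
        }
        try (JarFile jar = new JarFile(args[0])) {
            Manifest manifest = jar.getManifest(); // null if the jar has no manifest
            boolean ok = manifest != null && builtFrom(manifest, args[1]);
            System.out.println(ok ? "match" : "MISMATCH");
        }
    }
}
```

[As noted in the quoted discussion, such a check only shows that the
JAR claims to come from that commit; it proves nothing unless the
checksum or gpg signature of the JAR itself has been verified.]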

> As others have pointed out, anything we release has to be verified
> and voted on.  As RM and reviewer, I think it is actually easier to
> roll and verify source and binaries together.

It's precisely my main point.
I won't dispute that you can prefer doing both (and nobody would forbid
a RM to do just that) but the point is about the possibility to release
source-only code (as the first step of a two-step procedure which I
described earlier).
[IMHO, the two-step one seems easier (both for the RM and the
reviewers), but mileage does vary.]

In short, is it forbidden (by the official/legal rules of the ASF) to
proceed as I propose?
Is it impossible technically?


> Phil
>> [The more so that, as you said, no fool-proof link between the two
>> can
>> be ensured: From a security POV, checking the former requires a code
>> review, while using the latter requires trust in the build system.]
>> Thus we could release the "code", after checking and voting on the
>> concerned elements (i.e. the repository state corresponding to a
>> specific tag + the web site).
>> Then we could release the "binaries", as a convenience, after
>> checking
>> and voting on the concerned elements (i.e. the files about to be
>> distributed).
>> I think that it's an added flexibility that would, for example,
>> allow the tagging of the repository without necessarily releasing
>> binaries
>> (i.e.
>> not involving that part of the work); and to release binaries
>> (say, at
>> regular intervals) based on the latest tagged code (i.e. not
>> involving
>> the work about solving/evaluating/postponing issues).
>> [I completely admit that, at first, it might look a little more
>> confusing for the plain user, but (IIUC) it would be a better
>> representation of the reality covered by stating that the ASF
>> releases source code.]
>> Best regards,
>> Gilles
