commons-dev mailing list archives

From Gilles <>
Subject Re: [Math] What's in a release
Date Tue, 30 Dec 2014 01:15:40 GMT
On Mon, 29 Dec 2014 16:21:05 +0100, Thomas Neidhart wrote:
> On 12/29/2014 04:21 AM, Phil Steitz wrote:
>> On 12/28/14 11:46 AM, Gilles wrote:
>>> Hi.
>>> On Sun, 28 Dec 2014 09:43:34 +0100, Luc Maisonobe wrote:
>>>> Le 28/12/2014 00:22, sebb a écrit :
>>>>> On 27 December 2014 at 22:19, Gilles wrote:
>>>>>> On Sat, 27 Dec 2014 17:48:05 +0000, sebb wrote:
>>>>>>> On 24 December 2014 at 15:11, Gilles wrote:
>>>>>>>> On Wed, 24 Dec 2014 15:52:12 +0100, Luc Maisonobe wrote:
>>>>>>>>> Le 24/12/2014 15:04, Gilles a écrit :
>>>>>>>>>> On Wed, 24 Dec 2014 09:31:46 +0100, Luc Maisonobe wrote:
>>>>>>>>>>> Le 24/12/2014 03:36, Gilles a écrit :
>>>>>>>>>>>> On Tue, 23 Dec 2014 14:02:40 +0100, luc wrote:
>>>>>>>>>>>>> This is a [VOTE] for releasing Apache Commons Math 3.4
>>>>>>>>>>>>> from release candidate 3.
>>>>>>>>>>>>> Tag name:
>>>>>>>>>>>>>   MATH_3_4_RC3 (signature can be checked from git using
>>>>>>>>>>>>>   'git tag -v')
>>>>>>>>>>>>> Tag URL:
>>>>>>>>>>>>> <;a=commit;h=befd8ebd96b8ef5a06b59dccb22bd55064e31c34>
>>>>>>>>>>>> Is there a way to check that the source code referred to
>>>>>>>>>>>> above was the one used to create the JAR of the ".class"
>>>>>>>>>>>> files?
>>>>>>>>>>>> [Out of curiosity, not suspicion, of course...]
>>>>>>>>>>> Yes, you can look at the end of the META-INF/MANIFEST.MF
>>>>>>>>>>> file embedded in the jar. The second-to-last entry is called
>>>>>>>>>>> Implementation-Build. It is automatically created by
>>>>>>>>>>> maven-jgit-buildnumber-plugin and contains the SHA1 identifier
>>>>>>>>>>> of the last commit used for the build. Here, it is
>>>>>>>>>>> befd8ebd96b8ef5a06b59dccb22bd55064e31c34, so we can check
>>>>>>>>>>> it really corresponds to the expected status of the git
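A minimal Java sketch of the manifest check described above. It assumes that Implementation-Build is stored as a main-section attribute and that its value is the bare commit SHA1, as the message states. The class name BuildIdCheck and the command-line arguments are hypothetical.

```java
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

final class BuildIdCheck {
    /** Returns the Implementation-Build main attribute, or null if absent. */
    static String buildId(Manifest manifest) {
        return manifest == null
            ? null
            : manifest.getMainAttributes().getValue("Implementation-Build");
    }

    /** True when the build id recorded in the jar equals the expected commit id. */
    static boolean matches(String jarPath, String expectedCommit) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            // equalsIgnoreCase(null) is false, so a missing entry counts as a mismatch.
            return expectedCommit.equalsIgnoreCase(buildId(jar.getManifest()));
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the jar under review (supplied by the reviewer)
        // args[1]: commit id quoted in the vote e-mail
        System.out.println(matches(args[0], args[1]) ? "match" : "MISMATCH");
    }
}
```

A match only shows what the manifest claims. It does not prove that the class files were compiled from that commit, which is the point raised in the follow-up question.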
>>>>>>>>>> Can this be considered "secure", i.e. can't this entry in
>>>>>>>>>> the MANIFEST file be modified to be the checksum of the
>>>>>>>>>> repository with the .class files being substituted with those
>>>>>>>>>> coming from another compilation?
>>>>>>>>> Modifying anything in the jar (either this entry within the
>>>>>>>>> manifest or any class) will modify the jar signature. So as long
>>>>>>>>> as people do check the global MD5, SHA1 or gpg signature we
>>>>>>>>> provide with the build, they are safe to assume the artifacts are
>>>>>>>>> Apache artifacts.
>>>>>>>>> This is not different from how releases are done with subversion
>>>>>>>>> as the source code control system, or even in C or C++ as the
>>>>>>>>> language. At some point, the release manager does perform a
>>>>>>>>> compilation and the fellow reviewers check the result. There is
>>>>>>>>> no foolproof process here, as always when security is involved.
>>>>>>>>> Even using an automated build and automatic signing on an Apache
>>>>>>>>> server would involve trust (i.e. one should assume that the
>>>>>>>>> server has not been tampered with, that the build process really
>>>>>>>>> does what it is expected to do, that the artifacts put to review
>>>>>>>>> are really the ones created by the automatic process ...).
>>>>>>>>> Another point is that what we officially release is the source,
>>>>>>>>> which can be reviewed by external users. The binary parts are
>>>>>>>>> merely a convenience.
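The checksum comparison mentioned above could be sketched in Java as follows. This is only an illustration: it covers the MD5/SHA1 digests, not the gpg signature, and the class name ChecksumCheck and the command-line arguments are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChecksumCheck {
    /** Lower-case hex digest of the stream under the given algorithm ("SHA-1", "MD5"). */
    static String hexDigest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: downloaded artifact, args[1]: published SHA-1 value
        // (both supplied by the reviewer)
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = hexDigest(in, "SHA-1");
            System.out.println(actual.equalsIgnoreCase(args[1].trim()) ? "OK" : "MISMATCH");
        }
    }
}
```

A matching digest shows that the file was not altered after the digest was published. Whether the digest itself can be trusted is a separate question, which is what the gpg signature addresses.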
>>>>>>>> That's an interesting point to come back to since it looks
>>>>>>>> like the
>>>>>>>> most time-consuming part of a release is not related to the
>>>>>>>> sources!
>>>>>>>> Isn't it conceivable that a release could just be a commit
>>>>>>>> identifier
>>>>>>>> and a checksum of the repository?
>>>>>>>> If the binaries are just a convenience, why put so much
>>>>>>>> effort in it?
>>>>>>>> As a convenience, the artefacts could be produced after the
>>>>>>>> release,
>>>>>>>> accompanied with all the "caveat" notes which you mentioned.
>>>>>>>> That would certainly increase the release rate.
>>>>>>> Binary releases still need to be reviewed to ensure that the
>>>>>>> correct N & L files are present, and that the archives don't
>>>>>>> contain material with disallowed licenses.
>>>>>>> It's not unknown for automated build processes to include files
>>>>>>> that should not be present.
>>>>>> I fail to see the difference of principle between the "release"
>>>>>> context
>>>>>> and, say, the daily snapshot context.
>>>>> Snapshots are not (and should not be) promoted to the general
>>>>> public as releases of the ASF.
>>>>>> What I mean is that there seems to be a contradiction between
>>>>>> saying that
>>>>>> a "release" is only about _source_ and the obligation to check
>>>>>> _binaries_.
>>>>> There is no contradiction here.
>>>>> The ASF releases source; source is required in a release.
>>>>> Binaries are optional.
>>>>> That does not mean that the ASF mirror system can be used to
>>>>> distribute arbitrary binaries.
>>>>>> It can occur that disallowed material is, at some point in
>>>>>> time, part of
>>>>>> the repository and/or the snapshot binaries.
>>>>>> However, what is forbidden is... forbidden, at all times.
>>>>> As with most things, this is not a strict dichotomy.
>>>>>> If it is indeed a problem to distribute forbidden material,
>>>>>> shouldn't
>>>>>> this be corrected in the repository? [That's indeed what you
>>>>>> did with
>>>>>> the blocking of the release.]
>>>>> If the repo is discovered to contain disallowed material, it
>>>>> needs to
>>>>> be removed.
>>>>>> Then again, once the repository is "clean", it can be tagged
>>>>>> and that
>>>>>> tagged _source_ is the release.
>>>>> Not quite.
>>>>> A release is a source archive that is voted on and distributed
>>>>> via the
>>>>> ASF mirror system.
>>>>> The contents must agree with the source tag, but the source tag
>>>>> is not
>>>>> the release.
>>>>>> Non-compliant binaries would thus only be the result of a
>>>>>> "mistake" (if the build system is flawed, it's another problem,
>>>>>> unrelated to the released contents, which is _source_) to be
>>>>>> corrected per se.
>>>>> Not so. There are other failure modes.
>>>>> An automated build obviously reduces the chances of mistakes,
>>>>> but it
>>>>> can still create an archive containing files that should not be
>>>>> there.
>>>>> [Or indeed, omit files that should be present]
>>>>> For example, the workspace contains spurious files which are
>>>>> implicitly included by the assembly instructions.
>>>>> Or the build process creates spurious files that are incorrectly
>>>>> added
>>>>> to the archive.
>>>>> Or the build incorrectly includes jars that are supposed to be
>>>>> provided by the end user
>>>>> etc.
>>>>> I have seen all the above in RC votes.
>>>>> There are probably other failure modes.
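As an illustration of the kind of automated check that could catch the failure modes listed above, here is a Java sketch that lists the entries of an archive and reports missing license files and entries outside an allow-list. The required file names, the allowed prefixes and the class name ArchiveContentCheck are assumptions made for the example, not project policy.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ArchiveContentCheck {
    // Hypothetical policy: these two files must be present in the archive.
    static final String[] REQUIRED = {"META-INF/LICENSE.txt", "META-INF/NOTICE.txt"};

    /** Reports required files that are missing and entries outside the allowed prefixes. */
    static List<String> problems(Collection<String> entryNames, List<String> allowedPrefixes) {
        List<String> found = new ArrayList<>();
        for (String required : REQUIRED) {
            if (!entryNames.contains(required)) {
                found.add("missing: " + required);
            }
        }
        for (String name : entryNames) {
            boolean allowed = false;
            for (String prefix : allowedPrefixes) {
                if (name.startsWith(prefix)) {
                    allowed = true;
                    break;
                }
            }
            if (!allowed) {
                found.add("unexpected: " + name);
            }
        }
        return found;
    }

    /** Names of all non-directory entries in a zip or jar file. */
    static List<String> entryNames(String archivePath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archivePath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory()) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: archive under review; the prefixes below are an assumed allow-list.
        List<String> allowed = Arrays.asList("META-INF/", "org/apache/commons/math3/");
        for (String problem : problems(entryNames(args[0]), allowed)) {
            System.out.println(problem);
        }
    }
}
```

Such a check can only flag names. Whether a flagged file carries a disallowed license still needs a human reviewer.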
>>>>>> My proposition is that it's an independent step: once the build
>>>>>> system is adjusted to the expectations, "correct" binaries can be
>>>>>> generated from the same tagged release.
>>>>> It does not matter when the binary is built.
>>>>> If it is distributed by the PMC as a formal release, it must not
>>>>> contain any surprises, e.g. it must be licensed under the AL.
>>>>> It is therefore vital that the contents are as expected from the
>>>>> build.
>>>>> Note also that a formal release becomes an act of the PMC by the
>>>>> voting process.
>>>>> The ASF can then assume responsibility for any legal issues that
>>>>> may arise.
>>>>> Otherwise it is entirely the personal responsibility of the
>>>>> person who
>>>>> releases it.
>>>> I think the last two points are really important: binaries must be
>>>> checked and the foundation provides a legal protection for the
>>>> project if something weird occurs.
>>>> I also think another point is important: many if not most users do
>>>> really expect binaries and not source. From our internal Apache
>>>> point of view, these are a by-product. For many others it is the
>>>> important thing. It is mostly true in maven land as dependencies are
>>>> automatically retrieved in binary form, not source form. So the
>>>> maven central repository as a distribution system is important.
>>>> Even if for some security reason it sounds at first thought logical
>>>> to rely on source only and compile oneself, in an industrial context
>>>> project teams do not have enough time to do it for all their
>>>> dependencies, so they use binaries provided by trusted third
>>>> parties. A long time ago, I compiled a lot of free software tools
>>>> for the department I worked for at that time. I do not do this
>>>> anymore, and trust the binaries provided by the packaging team for a
>>>> distribution (typically Debian). They do rely on source and compile
>>>> themselves. Hey, I even think Emmanuel here belongs to the Debian
>>>> java team ;-) I guess such teams that do rely on source are rather
>>>> the exception than the rule. The other examples I can think of are
>>>> packaging teams, development teams that need bleeding edge (and will
>>>> also directly depend on the repository, not even the release),
>>>> projects that need to introduce their own patches and people who
>>>> have critical needs (for example when safety of people is concerned
>>>> or when they need full control for legal or contractual reasons).
>>>> Many other people download binaries directly and would simply not
>>>> consider using a project if it is not readily available: they don't
>>>> have time for this and don't want to learn how to build tens or
>>>> hundreds of different projects they simply use.
>>> I do not disagree with anything said on this thread. [In particular,
>>> I did not at all imply that any one committer could take
>>> responsibility for releasing unchecked items.]
>>> I'm simply suggesting that what is called the release
>>> process/management could be made simpler (and _consequently_ could
>>> lead to more regularly releasing the CM code), by separating the
>>> concerns.
>>> The concerns are
>>>  1. "code" (the contents), and
>>>  2. "artefacts" (the result of the build system acting on the
>>>     "code").
>>> Checking of one of these is largely independent from checking the
>>> other.
>> Unfortunately, not really.  One principle that we have (maybe not
>> crystal clear in the release doco) is that when we do distribute
>> binaries, they should really be "convenience binaries" which means
>> that everything needed to create them is in the source or its
>> documented dependencies.  What that means is that what we tag as the
>> source release needs to be able to generate any binaries that we
>> subsequently release.  The only way to really test that is to
>> generate the binaries and inspect them as part of verifying the
>> release.
>> As others have pointed out, anything we release has to be verified
>> and voted on.  As RM and reviewer, I think it is actually easier to
>> roll and verify source and binaries together.
> Personally, I do not think that the RM tasks are that much work or
> cumbersome, once you have done it a few times.
> The bigger problem I see is related to the voting process, as there
> are many people looking at a release from very different POVs and
> finding problems that a RM (or single developer of a component) may
> not be aware of or able to test himself, thus delaying the release
> process a lot.

My proposal is an attempt to relieve that precise problem a little.

> A more automated way of creating and especially testing the
> correctness of releases would help here.

Checking the signed tag, as advertised by Luc, is a step in that
direction. If we allow source releases, then it's done: a reviewer can
be sure that the code on his machine is the one provided by the RM.
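For illustration, the signed-tag check could be scripted from Java by invoking the 'git tag -v' command quoted in the vote e-mail. This sketch assumes that git is on the PATH and that the signer's public key is already in the reviewer's keyring. The class name TagCheck and the clone path argument are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class TagCheck {
    /** Command line that asks git to verify the signature on a tag. */
    static List<String> verifyCommand(String tag) {
        return Arrays.asList("git", "tag", "-v", tag);
    }

    /** Runs the command inside the given clone; true when it exits with status 0. */
    static boolean run(List<String> command, File workTree)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
            .directory(workTree)
            .inheritIO() // show git and gpg output to the reviewer
            .start();
        return process.waitFor() == 0;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a local clone (supplied by the reviewer)
        boolean ok = run(verifyCommand("MATH_3_4_RC3"), new File(args[0]));
        System.out.println(ok ? "tag signature verified" : "verification FAILED");
    }
}
```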


> Thomas
>> Phil
>>> [All the more so since, as you said, no fool-proof link between the
>>> two can be ensured: From a security POV, checking the former
>>> requires a code review, while using the latter requires trust in the
>>> build system.]
>>> Thus we could release the "code", after checking and voting on the
>>> concerned elements (i.e. the repository state corresponding to a
>>> specific tag + the web site).
>>> Then we could release the "binaries", as a convenience, after
>>> checking and voting on the concerned elements (i.e. the files about
>>> to be distributed).
>>> I think that it's an added flexibility that would, for example,
>>> allow the tagging of the repository without necessarily releasing
>>> binaries (i.e. not involving that part of the work); and to release
>>> binaries (say, at regular intervals) based on the latest tagged code
>>> (i.e. not involving the work about solving/evaluating/postponing
>>> issues).
>>> [I completely admit that, at first, it might look a little more
>>> confusing for the plain user, but (IIUC) it would be a better
>>> representation of the reality covered by stating that the ASF
>>> releases source code.]
>>> Best regards,
>>> Gilles
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> For additional commands, e-mail:
