commons-dev mailing list archives

From Gilles <gil...@harfang.homelinux.org>
Subject Re: [Math] What's in a release
Date Sun, 28 Dec 2014 18:46:54 GMT
Hi.

On Sun, 28 Dec 2014 09:43:34 +0100, Luc Maisonobe wrote:
> On 28/12/2014 00:22, sebb wrote:
>> On 27 December 2014 at 22:19, Gilles <gilles@harfang.homelinux.org> wrote:
>>> On Sat, 27 Dec 2014 17:48:05 +0000, sebb wrote:
>>>>
>>>> On 24 December 2014 at 15:11, Gilles <gilles@harfang.homelinux.org> wrote:
>>>>>
>>>>> On Wed, 24 Dec 2014 15:52:12 +0100, Luc Maisonobe wrote:
>>>>>>
>>>>>>
>>>>>> On 24/12/2014 15:04, Gilles wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Wed, 24 Dec 2014 09:31:46 +0100, Luc Maisonobe wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 24/12/2014 03:36, Gilles wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 23 Dec 2014 14:02:40 +0100, luc wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This is a [VOTE] for releasing Apache Commons Math 3.4 from
>>>>>>>>>> release candidate 3.
>>>>>>>>>>
>>>>>>>>>> Tag name:
>>>>>>>>>>   MATH_3_4_RC3 (signature can be checked from git using
>>>>>>>>>>   'git tag -v')
>>>>>>>>>>
>>>>>>>>>> Tag URL:
>>>>>>>>>> <https://git-wip-us.apache.org/repos/asf?p=commons-math.git;a=commit;h=befd8ebd96b8ef5a06b59dccb22bd55064e31c34>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Is there a way to check that the source code referred to above
>>>>>>>>> was the one used to create the JAR of the ".class" files?
>>>>>>>>> [Out of curiosity, not suspicion, of course...]
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, you can look at the end of the META-INF/MANIFEST.MF file
>>>>>>>> embedded in the jar. The second-to-last entry is called
>>>>>>>> Implementation-Build. It is automatically created by
>>>>>>>> maven-jgit-buildnumber-plugin and contains the SHA1 identifier
>>>>>>>> of the last commit used for the build. Here, it is
>>>>>>>> befd8ebd96b8ef5a06b59dccb22bd55064e31c34, so we can check that
>>>>>>>> it really corresponds to the expected status of the git
>>>>>>>> repository.
>>>>>>>>
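The manifest check described above can also be automated. The following is only a minimal Java sketch: the class name ManifestBuildCheck and its helper methods are hypothetical, and it assumes the Implementation-Build entry may carry extra text around the SHA1, so it tests for containment rather than strict equality.

```java
import java.io.IOException;
import java.util.Locale;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Sketch: compare a jar's Implementation-Build entry with an expected commit id. */
class ManifestBuildCheck {

    /** Returns the Implementation-Build main attribute, or null if it is absent. */
    static String implementationBuild(Manifest manifest) {
        return manifest.getMainAttributes().getValue("Implementation-Build");
    }

    /**
     * True if the recorded build entry mentions the expected commit id.
     * Assumption: the entry may contain more than the bare SHA1, hence
     * a case-insensitive "contains" instead of "equals".
     */
    static boolean matchesCommit(Manifest manifest, String expectedSha1) {
        String recorded = implementationBuild(manifest);
        return recorded != null
            && recorded.toLowerCase(Locale.ROOT)
                       .contains(expectedSha1.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ManifestBuildCheck <jar> <expected-sha1>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            Manifest manifest = jar.getManifest(); // null if the jar has no manifest
            boolean ok = manifest != null && matchesCommit(manifest, args[1]);
            System.out.println(ok ? "commit id matches" : "commit id does NOT match");
        }
    }
}
```

Such a check only confirms what the manifest claims; it says nothing about whether the ".class" files were really compiled from that commit.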
>>>>>>>
>>>>>>> Can this be considered "secure", i.e. can't this entry in the
>>>>>>> MANIFEST file be modified to be the checksum of the repository
>>>>>>> but with the .class files being substituted with those coming
>>>>>>> from another compilation?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Modifying anything in the jar (either this entry within the
>>>>>> manifest or any class) will modify the jar signature. So as long
>>>>>> as people do check the global MD5, SHA1 or gpg signature we
>>>>>> provide with our build, they are safe to assume the artifacts are
>>>>>> Apache artifacts.
>>>>>>
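To illustrate the checksum part of the verification mentioned above, here is a minimal Java sketch that computes an MD5 or SHA-1 digest and compares it with a published value. The class name ChecksumCheck and its methods are hypothetical. Note that a checksum only detects changes relative to the published value; it does not replace checking the gpg signature, which this sketch does not attempt.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch: compare the digest of an artifact with a published checksum. */
class ChecksumCheck {

    /** Returns the lowercase hex digest of the data ("MD5", "SHA-1", ...). */
    static String hexDigest(byte[] data, String algorithm) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("unknown algorithm: " + algorithm, e);
        }
    }

    /** True if the computed digest equals the published one (case-insensitive). */
    static boolean matches(byte[] data, String algorithm, String publishedHex) {
        return hexDigest(data, algorithm).equalsIgnoreCase(publishedHex.trim());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: ChecksumCheck <file> <algorithm> <published-hex>");
            return;
        }
        // Release jars are small enough to be read into memory in one go.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(matches(data, args[1], args[2]) ? "checksum OK" : "checksum MISMATCH");
    }
}
```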
>>>>>> This is not different from how releases are done with subversion
>>>>>> as the source code control system, or even in C or C++ as the
>>>>>> language. At some point, the release manager does perform a
>>>>>> compilation and the fellow reviewers check the result. There is
>>>>>> no foolproof process here, as always when security is involved.
>>>>>> Even using an automated build and automatic signing on an Apache
>>>>>> server would involve trust (i.e. one should assume that the
>>>>>> server has not been tampered with, that the build process really
>>>>>> does what it is expected to do, that the artifacts put to review
>>>>>> are really the ones created by the automatic process ...).
>>>>>>
>>>>>> Another point is that what we officially release is the source,
>>>>>> which can be reviewed by external users. The binary parts are
>>>>>> merely a convenience.
>>>>>
>>>>>
>>>>>
>>>>> That's an interesting point to come back to since it looks like
>>>>> the most time-consuming part of a release is not related to the
>>>>> sources!
>>>>>
>>>>> Isn't it conceivable that a release could just be a commit
>>>>> identifier and a checksum of the repository?
>>>>>
>>>>> If the binaries are just a convenience, why put so much effort
>>>>> in it? As a convenience, the artefacts could be produced after
>>>>> the release, accompanied with all the "caveat" notes which you
>>>>> mentioned.
>>>>>
>>>>> That would certainly increase the release rate.
>>>>
>>>>
>>>> Binary releases still need to be reviewed to ensure that the
>>>> correct N & L (NOTICE and LICENSE) files are present, and that the
>>>> archives don't contain material with disallowed licenses.
>>>>
>>>> It's not unknown for automated build processes to include files
>>>> that should not be present.
>>>>
>>>
>>> I fail to see the difference of principle between the "release"
>>> context and, say, the daily snapshot context.
>>
>> Snapshots are not (and should not be) promoted to the general public
>> as releases of the ASF.
>>
>>> What I mean is that there seems to be a contradiction between saying
>>> that a "release" is only about _source_ and the obligation to check
>>> _binaries_.
>>
>> There is no contradiction here.
>> The ASF releases source; it is required in a release.
>> Binaries are optional.
>> That does not mean that the ASF mirror system can be used to
>> distribute arbitrary binaries.
>>
>>> It can occur that disallowed material is, at some point in time,
>>> part of the repository and/or the snapshot binaries.
>>> However, what is forbidden is... forbidden, at all times.
>>
>> As with most things, this is not a strict dichotomy.
>>
>>> If it is indeed a problem to distribute forbidden material,
>>> shouldn't this be corrected in the repository? [That's indeed what
>>> you did with the blocking of the release.]
>>
>> If the repo is discovered to contain disallowed material, it needs
>> to be removed.
>>
>>> Then again, once the repository is "clean", it can be tagged and
>>> that tagged _source_ is the release.
>>
>> Not quite.
>>
>> A release is a source archive that is voted on and distributed via
>> the ASF mirror system.
>> The contents must agree with the source tag, but the source tag is
>> not the release.
>>
>>> Non-compliant binaries would thus only be the result of a "mistake"
>>> (if the build system is flawed, it's another problem, unrelated to
>>> the released contents, which is _source_) to be corrected per se.
>>
>> Not so. There are other failure modes.
>>
>> An automated build obviously reduces the chances of mistakes, but it
>> can still create an archive containing files that should not be
>> there. [Or indeed, omit files that should be present.]
>> For example, the workspace contains spurious files which are
>> implicitly included by the assembly instructions.
>> Or the build process creates spurious files that are incorrectly
>> added to the archive.
>> Or the build incorrectly includes jars that are supposed to be
>> provided by the end user,
>> etc.
>>
>> I have seen all the above in RC votes.
>> There are probably other failure modes.
>>
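Some of the failure modes listed above can be caught by a mechanical scan of the archive contents before a human review. The Java sketch below is only an illustration: the class name ArchiveContentsCheck is hypothetical, and the rules it applies (LICENSE and NOTICE entries under META-INF/, no embedded ".jar" entries) are assumptions about what a reviewer would look for, not an official ASF checklist.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

/** Sketch: flag missing N & L files and unexpected embedded jars in an archive. */
class ArchiveContentsCheck {

    /** Returns a list of human-readable problems; empty if none were found. */
    static List<String> findProblems(Collection<String> entryNames) {
        List<String> problems = new ArrayList<>();
        boolean hasLicense = false;
        boolean hasNotice = false;
        for (String name : entryNames) {
            if (name.startsWith("META-INF/LICENSE")) {
                hasLicense = true;
            } else if (name.startsWith("META-INF/NOTICE")) {
                hasNotice = true;
            } else if (name.endsWith(".jar")) {
                // A jar inside the archive may be a dependency bundled by mistake.
                problems.add("embedded jar: " + name);
            }
        }
        if (!hasLicense) {
            problems.add("missing LICENSE file");
        }
        if (!hasNotice) {
            problems.add("missing NOTICE file");
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ArchiveContentsCheck <jar>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            List<String> names = jar.stream()
                                    .map(JarEntry::getName)
                                    .collect(Collectors.toList());
            findProblems(names).forEach(System.out::println);
        }
    }
}
```

A scan like this cannot judge licenses of the material itself, so it complements the review rather than replacing it.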
>>> My proposition is that it's an independent step: once the build
>>> system is adjusted to the expectations, "correct" binaries can be
>>> generated from the same tagged release.
>>
>> It does not matter when the binary is built.
>> If it is distributed by the PMC as a formal release, it must not
>> contain any surprises, e.g. it must be licensed under the AL.
>>
>> It is therefore vital that the contents are as expected from the
>> build.
>>
>> Note also that a formal release becomes an act of the PMC by the
>> voting process.
>> The ASF can then assume responsibility for any legal issues that may
>> arise.
>> Otherwise it is entirely the personal responsibility of the person
>> who releases it.
>
> I think the last two points are really important: binaries must be
> checked and the foundation provides a legal protection for the
> project if something weird occurs.
>
> I also think another point is important: many if not most users do
> really expect binaries and not source. From our internal Apache point
> of view, these are a by-product. For many others it is the important
> thing. It is mostly true in maven land as dependencies are
> automatically retrieved in binary form, not source form. So the maven
> central repository as a distribution system is important.
>
> Even if for some security reason it sounds at first thought logical
> to rely on source only and compile oneself, in an industrial context
> project teams do not have enough time to do it for all their
> dependencies, so they use binaries provided by trusted third parties.
> A long time ago, I compiled a lot of free software tools for the
> department I worked for at that time. I do not do this anymore, and
> trust the binaries provided by the packaging team for a distribution
> (typically Debian). They do rely on source and compile themselves.
> Hey, I even think Emmanuel here belongs to the Debian java team ;-) I
> guess such teams that do rely on source are the exception rather than
> the rule. The other examples I can think of are packaging teams,
> development teams that need bleeding edge (and will also directly
> depend on the repository, not even the release), projects that need
> to introduce their own patches and people who have critical needs
> (for example when safety of people is concerned or when they need
> full control for legal or contractual reasons). Many other people
> download binaries directly and would simply not consider using a
> project if it is not readily available: they don't have time for this
> and don't want to learn how to build tens or hundreds of different
> projects they simply use.
>

I do not disagree with anything said on this thread. [In particular, I
did not at all imply that any one committer could take responsibility
for releasing unchecked items.]

I'm simply suggesting that what is called the release process/management
could be made simpler (and _consequently_ could lead to more regularly
releasing the CM code), by separating the concerns.
The concerns are
  1. "code" (the contents), and
  2. "artefacts" (the result of the build system acting on the "code").

Checking one of these is largely independent from checking the other.
[The more so that, as you said, no fool-proof link between the two can
be ensured: From a security POV, checking the former requires a code
review, while using the latter requires trust in the build system.]

Thus we could release the "code", after checking and voting on the
concerned elements (i.e. the repository state corresponding to a
specific tag + the web site).

Then we could release the "binaries", as a convenience, after checking
and voting on the concerned elements (i.e. the files about to be
distributed).

I think that it's an added flexibility that would, for example, allow
the tagging of the repository without necessarily releasing binaries
(i.e. not involving that part of the work); and to release binaries
(say, at regular intervals) based on the latest tagged code (i.e. not
involving the work about solving/evaluating/postponing issues).

[I completely admit that, at first, it might look a little more
confusing for the plain user, but (IIUC) it would be a better
representation of the reality covered by stating that the ASF
releases source code.]


Best regards,
Gilles



