harmony-dev mailing list archives

From "Geir Magnusson Jr." <ge...@apache.org>
Subject Re: [legal] Proposed changes for the Bulk Contributor Questionnaire
Date Tue, 15 Nov 2005 22:23:08 GMT

On Nov 15, 2005, at 4:28 PM, Dalibor Topic wrote:

> Geir Magnusson Jr. wrote:
>> On Nov 14, 2005, at 3:18 PM, Dalibor Topic wrote:
>>> On Mon, Nov 14, 2005 at 09:57:48AM -0500, Stefano Mazzocchi wrote:
>>>> Leo Simons wrote:
>>>>> Rant below. Decided not to tone it down.
>>>> Leo++
>>> +1 from me, too. sounds like an excellent way to shoot oneself to
>>> slashdot with headlines like "Apache foundation rejects code from IBM,
>>> claims it was stolen from FSF!". Political suicide, should it ever
>>> happen, as it'd force the ASF to play arbiter in disputes that don't
>>> exist.
>> I don't understand this.  I'm suggesting we use a tool internally to
>> help us *find* problems, both at contribution time as well as ongoing
>> to ensure that inappropriate 3rd party code doesn't come in during the
>> regular flow of activity.  We'd then examine any issues raised, and
>> make a judgement based on that.
> OK. I'm uncomfortable delegating such a potentially sensitive issue to a
> proprietary black box, as in the worst case that leaves us with little
> chance to explore why the black box oracle came up with a wrong or right
> analysis.

I'm confused as I don't understand how you are thinking of this.

First, we mention Blackduck as an example of tools that we might use  
in a specific case of contribution, suggesting that contributors do a  
similar thing before contributing if they choose.  There is no  

Second, there's no analysis from BD and its ilk, no "thumbs up" or
"thumbs down" - it's simply "these files seem to be like those files"
and we humans then go look and judge.

We're not turning over any decision making to anyone.

> Checking code pedigree makes sense. It just needs to be transparent.

You get a list of files.  You can go check them.  Is how those  
matches were done significant?  Can you tell me the algorithm your  
head uses? :)

>> Suppose a contribution had code from the FSF. (IBM's doesn't. Period)
> Yeah, I didn't mean to imply it had, just as an ugly worst case
> scenario. I can come up with an even worse one, actually, in which a
> hypothetical IBM contribution had traces of Microsoft's VM code.
> Microsoft should be scarier than the FSF to most people on this list, I
> guess, as the FSF has an interest in working together with us, whereas
> Microsoft's interests probably aren't aligned with open source J2SE.

I think that's a safe assumption :)

>> Would you prefer that we don't find it until much later, like after a
>> release?  Or if we do find it, just accept it to avoid having to commit
>> "political suicide" by pointing it out to the contributor?
> It'd be fine as long as nothing bad is found, or the cases flagged by
> the black box oracle are actual issues. I'm trying to view it from a
> worst-case perspective.

We can determine them, because the "oracle" is a really fancy grep,  
which just shows files that have similarity.  We then have to verify.

> The trouble would start if we end up having a false positive.
> How do we figure out that we have a false positive, without either
> access to say, the database, the source code of the oracle, the complete
> legal history of some bit of proprietary code including the merges,
> transactions, copyright transfers and relicensing operations, etc?

Ah - yes.  That's the key.  We would only compare against code that  
we were comfortable having someone look at.  Specifically, I'm afraid  
of Sun code accidentally getting into our codebase, because the stuff  
is so prevalent in the Java community.  It's in every Sun J2SE  

> Such a 'discovery' process could take quite a bit of time, provided all
> parties involved (including the makers of the black box oracle) would
> have any business interest in participating (in absence of an actual
> legal case). If, say, Microsoft takes their time to talk to Apache about
> the legal history of Microsoft's VM, (what'd be in it for Microsoft,
> after all? :) where does it leave a contribution that'd be flagged as
> potentially infringing on Microsoft's code?
> I'd guesstimate a resolution could take a few years, as a worst case. Is
> any contribution that stays in limbo for a few years going to be
> relevant after a claim is showed to be false after a few years?
> That's where the 'political suicide' scenario I mentioned comes in, as
> it could force us to act as an arbiter in determining how trustworthy
> either IBM, Black Duck or Microsoft are, based on little more than a
> black box. Not a position I'd like to find myself in, in particular if
> it all turns out to be just a software glitch.[1] :)

I see.  I think that there are some assumptions here that you made,  
that I wasn't ever thinking of.  We need to have the code we compare  
against accessible by someone in the community willing to look at  
it.  We have people that don't care if they glimpse JRL code (and by  
the way things are working out, Sun won't care if people are exposed  
to JRL code as long as they don't make copies...)

So that's the kind of thing we want to compare against: open source
(kaffe, GNU classpath, etc) and code like Sun's for which there are
no limits on retention after exposure.

>> If we find code stolen from *any* copyright holder, we will definitely
>> reject the code.
> +1
>> Because there is a complete implementation under a
>> non-opensource license that has been very, very widely distributed, it
>> behooves us to take what steps we can to ensure that we don't
>> accidentally incorporate it into our codebase.
> +1, too.
> We just need to make sure that the steps we take are equally transparent
> to everyone involved (and the outsiders), as the rest of the process is,
> in my opinion. A black box oracle doesn't have its place in such a process.


> cheers,
> dalibor topic
> [1] Yeah, I know, I'm assuming that the Black Duck software is not
> perfect and error free without having ever seen it. It's a worst case
> scenario, though, so I am taking some freedoms with things that can go
> wrong. :)

Freedom (TM)



Geir Magnusson Jr                                  +1-203-665-6437
