incubator-general mailing list archives

From Rob Vesse <>
Subject Re: [DISCUSS] Proposal for a Black Duck POC
Date Mon, 31 Mar 2014 09:13:22 GMT

Black Duck Software certainly have a useful platform, though it would be
helpful to know what they are considering using for the POC.

Personally, I've used their Protex software to work through IP Clearance
several times over the past couple of years, with pieces of code developed
at my employer and then open sourced, and I can state from experience that
it is quite a time-consuming and thankless process.

I would certainly recommend trying a POC, but I'm not sure it is
necessarily something you'd want to impose on all incoming projects in the
long term.

Some info on Protex:

My main concern is that Protex, while very useful, is somewhat dumb,
primarily due to the quality of its knowledge base.  For those who aren't
aware, the tool essentially scans the code looking for files that have
"signatures" matching other open source/proprietary code in the knowledge
base.  The open source code is scraped from all sorts of public sites like
SourceForge, GitHub, BitBucket etc.  Each match that occurs has to be
reviewed by someone, who can then indicate whether to exclude that match
(i.e. it was a false positive) or to accept it and attribute it
appropriately.

This is great in principle because it easily spots obvious plagiarism when
it occurs.  The problem from my point of view is that the false positive
rate is very high, so you have to go through all the matches and manually
state whether they are valid or invalid.  This ends up being very time
consuming because for each match on your code you have to review all the
possible matches to see if there actually is a genuine match and, if not,
go through the process of telling the tool so.

This is where the knowledge base starts to hurt you: there are lots of
projects out there which check in everything, including things like
auto-generated IDE project files, build tool reports, VCS ignore files etc.,
which tend to have very high similarity and constantly get flagged up as
false positives.  Ideally Apache projects won't themselves be checking
these things in, so the chances of these getting flagged should be lower.

As a more practical example, I had a recent case where I was working
through an analysis of some Hadoop-related code my company is considering
open sourcing, which is primarily a collection of implementations of
InputFormat and OutputFormat.  A good number of our code files were
flagged as potential matches, and when reviewed the only similarity was
that we had the same set of imports as many other Hadoop ecosystem
projects.  This is of course exacerbated by the fact that many developers
use IDEs which organise their imports!  So I had to spend several hours
checking each file and ticking boxes in Protex to say that this was
original code and not plagiarised.

I would definitely recommend carrying out a POC and seeing what people
make of it, but be aware that it can be a painful and time-consuming
process.

If the tool is indeed Protex then, being familiar with it, I would be
willing to help out with a POC.



On 31/03/2014 01:52, "Roman Shaposhnik" <> wrote:

>A few recent discussions around IP management
>in the Incubator have led to an interesting dialogue
>between the fine folks from Black Duck Software and
>yours truly.
>The main idea here is that, perhaps, Black Duck
>services can be as helpful to the open source
>communities as the ones provided by the likes
>of free Coverity scans.
>Of course, the best way to assess how much value
>this potential collaboration can bring to the ASF
>projects is to run a POC and see for ourselves.
>Also, I think, if there's a single place in the ASF that
>would benefit the most from an additional pair of eyes
>looking for potential IP-related issues, that would be
>incubating projects.
>Hence, I'd like to propose that we do just that: run
>a POC on a couple of projects identified by the
>Incubator community. I will be the central contact
>person for this, but I'd very much appreciate folks
>volunteering to help.
>This thread aims at three things:
>    1. collecting general feedback on whether
>    this is a good or bad idea.
>    2. folks volunteering to help with the POC
>    (if needed).
>    3. folks suggesting Incubator projects
>    for the first POC that we run with Black Duck.
>Please share your thoughts and feedback!
>To unsubscribe, e-mail:
>For additional commands, e-mail:

