lucene-dev mailing list archives

From mark harwood <>
Subject Re: Moving SweetSpotSimilarity out of contrib
Date Wed, 03 Sep 2008 13:51:23 GMT
Not tried SweetSpot so can't comment on worthiness of moving to core but agree with the principle
that we can't let the hassles of a company's "due diligence" testing dictate the shape of
core vs contrib.

For anyone concerned with the overhead of doing these checks a company/product of potential
interest is "Black Duck".
I don't work for them and don't offer any endorsement but simply point them out as something
you might want to take a look at.


----- Original Message ----
From: Nadav Har'El <>
Sent: Wednesday, 3 September, 2008 13:21:34
Subject: Re: Moving SweetSpotSimilarity out of contrib

On Tue, Sep 02, 2008, Chris Hostetter wrote about "Re: Moving SweetSpotSimilarity out of contrib":
> : From a legal standpoint, whenever we need to use open-source code, somebody
> : has to inspect the code and 'approve' it. This inspection makes sure there's
> : no use of 3rd party libraries, to which we'd need to get open-source
> : clearance as well.
> You should talk to whomever you need to talk to at your company about 
> revising the approach you are taking -- the core vs contrib distinction in 
> Lucene-Java is one of our own making that is completely artificial.  With 
> Lucene 2.4 we could decide to split what is currently known as the "core" 
> into 27 different directories, none of which are called core, and all of 
> which have an interdependency on each other.  We're not likely to, but we 
> could -- and then where would your company be?

I can't really defend the lawyers (sometimes you get the feeling that they
are out to slow you down, rather than help you :( ), but let me try to explain
where this sort of thinking comes from, because I think it is actually quite

Lucene makes the claim that it has the "apache license", so that any company
can (to make a long story short) use this code. But when a company sets out
to use Lucene, can it take this claim at face value? After all, what happens
if somebody steals some proprietary code and puts it up on the web claiming it
has the apache license - does it give the users of that stolen code any
rights? Of course not, because the rights weren't the distributor's to give
out in the first place.

So it is quite natural that when a company wants to use some open-source
code it doesn't take the license at face value, and rather does some "due
diligence" to verify that the people who published this code really owned
the rights to it. E.g., the company lawyers might want to do some background
checks on the committers, look at the project's history (e.g., that it doesn't
have some "out of the blue" donations from vague sources), check the code and
comments for suspicious strings, patterns, and so on.

When you need to inspect the code, naturally you need to decide what you
inspect. This particular company chose to inspect only the Lucene core,
perhaps because it is smaller, has fewer contributors, and has the vast
majority of what most Lucene users need. Inspecting all the contrib - with
all its foreign language analyzers, stuff like gdata and other rarely used
stuff - may be too hard for them. But then, the question I would ask is -
why not inspect the core *and* the few contribs that interest you? For
example, SweetSpotSimilarity (which you need) and other generally useful
stuff like Highlighter and SnowballAnalyzer.

> Doing this would actually be a complete reversal of the goals discussed in 
> the near past:  increasing our use of the contrib structure for new 
> features that aren't inherently tied to the "guts" of Lucene.  The goal 
> being to keep the "core" jar as small as possible for people who want to 
> develop apps with a small foot print.

I agree that this is an important goal.

> At one point there was even talk of refactoring additional code out of the 
> core and into a contrib (this was already done with some analyzers when 
> Lucene became a TLP)

Nadav Har'El                        |      Wednesday, Sep  3 2008, 3 Elul 5768
IBM Haifa Research Lab              |-----------------------------------------
                                    |Promises are like babies: fun to make,
                                    |but hell to deliver.

To unsubscribe, e-mail:
For additional commands, e-mail:

