lucene-dev mailing list archives

From "Shai Erera" <ser...@gmail.com>
Subject Re: Moving SweetSpotSimilarity out of contrib
Date Tue, 02 Sep 2008 13:20:44 GMT
From a legal standpoint, whenever we need to use open-source code, somebody
has to inspect the code and 'approve' it. This inspection makes sure the code
does not use 3rd party libraries for which we'd need to get open-source
clearance as well.

In my company, this process was done for Lucene core, but not for contrib.
AFAIU, it is up to each company to run such a process (it is usually
mandatory when you integrate open-source code into your products). Therefore I
don't think the Lucene community should be concerned with this.

The only thing that the community can do is to move as much as possible to
the core, so that when a company inspects the core, the inspection covers as
much as possible. Of course, this may sound like too broad a statement, and I
definitely don't think everything should belong to 'core'. My understanding
is that some 'contrib' packages include 3rd party libraries (like Snowball),
while others do not require any 3rd party libs (like SweetSpotSimilarity).
For those that require 3rd party libs, it makes sense to leave them in
contrib. For those that don't, it might make sense to move them to 'core' on
request, in order to encourage people to use them. That's why I was asking
if it's a problem to move SweetSpot to 'core'.

As for your questions on SweetSpot, from what I understand in the code, an
application should configure it with different values, depending on the TF
computation method it wants to use (hyperbolic or baseline). The default
implementation of tf() in SweetSpot uses the baseline method, while an
application can extend SweetSpot and override tf() to use the hyperbolic
one. An application can also configure the length norm parameters for
different fields.
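
To make the difference between the two tf() methods concrete, here is a
rough, self-contained sketch of the two curves as I understand them. It does
not depend on Lucene: the class name, method signatures and parameter values
below are my own illustration (assumptions, not the contrib API), and the
formulas should be checked against the actual SweetSpotSimilarity source
before relying on them.

```java
// Standalone sketch of the two tf() shapes; NOT the Lucene contrib class.
// All names and parameter values here are hypothetical, for illustration only.
public class SweetSpotTfSketch {

    // Baseline: constant `base` for up to `min` occurrences, then
    // square-root growth that continues smoothly from that plateau.
    public static float baselineTf(float freq, float base, float min) {
        if (freq == 0.0f) return 0.0f;
        return (freq <= min)
            ? base
            : (float) Math.sqrt(freq + (base * base) - min);
    }

    // Hyperbolic: a tanh-shaped curve bounded by `min` and `max`,
    // with its midpoint at freq == xoffset; `base` controls steepness.
    public static float hyperbolicTf(float freq, float min, float max,
                                     double base, float xoffset) {
        if (freq == 0.0f) return 0.0f;
        double x = freq - xoffset;
        double up = Math.pow(base, x);
        double down = Math.pow(base, -x);
        float result = min
            + (float) ((max - min) / 2.0 * ((up - down) / (up + down) + 1.0));
        // For very large freq the ratio becomes Inf/Inf; treat that as the plateau.
        return Float.isNaN(result) ? max : result;
    }

    public static void main(String[] args) {
        // Illustrative parameters, chosen only to show the curve shapes.
        for (int freq : new int[] {1, 5, 10, 50, 1000}) {
            System.out.printf("freq=%d baseline=%.3f hyperbolic=%.3f%n",
                freq,
                baselineTf(freq, 1.0f, 5.0f),
                hyperbolicTf(freq, 0.0f, 2.0f, 1.3, 10.0f));
        }
    }
}
```

The practical difference is that the baseline curve keeps growing with the
term frequency, while the hyperbolic one levels off at its maximum, so very
repetitive documents stop gaining score.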

From what I read, the code is well documented. Perhaps Doron can add some
high-level documentation on the benefit of each tf() computation
method, or give some references. But the defaults seem to make sense, so an
application can definitely start with them (if it wants to).

Shai

On Tue, Sep 2, 2008 at 2:34 PM, Grant Ingersoll <gsingers@apache.org> wrote:

>
> On Sep 2, 2008, at 6:07 AM, Shai Erera wrote:
>
>> Hi,
>>
>> Following Doron's quality work enhancements in TREC 2007 (
>> http://wiki.apache.org/lucene-java/TREC_2007_Million_Queries_Track_-_IBM_Haifa_Team),
>> I was wondering if it's possible to move the SweetSpotSimilarity to Lucene's
>> main code stream (out of "contrib" that is).
>> It shows significant improvement over the default similarity.
>
>
> My understanding is it requires a bit of tuning, right?  I'd want to make
> sure people have the right information to use it intelligently, but
> otherwise, it seems reasonable.
>
>> I'm not suggesting to replace the DefaultSimilarity (as the default) with
>> SweetSpot, but just expose SweetSpot as part of Lucene's core. It will help
>> me use it, since I cannot use the contrib packages easily in my environment
>> (legal issues), but can use Lucene's core more freely.
>
>
> This strikes me as really odd. The contrib modules are released under the
> exact same terms as the core, but heh, I'm not a lawyer...  Is there
> anything you think we should be concerned with?
>
> -Grant
>
>
