lucene-dev mailing list archives

From "J. Delgado" <joaquin.delg...@gmail.com>
Subject Re: Moving SweetSpotSimilarity out of contrib
Date Sat, 06 Sep 2008 15:55:13 GMT
I cannot agree more with Otis. It's all about exposure! Without references
from main JavaDocs, some cool things in contrib just remain in obscurity.

-- Joaquin

On Sat, Sep 6, 2008 at 1:08 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

> Regarding SSS (and any other contrib visibility).
> Perhaps we should get into the habit of referencing contrib goodies from highly
> visible (to developers) spots (no pun intended), like Javadocs.  Concretely,
> if SSS is so good or if it is simply one possible alternative Similarity
> that's available and that we (Lucene developers) know about, why are we not
> mentioning it in Javadocs for (Default)Similarity?
>
>
>
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/search/Similarity.html
>
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/org/apache/lucene/search/DefaultSimilarity.html
>
> Javadocs have a lot of visibility, esp. in modern IDEs.  We can also have
> this mentioned on the Wiki, but Wiki is documentation that I think most
> people don't really like to read.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
> > From: Michael McCandless <lucene@mikemccandless.com>
> > To: java-dev@lucene.apache.org
> > Sent: Friday, September 5, 2008 6:41:48 AM
> > Subject: Re: Moving SweetSpotSimilarity out of contrib
> >
> >
> > Chris Hostetter wrote:
> >
> > > : Another important driver is the "out-of-the-box experience".
> > >
> > > I honestly have no idea what an OOTB experience for Lucene-Java
> > > means ...
> > > For Solr i understand, For Nutch i understand ... for a java
> > > library????
> >
> > Well... even though it's a "java library", Lucene still has many
> > defaults.
> >
> > Sure, Solr has even more, so this is important for Solr too.
> >
> > Most non-Solr apps built on Lucene will simply use Lucene's defaults,
> > for lack of knowing any better.
> >
> > How well such apps then work is what I'm calling the OOTB experience
> > for Lucene, and I think it's well-defined and important.
> >
> > Especially spooky is when a publication does an eval of search
> > libraries because typically they will eval only the OOTB experience and
> > won't go looking on our wiki to discover all the tricks.
> >
> > With IndexWriter we default to flushing by RAM usage (16 MB) not by
> > buffered doc count, to ConcurrentMergeScheduler, to
> > LogByteSizeMergePolicy, to compound file format, mergeFactor is 10,
> > etc.
> >
> > IndexSearcher (and also IndexWriter, for lengthNorm) uses
> > Similarity.getDefault().
> >
> > QueryParser uses a number of defaults when translating the end user's
> > search text into all sorts of Query instances.
> >
> > In 2.3 we made great improvements to OOTB indexing speed, and that's
> > important.
> >
> > I think making improvements to OOTB relevance is also important, but I
> > agree this is much harder to do "in general" since there are so many
> > differences between the content in apps.
> >
> > That all being said... I also agree (on closer inspection) it's not
> > cut and dry that SSS is a good choice for default (what would be the
> > right default for its "curve"?).
> >
> > If other OOTB relevance improvements surface with time (eg a good way
> > to do passage scoring/retrieval or proximity scoring or lexical
> > affinity) then we should strongly consider them.  Such things always
> > come with a performance cost, though, so it'll be an interesting
> > discussion...
> >
> > > > But then we get into that back-compat concern issue.
> >
> > Well...is Lucene's precise scoring formula guaranteed not to change
> > between releases?  I assume and hope not.
> >
> > Just like with indexing, where the precise choice of when committing
> > and merging and flushing happens was never "promised", that lack of
> > API promise gave us the freedom to drastically improve the OOTB
> > indexing speed without breaking any promises.  We need to keep that
> > same freedom on the search side.
> >
> > From our last discussion on back compat, our most powerful weapon is
> > to NOT make promises when they aren't necessary or could limit future
> > back compat.
> >
> > And, if we have a back compat situation that's holding back Lucene's
> > OOTB adoption by new users, we should think hard about switching the
> > default to favor new users and making an option to quickly get back to
> > the old behavior to accommodate existing users.  The recent bug fixes
> > to StandardTokenizer are such examples.
> >
> > Mike
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
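For readers unfamiliar with the "curve" Mike mentions for SSS, here is a minimal standalone sketch of the idea. It does not use Lucene. The default-style norm is the familiar 1/sqrt(numTerms). The plateau-style norm follows the length-norm formula as I understand it from the SweetSpotSimilarity Javadoc, so treat that formula as an assumption, not as the library's code. The class name, method names, and the min/max/steepness values are all hypothetical and chosen only for illustration.

```java
public class LengthNormSketch {

    // Default-style length norm: shorter fields always score higher.
    // Note: numTerms == 0 yields Infinity here; real code would guard against it.
    public static double defaultLengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Plateau-style norm (assumed formula): exactly 1.0 for any length in
    // [min, max], falling off outside that range at a rate set by steepness.
    public static double sweetSpotLengthNorm(int numTerms, int min, int max,
                                             double steepness) {
        // Distance is 0 inside the plateau and grows linearly outside it.
        double distance = Math.abs(numTerms - min)
                        + Math.abs(numTerms - max)
                        - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    public static void main(String[] args) {
        // Hypothetical plateau settings; there is no obviously right default,
        // which is the point raised in the thread.
        int min = 100;
        int max = 500;
        double steepness = 0.5;

        for (int n : new int[] {10, 100, 300, 500, 2000}) {
            System.out.printf("%5d terms: default=%.4f plateau=%.4f%n",
                    n,
                    defaultLengthNorm(n),
                    sweetSpotLengthNorm(n, min, max, steepness));
        }
    }
}
```

Running main prints both norms side by side for a few field lengths. The plateau norm depends entirely on min, max, and steepness, which have to be chosen per collection; that is why picking a sensible out-of-the-box value is hard.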
