lucene-dev mailing list archives

From Michael McCandless
Subject Re: Moving SweetSpotSimilarity out of contrib
Date Wed, 03 Sep 2008 19:00:30 GMT
Another important driver is the "out-of-the-box experience".

It's crucial that Lucene has good starting defaults for everything
because many developers will stick with these defaults and won't
discover the wiki page that says you need to do X, Y and Z to get
better relevance, indexing speed, searching speed, etc.  This then
makes Lucene look bad, not only to these Lucene users but then also to
the end users who use their apps that say "Powered by Lucene".

It also affects Lucene's adoption/growth over time: when a potential
new user is just "trying Lucene out" we want our defaults to shine
because those new users will walk away if Lucene doesn't compare well
to other engines that are well-tuned out-of-the-box.

I remember a while back we discussed an article comparing performance
of various search engines and we were disappointed that the author
didn't do X, Y and Z to let Lucene compete fairly.  If we had good
defaults that wouldn't have happened (or, at least, to a lesser extent).

Obviously we can't default everything perfectly since at some point
there are hard tradeoffs to be made and every app is different, but if
SweetSpotSimilarity really gives better relevance for many/most apps,
and doesn't have any downsides (I haven't looked closely myself), I
think we should get it into core?

You know... it's almost like we need a "standard distro" (drawing
analogy to Linux) for Lucene, which would be the core plus cherry-pick
certain important contrib modules (highlighter, SweetSpotSimilarity,
snowball, spellchecker, etc.) and bundle them together.  See,
highlighting is obviously well "decoupled" from Lucene's core, so it
should remain in contrib, yet is also clearly a very important function
in nearly every search engine.


Mark Miller wrote:

> I would agree with you if I was wrong about the contrib/core  
> attention thing, but I don't think I am. It seems as if you have  
> been arguing that contrib is really just an extension of core, on  
> par with core, but just in different libs, and to keep core lean and  
> mean, anything not needed in core shouldn't be there - sounds like  
> an idea I could get behind, but seems to ignore the reality:
> The user/dev focus definitely seems to be on core. Some of contrib  
> is a graveyard in terms of dev and use I think. I think it's still
> entangled in its "sandbox" roots.
> Contrib lacks many requirements of core code - it can be java 1.5,  
> it doesn't have to be backward compatible, etc. Putting something in  
> core ensures it's treated as a Lucene first-class citizen; stuff in
> contrib is not held to such strict standards.
> Even down to the people working on the code, there is a lower bar to  
> become a contrib committer than a full core committer (see my contrib
> committer status <g>).
> It's not that I don't like what you propose, but I don't buy it as
> very viable the way things are now. IMO we would need to do some  
> work to make it a reality. It can be said that's the way it is, but
> my view of things doesn't jibe with it.
> I may have miswritten "generally useful". What I meant was, if the
> sweet spot sim is better than the default sim, but a bit harder to  
> use because of config, perhaps it is "core" enough to go there, as  
> often it may be better to use. Again, I fully believe it would get  
> more attention and be 'better' maintained. I did not mean to set the  
> bar at "generally useful" and I apologize for my imprecise language  
> (one of my many faults).
>> I think that's the wrong question to ask.  I would rather ask the  
>> question "Is X decoupled enough from Lucene internals that it can  
>> be a contrib?"  Things like IndexWriter, IndexReader, Document and  
>> TokenStream really need to be "core" ... but things like the  
>> QueryParser, and most of our analyzers don't.  Having lots of  
>> loosely coupled mini-libraries that respect good API boundaries  
>> seems more reusable and generally saner than "all of this code is
>> useful and lots of people want it so throw it into the kitchen sink".
>> We don't need to go hog wild gutting things out of the core ... but  
>> I don't think we should be adding new things to the core just
>> because they are "generally useful".
>> -Hoss