lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aaron Lav <a...@pobox.com>
Subject boosts for unstemmed matches (was Re: If you could have one feature in Lucene...)
Date Wed, 24 Feb 2010 21:20:54 GMT
On Wed, Feb 24, 2010 at 10:18:27PM +0200, Avi Rosenschein wrote:
> On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll <gsingers@apache.org>wrote:
> 
> > What would it be?
> >
> 
> For scoring to take into account the non-analyzed token stream.
> 
> That is, if a field is analyzed (stemmed, lowercased, maybe even stop words
> removed), that is fine for indexing. But tokens in the query matching the
> original form could still get a higher score than those that only match when
> analyzed.

You can get some of that effect by indexing stemmed and unstemmed
forms, and letting IDF boost unstemmed results.  (I picked this
idea up from http://lingpipe-blog.com/2007/03/21/to-stem-or-not-to-stem/)

> Also, this would maybe allow a flexible, run-time, decision of what
> analyzers to include. For example, I might want stemming turned on for
> normal search, but not for a PhraseQuery.

That's harder - different field names for the different analyses might
work, but not for run-time decisions.  I think the way Sun's Minion does
it is morphologically-based query expansion (see http://blogs.sun.com/searchguy/entry/lightweight_morphology_vs_stemming),
and you might be able to
implement that via query rewriting.

     Aaron Lav (asl2@pobox.com)

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message