lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aaron Lav <>
Subject boosts for unstemmed matches (was Re: If you could have one feature in Lucene...)
Date Wed, 24 Feb 2010 21:20:54 GMT
On Wed, Feb 24, 2010 at 10:18:27PM +0200, Avi Rosenschein wrote:
> On Wed, Feb 24, 2010 at 3:42 PM, Grant Ingersoll <>wrote:
> > What would it be?
> >
> For scoring to take into account the non-analyzed token stream.
> That is, if a field is analyzed (stemmed, lowercased, maybe even stop words
> removed), that is fine for indexing. But tokens in the query matching the
> original form could still get a higher score than those that only match when
> analyzed.

You can get some of that effect by indexing stemmed and unstemmed
forms, and letting IDF boost unstemmed results.  (I picked this
idea up from

> Also, this would maybe allow a flexible, run-time, decision of what
> analyzers to include. For example, I might want stemming turned on for
> normal search, but not for a PhraseQuery.

That's harder - different field names for the different analyses might
work, but not for run-time decisions.  I think the way Sun's Minion does
it is morphologically-based query expansion (see,
and you might be able to
implement that via query rewriting.

     Aaron Lav (

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message