lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: How to promote an unstemmed match over a stemmed match in an index that's stemmed...
Date Mon, 11 Feb 2008 18:58:59 GMT
You have to be a bit clever. You can certainly inject the original with an
increment of 0. See SynonymAnalyzer in Lucene In Action. This will not
break phrase queries since your two tokens occupy the same position.
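
To make the "increment of 0" idea concrete, here is a rough sketch in plain Java. It does not use the Lucene API at all: the Token class, the toyStem helper and the analyze method are hypothetical stand-ins that only model what a stacking token filter would emit. Each word yields its stemmed form with a position increment of 1, followed by the original (marked with a trailing $) with an increment of 0, so both share one position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class StackedTokenSketch {

    /** A term plus its position increment relative to the previous token. */
    public static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /** Hypothetical toy stemmer: strips one trailing "s". A real stemmer would go here. */
    static String toyStem(String word) {
        boolean plural = word.length() > 3 && word.endsWith("s");
        return plural ? word.substring(0, word.length() - 1) : word;
    }

    /** Emits the stemmed token, then the marked original stacked at the same position. */
    static List<Token> analyze(String text) {
        List<Token> out = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace produces an empty first element
            }
            out.add(new Token(toyStem(word), 1)); // advances to the next position
            out.add(new Token(word + "$", 0));    // increment 0: same position as the stem
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [green(+1), green$(+0), olive(+1), olives$(+0)]
        System.out.println(analyze("green olives"));
    }
}
```

Because the marked original never advances the position, the distance between consecutive words stays at 1, which is why phrase matching is unaffected.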

But you'll have to do something like add a $ to the original at index time.
That way, for exact matches you can search on olive$, boosted however
you want. When you want the stemmed version you can search for olive.
Or you could add a clause with the unstemmed version boosted. Or
something like that <G>. Note that whether you add the $ to the stemmed
or unstemmed version is up to you.
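
As a rough illustration of the "add a clause with the unstemmed version boosted" variant, the sketch below only assembles a query string in the usual field:term^boost syntax. The class name, the buildQuery and toyStem helpers, the field name "body" and the boost of 4.0 are all made up for the example. It assumes the $ marker was put on the unstemmed token at index time, and that whatever parses the string leaves the $ intact.

```java
public class BoostedQuerySketch {

    /** Hypothetical toy stemmer: strips one trailing "s". A real stemmer would go here. */
    static String toyStem(String word) {
        boolean plural = word.length() > 3 && word.endsWith("s");
        return plural ? word.substring(0, word.length() - 1) : word;
    }

    /**
     * Builds two optional clauses: the marked, unstemmed form with a boost,
     * and the plain stemmed form without one.
     */
    static String buildQuery(String field, String word, float exactBoost) {
        String exactClause = field + ":" + word + "$^" + exactBoost;
        String stemmedClause = field + ":" + toyStem(word);
        return exactClause + " " + stemmedClause;
    }

    public static void main(String[] args) {
        // prints body:olives$^4.0 body:olive
        System.out.println(buildQuery("body", "olives", 4.0f));
    }
}
```

Documents containing the literal word match both clauses and pick up the boost; documents that only share the stem match the second clause alone and so rank lower.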

Watch what analyzer you use to be sure it doesn't strip out the special
symbol.

Best
Erick

On Feb 11, 2008 12:56 PM, Michael Stoppelman <stopman@gmail.com> wrote:

> Hi all,
> I've got an index with tokens that are stemmed. Sometimes I really need to
> boost the unstemmed
> version of a query word to get the most relevant documents.
>
> Example:
> Query: [olives].
>
> I don't want to match documents with the words: oliver, oliver's, etc...
>
> Since I'm stemming when creating the index, is there a way to store both
> versions (stemmed/unstemmed) with
> setPositionIncrement()? Is that the correct way to deal with this? I was
> reading old archives and this didn't seem
> to be a great decision since it breaks PhraseQuery [1].
>
> It seems like it would be useful if, at query scoring time, I could see
> the original string values of the tokens, in this case at least.
>
> Thanks in advance,
>
> -M
>
> [1]
> http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg07416.html
>
