lucene-general mailing list archives

From Andrew Gilmartin <>
Subject Re: issues with wildcard search and snowball english analyzer
Date Thu, 24 Jul 2008 23:25:55 GMT
--- On Thu, 7/24/08, JBTech <> wrote:

> Is there a way to avoid stemming in certain cases?

As a general rule, make the query intelligent and not the index. Therefore, index your text
verbatim. Small changes like changing terms to lowercase and removing possessives are fine.
You now have an index upon which you can make intelligent queries.
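
To illustrate the kind of light normalization I mean, here is a minimal sketch in Python rather than a Lucene analyzer. The function name normalize and the token pattern are my own assumptions for illustration only. It lowercases each term and drops a trailing possessive, and it deliberately does no stemming.

```python
import re

# Hypothetical helper, not part of Lucene: light normalization only.
# A token is a run of letters/digits, optionally followed by an
# apostrophe and more letters (so "elephant's" stays one token).
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")

def normalize(text):
    """Lowercase each token and strip a trailing 's; do not stem."""
    terms = []
    for token in TOKEN_PATTERN.findall(text):
        token = token.lower()
        if token.endswith("'s"):
            token = token[:-2]  # remove the possessive suffix
        terms.append(token)
    return terms
```

With this sketch, "elephants" stays "elephants" in the index, so the index still records exactly what the document said.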

An intelligent query requires keeping track of several collections of term-to-term(s) mappings,
for example stemmed-term to verbatim-term(s). Now, convert the user's search for "elephant
is a big animal" into something akin to

( (elephant^10) OR (A) OR (B) ) AND
( (big^10) OR (C) ) AND
( (animal^10) OR (D) )

Where A and B are other terms with the same stemming as elephant, C is another term with the
same stemming as big, and D is another term with the same stemming as animal. Adding the
boost ensures that a verbatim match pushes the document's rank higher, and so ensures that what
the user asked for is closer to the top.
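
The expansion described above can be sketched roughly as follows. This is plain Python that only builds a query string, not Lucene API code. The names toy_stem, build_stem_map and expand_query are hypothetical, and toy_stem is a crude stand-in for a real stemmer such as Snowball. Dropping words like "is" and "a" from the user's search is assumed to happen before this step.

```python
from collections import defaultdict

def toy_stem(term):
    # Hypothetical stand-in for a real stemmer (e.g. Snowball):
    # strips a few common suffixes, keeping at least 3 characters.
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[:-len(suffix)]
    return term

def build_stem_map(index_terms, stem=toy_stem):
    """Map each stemmed term to the set of verbatim terms sharing that stem."""
    stem_map = defaultdict(set)
    for term in index_terms:
        stem_map[stem(term)].add(term)
    return stem_map

def expand_query(user_terms, stem_map, stem=toy_stem, boost=10):
    """Build one AND-ed clause per user term: boosted verbatim term OR its variants."""
    clauses = []
    for term in user_terms:
        # Other verbatim terms with the same stem, excluding the term itself.
        variants = sorted(stem_map.get(stem(term), set()) - {term})
        parts = ["(%s^%d)" % (term, boost)] + ["(%s)" % v for v in variants]
        clauses.append("( " + " OR ".join(parts) + " )")
    return " AND ".join(clauses)

if __name__ == "__main__":
    terms_in_index = ["elephant", "elephants", "big", "animal", "animals"]
    stem_map = build_stem_map(terms_in_index)
    print(expand_query(["elephant", "big", "animal"], stem_map))
```

In practice the stem map would be built from the terms actually present in the verbatim index, and the resulting string handed to the query parser.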

This basic idea of making the queries more intelligent by broadening them and boosting term
weights gives you a lot of control over the query and how results are ranked. The same control
is not possible by making the index more intelligent.

Don't worry about Lucene's performance with complex queries. My experience is that it is very

And to answer your specific question, a search for "e*t" will work as is, because the wildcard
is matched against the verbatim terms in the index.

-- Andrew
