lucene-java-user mailing list archives

From "Herbert Roitblat" <>
Subject Re: Stemming and Wildcard Queries
Date Thu, 20 May 2010 20:48:05 GMT
At a general level, we have found that stemming during indexing is not 
advisable. Sometimes users want the exact form, and if you have removed the 
exact form during indexing, you obviously cannot provide it. Rather, we 
have found that stemming during search is more useful, or maybe it should be 
called anti-stemming: for any query term the user wants stemmed, we can 
derive its variations during query processing. E.g., "plan" can 
be expanded to include "plans", "planning", "planned", etc.
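A minimal sketch of the query-time expansion described above. It assumes a hypothetical index vocabulary held in a `Set<String>` and a deliberately crude set of English inflection rules (`candidateForms` is illustrative, not a real stemmer). It generates candidate forms of the query word, keeps only those that actually occur in the index, and renders them as an OR query in classic Lucene query syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.StringJoiner;

public class AntiStemming {

    // Hypothetical, deliberately crude English inflection rules for
    // illustration; a real system would use a lexicon or a proper stemmer.
    static Set<String> candidateForms(String word) {
        Set<String> forms = new LinkedHashSet<>();
        if (word.isEmpty()) {
            return forms;
        }
        forms.add(word);
        forms.add(word + "s");
        forms.add(word + "ed");
        forms.add(word + "ing");
        // Doubled final consonant, as in plan -> planned, planning.
        char last = word.charAt(word.length() - 1);
        if ("aeiou".indexOf(last) < 0) {
            forms.add(word + last + "ed");
            forms.add(word + last + "ing");
        }
        return forms;
    }

    // Keep only the candidate forms that actually occur in the index,
    // so the expanded query never contains nonexistent terms like "planed".
    static List<String> expand(String word, Set<String> indexTerms) {
        List<String> expanded = new ArrayList<>();
        for (String form : candidateForms(word)) {
            if (indexTerms.contains(form)) {
                expanded.add(form);
            }
        }
        return expanded;
    }

    // Render the variants as a disjunction in classic Lucene query syntax.
    static String toQueryString(String field, List<String> terms) {
        StringJoiner joiner = new StringJoiner(" OR ", "(", ")");
        for (String term : terms) {
            joiner.add(field + ":" + term);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Hypothetical index vocabulary.
        Set<String> indexTerms = new HashSet<>(Arrays.asList(
                "plan", "plans", "planned", "planning", "planet", "plane"));
        List<String> variants = expand("plan", indexTerms);
        System.out.println(variants);
        System.out.println(toQueryString("body", variants));
    }
}
```

With the sample vocabulary this prints `[plan, plans, planned, planning]` and `(body:plan OR body:plans OR body:planned OR body:planning)`; the field name `body` is hypothetical. Because the index itself keeps the exact forms, a user who wants only "plan" can still get it. In Lucene code the same disjunction could also be built directly as a `BooleanQuery` of `TermQuery` clauses with `BooleanClause.Occur.SHOULD`, rather than going through the query parser.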

In our application we provide a feature that is sometimes called a word 
wheel. When someone enters "plan" in this tool, we show all of the words in 
the index that start with "plan". Here are some of the related words:
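The word-wheel lookup amounts to a prefix scan over a sorted term list. The sketch below uses a `TreeSet` as a hypothetical stand-in for the index's term dictionary. It seeks to the first term not less than the prefix and stops at the first term that no longer matches, which is safe because all terms sharing a prefix are contiguous in sorted order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class WordWheel {

    // Return every term in a sorted vocabulary that starts with the prefix.
    // tailSet() seeks to the first term >= prefix; all terms sharing the
    // prefix are contiguous from there, so the scan stops at the first miss.
    static List<String> termsWithPrefix(NavigableSet<String> vocabulary, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : vocabulary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the index's sorted term dictionary.
        NavigableSet<String> vocabulary = new TreeSet<>(Arrays.asList(
                "place", "plan", "plane", "planet", "planned", "planning",
                "plans", "plant", "play"));
        System.out.println(termsWithPrefix(vocabulary, "plan"));
    }
}
```

With this sample vocabulary the sketch prints `[plan, plane, planet, planned, planning, plans, plant]`. A pure prefix match also returns unrelated words such as "planet" and "plant", so the user (or a stemmer) still has to pick which variants belong in the expanded query. Lucene's term dictionary is likewise kept in sorted order, which is what makes prefix enumeration (e.g. for `PrefixQuery`) efficient.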

Just a thought.

----- Original Message ----- 
From: "Ivan Provalov" <>
To: <>
Sent: Thursday, May 20, 2010 1:16 PM
Subject: Stemming and Wildcard Queries

> Is there a good way to combine wildcard queries and stemming?
> As is, a field that is stemmed at index time won't work with some 
> wildcard queries.
> We were thinking of creating two separate index fields, one stemmed and one 
> non-stemmed, but we are having issues with our SpanNear queries (they 
> require the same field).
> We thought of combining the stemmed and non-stemmed terms in the same 
> field, but we are concerned about the stats being skewed as a result of 
> this (especially the TermVector stats). Can overloading the 
> non-stemmed field with stemmed terms cause any issues with the TermVector?
> Any suggestions?
> Ivan Provalov
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
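One common way to put stemmed and unstemmed terms in the same field, as Ivan suggests, is to emit each stem at the same position as its original token (a position increment of 0). The sketch below simulates that token stream in plain Java (Java 9+ for `Map.of`); the `Token` class and the lookup-table stemmer are hypothetical simplifications, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SamePositionStems {

    // Simplified stand-in for an indexed token: term text plus position.
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    // Emit each original token, then its stem at the SAME position
    // (position increment 0) whenever the stem differs from the original.
    static List<Token> analyze(String text, Map<String, String> stems) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue; // split() yields "" for leading whitespace
            }
            tokens.add(new Token(word, position));
            String stem = stems.getOrDefault(word, word);
            if (!stem.equals(word)) {
                tokens.add(new Token(stem, position));
            }
            position++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical lookup-table stemmer, for illustration only.
        Map<String, String> stems = Map.of("planning", "plan", "meetings", "meeting");
        System.out.println(analyze("Planning meetings weekly", stems));
    }
}
```

For "Planning meetings weekly" this yields `[planning@0, plan@0, meetings@1, meeting@1, weekly@2]`. Because each stem shares its original's position, positional queries such as SpanNear see the same distances whichever form they match, and a wildcard such as `plann*` can still hit the unstemmed form. In a Lucene analyzer the same effect comes from setting the `PositionIncrementAttribute` of the injected token to 0. The cost is the one Ivan asks about: term vectors and term statistics for the field will include both forms.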

