lucene-java-user mailing list archives

From Ivan Provalov <iprov...@yahoo.com>
Subject Re: Stemming and Wildcard Queries
Date Fri, 21 May 2010 13:04:19 GMT
Thanks, everyone!

--- On Thu, 5/20/10, Herbert Roitblat <herb@orcatec.com> wrote:

> From: Herbert Roitblat <herb@orcatec.com>
> Subject: Re: Stemming and Wildcard Queries
> To: java-user@lucene.apache.org
> Date: Thursday, May 20, 2010, 4:48 PM
> At a general level, we have found that stemming during indexing is not
> advisable. Sometimes users want the exact form, and if you have removed
> the exact form during indexing, obviously, you cannot provide that.
> Rather, we have found that stemming during search is more useful, or
> maybe it should be called anti-stemming. For any given input that the
> user wants to stem, we can derive the variations during query
> processing. E.g., "plan" can be expanded to include "plans", "planning",
> "planned", etc.
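
The query-time expansion described above could be sketched roughly as follows. This is a minimal plain-Java illustration, not Lucene code and not the poster's implementation: the class name QueryExpansion, the method expand, and the tiny suffix list are all assumptions made for the example. Candidates are generated from the query term and then filtered against the set of terms that actually occur in the index, so only real variants end up in the query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class QueryExpansion {
    // Hypothetical, deliberately naive suffix list. A real system would use a
    // morphological lexicon or group index terms by their stem instead.
    private static final String[] SUFFIXES = {"", "s", "ed", "ing"};

    /** Returns the variants of term that occur in the index vocabulary, sorted. */
    public static List<String> expand(String term, Set<String> vocabulary) {
        Set<String> candidates = new TreeSet<>();
        if (term.isEmpty()) {
            return new ArrayList<>(candidates);
        }
        char last = term.charAt(term.length() - 1);
        for (String suffix : SUFFIXES) {
            candidates.add(term + suffix);
            // Also try doubling the final letter, e.g. plan -> planned, planning.
            if (suffix.equals("ed") || suffix.equals("ing")) {
                candidates.add(term + last + suffix);
            }
        }
        // Keep only the candidates that are real index terms.
        candidates.retainAll(vocabulary);
        return new ArrayList<>(candidates);
    }

    public static void main(String[] args) {
        Set<String> vocabulary = Set.of("plan", "plane", "planet", "planned", "planner");
        System.out.println(expand("plan", vocabulary)); // prints [plan, planned]
    }
}
```

The surviving variants would then be OR-ed together in the query, while the index itself keeps the exact, unstemmed forms.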
> 
> In our application we provide a feature that is sometimes called a word
> wheel. When someone enters "plan" in this tool, we show all of the words
> in the index that start with "plan". Here are some of the related words:
> plan
> plane
> planes
> planet
> planificaci
> planned
> plannedoutages.xls
> planner
> planners
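
A word wheel like the one above amounts to a prefix lookup over a sorted term list. Here is a small plain-Java sketch of that idea; the WordWheel class and its method names are hypothetical, and a sorted in-memory set stands in for the index's term dictionary. It relies on the fact that all strings sharing a prefix are contiguous in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class WordWheel {
    // Sorted set standing in for the sorted terms of an index field.
    private final NavigableSet<String> terms = new TreeSet<>();

    public void add(String term) {
        terms.add(term);
    }

    /** Returns every known term that starts with the given prefix, in sorted order. */
    public List<String> startingWith(String prefix) {
        // All terms with this prefix fall in [prefix, prefix + '\uffff').
        // Assumption: no term contains the character '\uffff' right after the prefix.
        String upperBound = prefix + Character.MAX_VALUE;
        return new ArrayList<>(terms.subSet(prefix, true, upperBound, false));
    }
}
```

Because the lookup is a range scan rather than a full scan, the cost depends mostly on how many terms share the prefix, which is why very short prefixes are the expensive case.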
> 
> Just a thought.
> Herb
> 
> ----- Original Message ----- From: "Ivan Provalov" <iprovalo@yahoo.com>
> To: <java-user@lucene.apache.org>
> Sent: Thursday, May 20, 2010 1:16 PM
> Subject: Stemming and Wildcard Queries
> 
> 
> > Is there a good way to combine wildcard queries and stemming?
> >
> > As is, a field that is stemmed at index time won't work with some
> > wildcard queries.
> >
> > We were thinking of creating two separate index fields, one stemmed and
> > one non-stemmed, but we are having issues with our SpanNear queries
> > (they require the same field).
> >
> > We thought about combining the stemmed and non-stemmed terms in the
> > same field, but we are concerned about the stats being skewed as a
> > result (especially the TermVector stats). Can overloading the
> > non-stemmed field with stemmed terms cause any issues with the
> > TermVector?
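
To make the concern about skewed stats concrete, here is a toy plain-Java model of putting both forms in one field. It is only a sketch under stated assumptions: StackedTokens, its Token class, and the one-rule stem method are invented for illustration and are not Lucene classes. Each word is emitted at its position, and its stem, when different, is emitted at the same position. In Lucene that stacking is normally expressed with a position increment of 0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StackedTokens {
    /** A term together with the position it occupies in the field. */
    public static final class Token {
        public final String term;
        public final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Hypothetical toy stemmer for illustration only: strips one trailing "s".
    static String stem(String word) {
        boolean plural = word.length() > 3 && word.endsWith("s");
        return plural ? word.substring(0, word.length() - 1) : word;
    }

    /** Emits each word, plus its stem at the same position when the two differ. */
    public static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            tokens.add(new Token(word, position));
            String stemmed = stem(word);
            if (!stemmed.equals(word)) {
                tokens.add(new Token(stemmed, position)); // stacked on the original
            }
            position++;
        }
        return tokens;
    }

    /** Counts occurrences per term, roughly what a term vector would record. */
    public static Map<String, Integer> termFrequencies(List<Token> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Token token : tokens) {
            counts.merge(token.term, 1, Integer::sum);
        }
        return counts;
    }
}
```

For the input "plans plan" this model yields three tokens for two words, and "plan" gets a frequency of 2 even though the literal word occurs once. Positions stay aligned, so proximity queries over the single field still line up, but per-term counts and field length are inflated in the way the question anticipates.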
> > 
> > Any suggestions?
> > 
> > Ivan Provalov
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org