lucene-general mailing list archives

From: JBTech <jb4t...@gmail.com>
Subject: Re: issues with wildcard search and snowball english analyzer
Date: Fri, 25 Jul 2008 14:04:02 GMT

Hi Andrew,
Thanks for your quick reply.
I tried e*t, and it did not return any results.
I am using Lucene 2.2.
The full word elephant returned one hit, since I am using the same analyzer
for indexing and searching.
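
One possible explanation, sketched under assumptions rather than confirmed against Testing.java: the query parser does not pass wildcard terms through the analyzer, so e*t is compared with the stemmed terms stored in the index. Assuming the Snowball English stemmer reduces elephant to eleph, no indexed term ends in t. The class and method names below are hypothetical, and the matching is a plain-Java stand-in for Lucene's wildcard matching.

```java
import java.util.regex.Pattern;

public class WildcardVsStemSketch {

    // Translate a wildcard pattern (* = any run of characters,
    // ? = exactly one character) into an equivalent regular expression.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True if the whole indexed term matches the wildcard pattern.
    static boolean matches(String wildcard, String indexedTerm) {
        return toRegex(wildcard).matcher(indexedTerm).matches();
    }

    public static void main(String[] args) {
        // "eleph" is the ASSUMED stemmed form of "elephant".
        System.out.println(matches("e*t", "elephant")); // verbatim index term
        System.out.println(matches("e*t", "eleph"));    // stemmed index term
        System.out.println(matches("e*", "eleph"));     // prefix query on the stem
    }
}
```

If that assumption holds, a pattern such as e* or ele* would still hit the stemmed term, while e*t would only match in an index that keeps the verbatim word.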
I uploaded the java class I used for testing this.
Thanks
JB

Andrew Gilmartin-2 wrote:
> 
> --- On Thu, 7/24/08, JBTech <jb4tech@gmail.com> wrote:
> 
>> Is there a way to avoid stemming in certain cases?
> 
> As a general rule, make the query intelligent and not the index.
> Therefore, index your text verbatim. Small changes like changing terms to
> lowercase and removing possessives are fine. You now have an index upon
> which you can make intelligent queries.
> 
> An intelligent query requires keeping track of several collections of
> term-to-term(s) mappings. For example, stemmed-term to verbatim-term(s).
> Now, convert the user's search for "elephant is a big animal" into
> something akin to 
> 
> ( (elephant^10) OR (A) OR (B) ) AND
> ( (big^10) OR (C) ) AND
> ( (animal^10) OR (D) )
> 
> Where A and B are other terms with the same stemming as elephant, C is
> another term with the same stemming as big, and D is another term with
> the same stemming as animal. Adding the boost ensures that a verbatim
> match pushes the document's rank higher and so ensures that what the user
> asked for is closer to the top.
> 
> This basic idea of making the queries more intelligent by broadening them
> and boosting term weights gives you a lot of control over the query and
> how results are ranked. The same control is not possible by making the
> index more intelligent.
> 
> Don't worry about Lucene's performance with complex queries. My experience
> is that it is very fast.
> 
> And to answer your specific question, a search for "e*t" will work as is.
> 
> -- Andrew
> 
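
The query expansion Andrew describes could be sketched roughly as follows. This is a minimal illustration in plain Java, not his implementation: the class name, the stop-word set, and the term-to-variants map are all hypothetical, and the variant words are placeholders rather than real stemmer output. It only builds a query string in Lucene's query syntax; handing that string to a query parser is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ExpandedQuerySketch {

    // Hypothetical stop words dropped from the user's input.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("is", "a", "the"));

    // Hypothetical mapping from a verbatim term to other terms that share
    // its stem. In practice this would be collected while indexing; the
    // entries here are illustrative placeholders only.
    static final Map<String, List<String>> VARIANTS = new HashMap<>();
    static {
        VARIANTS.put("elephant", Arrays.asList("elephants"));
        VARIANTS.put("big", Arrays.asList("bigger"));
        VARIANTS.put("animal", Arrays.asList("animals"));
    }

    // Build one AND-ed clause per remaining term: the verbatim term carries
    // the boost, and its same-stem variants are OR-ed in without a boost.
    static String expand(String userInput, int boost) {
        List<String> clauses = new ArrayList<>();
        for (String term : userInput.toLowerCase().trim().split("\\s+")) {
            if (term.isEmpty() || STOP_WORDS.contains(term)) {
                continue;
            }
            StringBuilder clause = new StringBuilder();
            clause.append('(').append(term).append('^').append(boost);
            List<String> variants =
                    VARIANTS.getOrDefault(term, Collections.<String>emptyList());
            for (String variant : variants) {
                clause.append(" OR ").append(variant);
            }
            clause.append(')');
            clauses.add(clause.toString());
        }
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(expand("elephant is a big animal", 10));
        // prints: (elephant^10 OR elephants) AND (big^10 OR bigger) AND (animal^10 OR animals)
    }
}
```

Because only the verbatim term is boosted, a document containing the exact word the user typed should rank above one that matches only through a variant, which is the ranking behaviour described in the quoted message.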
Testing.java: http://www.nabble.com/file/p18652365/Testing.java