lucene-java-user mailing list archives

From duidu...@web.de
Subject Re: Quotes dependent StopWords removal
Date Wed, 16 Aug 2006 09:25:33 GMT
Hello Sameer,

what about this:

- during indexing, use the StandardAnalyzer without stopwords (i.e. with an empty stopword list), so that stopwords stay in the index and quoted phrases like "no dress code" can still match
- during the search, use 2 different Analyzers: one with and one without a stopword list. First you check whether the user has typed quotes in her query string.
  # If so, check whether there are stopwords between the quotes
    * if there is a stopword between the quotes, use the Analyzer without stopwords (so the stopword is kept)
    * if there is no stopword between the quotes, use the one with stopwords
  # If not, use the one with stopwords anyway
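The decision rule above could be sketched in plain Java roughly like this. It is a hypothetical helper, not a Lucene API: the class name `AnalyzerChooser`, the `Choice` enum and the tiny `STOP_WORDS` set are made up for illustration, and quoted phrases are assumed to be delimited by plain double quotes. It only decides which of the two Analyzers to hand to the QueryParser.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): decides which analyzer to use
// for a query string, following the rule described above.
public class AnalyzerChooser {

    enum Choice { WITH_STOPWORDS, WITHOUT_STOPWORDS }

    // Assumed, deliberately tiny stop list for illustration only.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "no", "of", "the", "with"));

    // A quoted phrase: an opening double quote, non-quote characters, a closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static Choice choose(String query) {
        Matcher m = QUOTED.matcher(query);
        while (m.find()) {
            // Split the quoted phrase into words and look for a stopword.
            for (String word : m.group(1).toLowerCase(Locale.ROOT).split("\\W+")) {
                if (STOP_WORDS.contains(word)) {
                    // Stopword between quotes: use the analyzer without a stop list.
                    return Choice.WITHOUT_STOPWORDS;
                }
            }
        }
        // No quotes, or no stopword between them: remove stopwords as usual.
        return Choice.WITH_STOPWORDS;
    }
}
```

With this sketch, `"no dress code"` selects WITHOUT_STOPWORDS and `shirts with trousers` selects WITH_STOPWORDS; the result would then pick, for example, a StandardAnalyzer constructed with or without a stop list.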

...the drawback of this approach is that when a user mixes stopwords with and without quotes in one query, you cannot decide so easily.
Maybe one solution is to modify the analyzer's stopword list on the fly, removing the stopwords that appear between quotes... then the last problem left is when the user types a specific stopword twice, once with and once without quotes. In that situation maybe you can live with using the Analyzer without stopwords; depending on your scenario, it could be a good compromise... or search n times, but this wouldn't be straightforward either ;)
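The "modify the stopword list on the fly" idea could look roughly like the following sketch. Again this is hypothetical and Lucene-independent: `PerQueryStopList`, `stopWordsFor` and the small default list are illustrative names, not library API. It copies a default stop list and removes every stopword that occurs between double quotes in the query.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Lucene): builds a per-query stop list by
// taking a default list and dropping every stopword that occurs inside quotes.
public class PerQueryStopList {

    // Assumed, deliberately tiny default stop list for illustration only.
    static final Set<String> DEFAULT_STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "no", "of", "the", "with"));

    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    static Set<String> stopWordsFor(String query) {
        // Work on a copy, so the shared default list is never modified.
        Set<String> stopWords = new HashSet<>(DEFAULT_STOP_WORDS);
        Matcher m = QUOTED.matcher(query);
        while (m.find()) {
            for (String word : m.group(1).toLowerCase(Locale.ROOT).split("\\W+")) {
                // A quoted stopword must survive analysis, so take it off the list.
                stopWords.remove(word);
            }
        }
        return stopWords;
    }
}
```

For `"no dress code" shirts with trousers` the result still contains "with" but no longer "no". The set (or `stopWords.toArray(new String[0])`) would then be passed to an analyzer that accepts a custom stop list; which constructor is available depends on your Lucene version. The sketch also shows the remaining problem mentioned above: for `"no dress code" no ties`, "no" is taken off the list, so the unquoted "no" is kept too.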


greetz

Christian



Sameer Maggon wrote:
> Currently, in my application (which uses Lucene), I am using a Porter + StandardAnalyzer (with stop words).
>
> I would like to do the following:
>
> When the user performs a search, the analyzer should remove stopwords only if the stop word is not inside quotes. If the stop word is inside quotes, I don't want it to be removed by the analyzer.
>
> For example:
>
> "no dress code": should not remove "no", as it's inside quotes.
>
> shirts with trousers: should remove "with" as a stop word.
>
> I have been trying to do this with Lucene, but have not found a straightforward way of doing it. I have been digging in the Lucene mail archives, but it seems there is no easy way to do this apart from extending / modifying the QueryParser. In some sense, it is similar to the issue discussed in:
>
> http://www.gossamer-threads.com/lists/lucene/java-user/38946
>
> Is there any way I can avoid subclassing QueryParser?
>
> Thanks,
>
> Sameer Maggon.
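For comparison, the behaviour asked for in the question can be expressed on a plain query string: walk through the words, remember whether we are inside double quotes, and drop a stopword only outside them. This is a hypothetical, Lucene-independent sketch (`QuoteAwareStopFilter` and its stop list are made up); it does not handle escaped quotes or Lucene query syntax such as field prefixes and operators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of Lucene): splits a raw query string into
// lower-cased words and drops stopwords only when they are outside quotes.
public class QuoteAwareStopFilter {

    // Assumed, deliberately tiny stop list for illustration only.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "no", "of", "the", "with"));

    static List<String> filter(String query) {
        List<String> kept = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        boolean inQuotes = false;
        // Run one step past the end (treated as a space) to flush the last word.
        for (int i = 0; i <= query.length(); i++) {
            char c = i < query.length() ? query.charAt(i) : ' ';
            if (Character.isLetterOrDigit(c)) {
                word.append(Character.toLowerCase(c));
                continue;
            }
            // Any other character ends the current word, if there is one.
            if (word.length() > 0) {
                String w = word.toString();
                if (inQuotes || !STOP_WORDS.contains(w)) {
                    kept.add(w);
                }
                word.setLength(0);
            }
            // Each double quote toggles between inside and outside a phrase.
            if (c == '"') {
                inQuotes = !inQuotes;
            }
        }
        return kept;
    }
}
```

Unlike the per-query stop list, this keeps the quoted "no" in `"no dress code" no ties` while still dropping the unquoted one, at the cost of doing its own word splitting outside the analyzer.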
