lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ravi" <r...@htinc.com>
Subject Stopwords in phrases
Date Tue, 21 Dec 2004 15:41:04 GMT
 I want to be able to use stopwords in exact phrase searches. I have
looked at Nutch and used the same approach (replace common words with
n-grams. Look at net.nutch.analysis.CommonGrams). 
  So if "to","be","or" and "not" are stop words, for the string "to be
or not to be", the analyzer produces the following tokens

[to-be, to-be-or, to-be-or-not, to-be-or-not-to, to-be-or-not-to-be,
be-or, be-or-not, be-or-not-to, be-or-not-to-be, or-not, or-not-to,
or-not-to-be, not-to, not-to-be, to-be]

  This is exactly what I wanted from the analyzer during indexing.
  But I'm having a problem with the search. 
 when I do a search on "not to be" the analyzer is converting my search
into 
  content:"not-to not-to-be to-be" because the analyzer produces the
tokens "not-to","not-to-be","to-be"

  I'm getting 0 results on this as there is no token "not-to not-to-be
to-be" in the index. 

  I want just "not-to-be" from the analyzer during the search so when I
search on "not to be" I will get the document which has "not-to-be" as a
token. 

   How can I use the same analyzer to get different results in indexing
and searching? 

Thanks in advance,
Ravi. 

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message