lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tatu Saloranta <>
Subject Re: interesting phrase query issue
Date Thu, 17 Jul 2003 13:53:06 GMT
On Thursday 17 July 2003 07:20, greg wrote:
> I have several document sections that are being indexed via the
> StandardAnalyzer.  One of these documents has the line "access, the
> manager".  When searching for the phrase "access manager", this document is
> being returned.  I understand why (at least i think i do), because a stop
> word is "the" and the "," is being removed by the tokenizer, my question is
> is there any way I can avoid having this returned in the results?  My
> thoughts were to create a new analyzer that indexes the word "the" (blick
> to many of those), or index the "," in some way (also not good).  Any
> suggestions?

You can also replace all stop words with "dummy" token ("" might be an ok 
candidate?). That would be similar to indexing "the" (which probably is  
better idea than indexing ",").

I'm planning to do something similar for paragraph breaks (in case of plain 
text, double linefeed, for HTML <p> etc), to prevent similar problems.

-+ Tatu +-

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message