lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: I have a question about phrase query with stop words
Date Fri, 13 Apr 2007 20:04:13 GMT

On Friday 13 April 2007 04:04, Erick Erickson wrote:
> As I understand it, there really is no "space indicator". I think of it
> as replacing the stop word with a space, which is then discarded.

You can replace every stop word with a special term value of your own
to get a space indicator.

It is also possible to index nothing at a particular position, for example
at the position of a stop word. This gives a "gap" in the index,
see below.
 
> so, you're indexing 'you find answer', and both your searches are
> looking for 'you find answer',  the stop words are just gone as though
> they never were. So both queries match.
> 
> But I've been wrong before <G>...
> 
> I can't really speak to the highlighter question, so I'll let someone
> more knowledgeable pipe up.
> 
> Erick
> 
> On 4/12/07, Bill Taylor <wataylor@alum.mit.edu> wrote:
> >
> > I found some discussions of this question from back in 2003, but that was
> > many updates ago.
> >
> > I have built an index using the standard stop analyser which uses the
> > standard list of stop words.  "will" and "the" are stop words.
> >
> > As I understand analyzers and phrase queries, when I search for
> >
> > you will find the answer
> >
> > using the default slop of 0, I should find any pattern like
> >
> > you <any stop word> find <any stop word> answer
> >
> > because the analyzer replaces "will" and "the" in the query with a space
> > indicator as it did when analyzing the original input text.  Instead, I
> > find phrases such as
> >
> > you find an answer
> >
> > "an" is a stop word, so matching "find an answer" is as expected, but
> > there is no stop word between "you" and "find" in the original input
> > string.  I do not see why "you find an answer" matches.
> >
> > What am I doing wrong?

The problem may be that you expect a gap in the index.
When there is a gap in the index, it is also necessary to adapt
the analyzer used for the phrase query to query for a gap.
I don't know whether PhraseQuery can handle such an analyzer.

To have a gap in the index, you need to change your analyzer
to add a gap for a stop word. This can be done by changing the
position increment when a stop word is encountered, see
Token.setPositionIncrement(). IIRC you need to write a variation
on StopFilter for this.
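
To illustrate what the position increment changes, here is a minimal
sketch in plain Java. It does not use Lucene at all: StopWordGapDemo,
analyze() and phraseMatches() are made-up names, the stop list is a small
assumed subset, and the matching loop only models an exact phrase match
(slop 0) over term positions. With keepGaps set to false, stop words vanish
and the remaining terms become adjacent. With keepGaps set to true, each
dropped stop word adds one to the position increment of the next term, in
both the indexed text and the query.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class StopWordGapDemo {
    // Assumed stop list, for illustration only.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("will", "the", "an", "a"));

    // Maps position -> term. Stop words are never stored; with keepGaps
    // they still consume a position, leaving a hole at that position.
    static Map<Integer, String> analyze(String text, boolean keepGaps) {
        Map<Integer, String> positions = new TreeMap<>();
        int position = -1;
        int increment = 1;
        for (String word : text.toLowerCase().split("\\s+")) {
            if (STOP_WORDS.contains(word)) {
                if (keepGaps) {
                    increment++; // the next kept term skips this position
                }
                continue;
            }
            position += increment;
            increment = 1;
            positions.put(position, word);
        }
        return positions;
    }

    // Exact phrase match: every query term must sit at the same relative
    // position in the document. Holes in the query match anything.
    static boolean phraseMatches(String doc, String query, boolean keepGaps) {
        Map<Integer, String> d = analyze(doc, keepGaps);
        TreeMap<Integer, String> q = new TreeMap<>(analyze(query, keepGaps));
        if (q.isEmpty()) {
            return false;
        }
        for (int start : d.keySet()) {
            int offset = start - q.firstKey();
            boolean ok = true;
            for (Map.Entry<Integer, String> e : q.entrySet()) {
                if (!e.getValue().equals(d.get(e.getKey() + offset))) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "you find an answer";
        String query = "you will find the answer";
        System.out.println(phraseMatches(doc, query, false)); // true
        System.out.println(phraseMatches(doc, query, true));  // false
    }
}
```

In this model a hole in the query matches any term at that position, not
only a stop word, so "you can find an answer" would also match when gaps
are kept.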

Regards,
Paul Elschot



> >
> >
> > Also, when I try to highlight after searching for a phrase, the
> > highlighter highlights individual words wherever it finds them in the
> > input text.  The
> > documentation suggests that if I use the right scoring system, I will
> > highlight only long strings of adjacent tokens which are found in the
> > phrase, but I am not sure how to do that.
> >
> > If necessary, I will paste in samples of my code for creating the indexes
> > and doing the search.
> >
> >
> > Thanks.
> >
> > Bill Taylor
> >
> 


