lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)
Date Mon, 28 Jun 2010 18:56:04 GMT
No, I don't think so. The critical bit is that the indexed text
does NOT contain the word "formal".  So searching for
any phrase that DOES contain "formal" should fail no matter
what the slop.

Phrase queries are something like "find all the words in this
search string, ignoring some number of intervening tokens not in the
search string". There's nothing in there about "find only some of the
words in the search string"....

I'm guessing that the original post had a typo in the success case,
because it's contradicted by "a peng's" second post. It's always
possible that I'm experiencing a brain short......

Best
Erick

On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra <t.sapra97@gmail.com> wrote:

> Hey Erick
>
> Thanks mate!
>
> So I guess my explanation in the mail chain above was correct!
>
> On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <erickerickson@gmail.com
> >wrote:
>
> > I think you're misunderstanding the intent of PhraseQueries and slop.
> Slop
> > is the number of intervening tokens that may exist between the words
> > you're looking for. However, all the words you're looking for MUST exist.
> > So,
> >
> > <<< whenever the search phrase contains a word that don't
> > exist in the document, the search result will be empty >>>
> >
> > is exactly how this is intended to work.
> >
> > HTH
> > Erick
> >
> >
> > On Mon, Jun 28, 2010 at 9:09 AM, a peng <zhoudengpeng@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > My test result is that whenever the search phrase contains a word that
> > > don't
> > > exist in the document, the search result will be empty no matter how
> big
> > > the
> > > slop factor I set, seems this is a bug of Lucene, or it is work as
> > design?
> > >
> > > 2010/6/28 tarun sapra <t.sapra97@gmail.com>
> > >
> > > > Hi ,
> > > >
> > > > I think I have been able to understand whats happening here...
> > > >
> > > > Indexed Content : "This is a test".
> > > > your search phrase : "This is a formal test"
> > > > your setting the slop factor 2 , now if your slop factor is 3 it
> should
> > > > work
> > > > because "is" and "a" are stop words thus the words "This" and "test"
> > are
> > > 2
> > > > slop factor apart but in your search phrase "This is a formal test"
> the
> > > > words "This" and "test"  are 3 slop factor thats why it's nor working
> > > > now in search phrase "This is formal test" the words "This" and
> "test"
> > > are
> > > > 2
> > > > slop factor apart thats why this phrase is working.
> > > >
> > > >
> > > >
> > > > On Mon, Jun 28, 2010 at 11:37 AM, a peng <zhoudengpeng@gmail.com>
> > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am using StandardAnalyzer(Version.LUCENE_30);
> > > > >
> > > > > 2010/6/27 tarun sapra <t.sapra97@gmail.com>
> > > > >
> > > > > > which analyzer are you usin'?
> > > > > >
> > > > > >
> > > > > > On Sun, Jun 27, 2010 at 7:12 AM, a peng <zhoudengpeng@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I know the indexed content contains the following text:
"This
> is
> > a
> > > > > test".
> > > > > > > And the search phrase I used is "This is a formal test",
and
> then
> > I
> > > > set
> > > > > > the
> > > > > > > slop of the PhraseQuery as 2 with setSlop(2), but I found
that
> I
> > > can
> > > > > not
> > > > > > > get
> > > > > > > a search result. If I set the search phrase as "This is
formal
> > > test",
> > > > > > then
> > > > > > > I
> > > > > > > can get the search result.
> > > > > > >
> > > > > > > So what is the problem here, thanks in advance.
> > > > > > >
> > > > > > >
> > > > > > > Attached is the Java doc for the setSlop method:
> > > > > > >
> > > > > > > public void *setSlop*(int s)
> > > > > > >
> > > > > > > Sets the number of other words permitted between words
in query
> > > > phrase.
> > > > > > If
> > > > > > > zero, then this is an exact phrase search. For larger values
> this
> > > > works
> > > > > > > like
> > > > > > > a WITHIN or NEAR operator.
> > > > > > >
> > > > > > > The slop is in fact an edit-distance, where the units
> correspond
> > to
> > > > > moves
> > > > > > > of
> > > > > > > terms in the query phrase out of position. For example,
to
> switch
> > > the
> > > > > > order
> > > > > > > of two words requires two moves (the first move places
the
> words
> > > atop
> > > > > one
> > > > > > > another), so to permit re-orderings of phrases, the slop
must
> be
> > at
> > > > > least
> > > > > > > two.
> > > > > > >
> > > > > > > More exact matches are scored higher than sloppier matches,
> thus
> > > > search
> > > > > > > results are sorted by exactness.
> > > > > > >
> > > > > > > The slop is zero by default, requiring exact matches.
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Thanks & Regards
> > > > > > Tarun Sapra
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks & Regards
> > > > Tarun Sapra
> > > >
> > >
> >
>
>
>
> --
> Thanks & Regards
> Tarun Sapra
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message