lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)
Date Wed, 30 Jun 2010 01:08:32 GMT
No, I really don't have a good solution off the top of my head, perhaps
others can chime in...

Although I suppose you could fire multiple queries dropping out
some number of search terms, but I don't know whether that
satisfies your requirements. E.g. search the following phrases
"this is a formal test"
"this is a formal"
"this is a test"
"this is formal test"
"this a formal test"
"is a formal test"

and so on...

Best
Erick

On Tue, Jun 29, 2010 at 8:30 PM, a peng <zhoudengpeng@gmail.com> wrote:

> Hi Erick,
>
> Any comments about this requirement?
>
> 2010/6/29 a peng <zhoudengpeng@gmail.com>
>
> > Hi Erick,
> >
> > Thanks for you reply, now I get the point why I can not get the search
> > result. But can you guide me how can I use Lucene to implement the
> following
> > search feature:
> > Basically we can call this feature "fuzzy phrase search", which means the
> > search phrase may contains more words or less words compared to the
> > sentences indexed in the document. Again I want to use the sample I
> posted
> > to demo it:
> >
> > The document contains a sentence "This is a test", and the search phrase
> is
> > "This is a formal test", as there is only one word difference between the
> > two sentences, how can I get the "This is a test" phrase as the search
> > result?
> >
> > Thanks.
> >
> >
> >
> > 2010/6/29 Erick Erickson <erickerickson@gmail.com>
> >
> > No, I don't think so. The critical bit is that the indexed text
> >> does NOT contain the word "formal".  So searching for
> >> any phrase that DOES contain "formal" should fail no matter
> >> what the slop.
> >>
> >> Phrase queries are something like "find all the words in this
> >> search string, ignoring some number of intervening tokens not in the
> >> search string". There's nothing in there about "find only some of the
> >> words in the search string"....
> >>
> >> I'm guessing that the original post had a typo in the success case,
> >> because it's contradicted by "a peng's" second post. It's always
> >> possible that I'm experiencing a brain short......
> >>
> >> Best
> >> Erick
> >>
> >> On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra <t.sapra97@gmail.com>
> >> wrote:
> >>
> >> > Hey Erick
> >> >
> >> > Thanks mate!
> >> >
> >> > So I guess my explanation in the mail chain above was correct!
> >> >
> >> > On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <
> >> erickerickson@gmail.com
> >> > >wrote:
> >> >
> >> > > I think you're misunderstanding the intent of PhraseQueries and
> slop.
> >> > Slop
> >> > > is the number of intervening tokens that may exist between the words
> >> > > you're looking for. However, all the words you're looking for MUST
> >> exist.
> >> > > So,
> >> > >
> >> > > <<< whenever the search phrase contains a word that don't
> >> > > exist in the document, the search result will be empty >>>
> >> > >
> >> > > is exactly how this is intended to work.
> >> > >
> >> > > HTH
> >> > > Erick
> >> > >
> >> > >
> >> > > On Mon, Jun 28, 2010 at 9:09 AM, a peng <zhoudengpeng@gmail.com>
> >> wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > My test result is that whenever the search phrase contains a
word
> >> that
> >> > > > don't
> >> > > > exist in the document, the search result will be empty no matter
> how
> >> > big
> >> > > > the
> >> > > > slop factor I set, seems this is a bug of Lucene, or it is work
as
> >> > > design?
> >> > > >
> >> > > > 2010/6/28 tarun sapra <t.sapra97@gmail.com>
> >> > > >
> >> > > > > Hi ,
> >> > > > >
> >> > > > > I think I have been able to understand whats happening here...
> >> > > > >
> >> > > > > Indexed Content : "This is a test".
> >> > > > > your search phrase : "This is a formal test"
> >> > > > > your setting the slop factor 2 , now if your slop factor
is 3 it
> >> > should
> >> > > > > work
> >> > > > > because "is" and "a" are stop words thus the words "This"
and
> >> "test"
> >> > > are
> >> > > > 2
> >> > > > > slop factor apart but in your search phrase "This is a formal
> >> test"
> >> > the
> >> > > > > words "This" and "test"  are 3 slop factor thats why it's
nor
> >> working
> >> > > > > now in search phrase "This is formal test" the words "This"
and
> >> > "test"
> >> > > > are
> >> > > > > 2
> >> > > > > slop factor apart thats why this phrase is working.
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Mon, Jun 28, 2010 at 11:37 AM, a peng <
> zhoudengpeng@gmail.com>
> >> > > wrote:
> >> > > > >
> >> > > > > > Hi,
> >> > > > > >
> >> > > > > > I am using StandardAnalyzer(Version.LUCENE_30);
> >> > > > > >
> >> > > > > > 2010/6/27 tarun sapra <t.sapra97@gmail.com>
> >> > > > > >
> >> > > > > > > which analyzer are you usin'?
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Sun, Jun 27, 2010 at 7:12 AM, a peng <
> >> zhoudengpeng@gmail.com>
> >> > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Hi,
> >> > > > > > > >
> >> > > > > > > > I know the indexed content contains the following
text:
> >> "This
> >> > is
> >> > > a
> >> > > > > > test".
> >> > > > > > > > And the search phrase I used is "This is
a formal test",
> and
> >> > then
> >> > > I
> >> > > > > set
> >> > > > > > > the
> >> > > > > > > > slop of the PhraseQuery as 2 with setSlop(2),
but I found
> >> that
> >> > I
> >> > > > can
> >> > > > > > not
> >> > > > > > > > get
> >> > > > > > > > a search result. If I set the search phrase
as "This is
> >> formal
> >> > > > test",
> >> > > > > > > then
> >> > > > > > > > I
> >> > > > > > > > can get the search result.
> >> > > > > > > >
> >> > > > > > > > So what is the problem here, thanks in advance.
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > Attached is the Java doc for the setSlop
method:
> >> > > > > > > >
> >> > > > > > > > public void *setSlop*(int s)
> >> > > > > > > >
> >> > > > > > > > Sets the number of other words permitted
between words in
> >> query
> >> > > > > phrase.
> >> > > > > > > If
> >> > > > > > > > zero, then this is an exact phrase search.
For larger
> values
> >> > this
> >> > > > > works
> >> > > > > > > > like
> >> > > > > > > > a WITHIN or NEAR operator.
> >> > > > > > > >
> >> > > > > > > > The slop is in fact an edit-distance, where
the units
> >> > correspond
> >> > > to
> >> > > > > > moves
> >> > > > > > > > of
> >> > > > > > > > terms in the query phrase out of position.
For example, to
> >> > switch
> >> > > > the
> >> > > > > > > order
> >> > > > > > > > of two words requires two moves (the first
move places the
> >> > words
> >> > > > atop
> >> > > > > > one
> >> > > > > > > > another), so to permit re-orderings of phrases,
the slop
> >> must
> >> > be
> >> > > at
> >> > > > > > least
> >> > > > > > > > two.
> >> > > > > > > >
> >> > > > > > > > More exact matches are scored higher than
sloppier
> matches,
> >> > thus
> >> > > > > search
> >> > > > > > > > results are sorted by exactness.
> >> > > > > > > >
> >> > > > > > > > The slop is zero by default, requiring exact
matches.
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > --
> >> > > > > > > Thanks & Regards
> >> > > > > > > Tarun Sapra
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > Thanks & Regards
> >> > > > > Tarun Sapra
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks & Regards
> >> > Tarun Sapra
> >> >
> >>
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message