lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From a peng <zhoudengp...@gmail.com>
Subject Re: A question regarding the setSlop method of class PhraseQuery (Lucene version 3.0.1)
Date Wed, 30 Jun 2010 00:30:09 GMT
Hi Erick,

Any comments about this requirement?

2010/6/29 a peng <zhoudengpeng@gmail.com>

> Hi Erick,
>
> Thanks for you reply, now I get the point why I can not get the search
> result. But can you guide me how can I use Lucene to implement the following
> search feature:
> Basically we can call this feature "fuzzy phrase search", which means the
> search phrase may contains more words or less words compared to the
> sentences indexed in the document. Again I want to use the sample I posted
> to demo it:
>
> The document contains a sentence "This is a test", and the search phrase is
> "This is a formal test", as there is only one word difference between the
> two sentences, how can I get the "This is a test" phrase as the search
> result?
>
> Thanks.
>
>
>
> 2010/6/29 Erick Erickson <erickerickson@gmail.com>
>
> No, I don't think so. The critical bit is that the indexed text
>> does NOT contain the word "formal".  So searching for
>> any phrase that DOES contain "formal" should fail no matter
>> what the slop.
>>
>> Phrase queries are something like "find all the words in this
>> search string, ignoring some number of intervening tokens not in the
>> search string". There's nothing in there about "find only some of the
>> words in the search string"....
>>
>> I'm guessing that the original post had a typo in the success case,
>> because it's contradicted by "a peng's" second post. It's always
>> possible that I'm experiencing a brain short......
>>
>> Best
>> Erick
>>
>> On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra <t.sapra97@gmail.com>
>> wrote:
>>
>> > Hey Erick
>> >
>> > Thanks mate!
>> >
>> > So I guess my explanation in the mail chain above was correct!
>> >
>> > On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <
>> erickerickson@gmail.com
>> > >wrote:
>> >
>> > > I think you're misunderstanding the intent of PhraseQueries and slop.
>> > Slop
>> > > is the number of intervening tokens that may exist between the words
>> > > you're looking for. However, all the words you're looking for MUST
>> exist.
>> > > So,
>> > >
>> > > <<< whenever the search phrase contains a word that don't
>> > > exist in the document, the search result will be empty >>>
>> > >
>> > > is exactly how this is intended to work.
>> > >
>> > > HTH
>> > > Erick
>> > >
>> > >
>> > > On Mon, Jun 28, 2010 at 9:09 AM, a peng <zhoudengpeng@gmail.com>
>> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > My test result is that whenever the search phrase contains a word
>> that
>> > > > don't
>> > > > exist in the document, the search result will be empty no matter how
>> > big
>> > > > the
>> > > > slop factor I set, seems this is a bug of Lucene, or it is work as
>> > > design?
>> > > >
>> > > > 2010/6/28 tarun sapra <t.sapra97@gmail.com>
>> > > >
>> > > > > Hi ,
>> > > > >
>> > > > > I think I have been able to understand whats happening here...
>> > > > >
>> > > > > Indexed Content : "This is a test".
>> > > > > your search phrase : "This is a formal test"
>> > > > > your setting the slop factor 2 , now if your slop factor is 3
it
>> > should
>> > > > > work
>> > > > > because "is" and "a" are stop words thus the words "This" and
>> "test"
>> > > are
>> > > > 2
>> > > > > slop factor apart but in your search phrase "This is a formal
>> test"
>> > the
>> > > > > words "This" and "test"  are 3 slop factor thats why it's nor
>> working
>> > > > > now in search phrase "This is formal test" the words "This" and
>> > "test"
>> > > > are
>> > > > > 2
>> > > > > slop factor apart thats why this phrase is working.
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Jun 28, 2010 at 11:37 AM, a peng <zhoudengpeng@gmail.com>
>> > > wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > I am using StandardAnalyzer(Version.LUCENE_30);
>> > > > > >
>> > > > > > 2010/6/27 tarun sapra <t.sapra97@gmail.com>
>> > > > > >
>> > > > > > > which analyzer are you usin'?
>> > > > > > >
>> > > > > > >
>> > > > > > > On Sun, Jun 27, 2010 at 7:12 AM, a peng <
>> zhoudengpeng@gmail.com>
>> > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > I know the indexed content contains the following
text:
>> "This
>> > is
>> > > a
>> > > > > > test".
>> > > > > > > > And the search phrase I used is "This is a formal
test", and
>> > then
>> > > I
>> > > > > set
>> > > > > > > the
>> > > > > > > > slop of the PhraseQuery as 2 with setSlop(2),
but I found
>> that
>> > I
>> > > > can
>> > > > > > not
>> > > > > > > > get
>> > > > > > > > a search result. If I set the search phrase as
"This is
>> formal
>> > > > test",
>> > > > > > > then
>> > > > > > > > I
>> > > > > > > > can get the search result.
>> > > > > > > >
>> > > > > > > > So what is the problem here, thanks in advance.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > Attached is the Java doc for the setSlop method:
>> > > > > > > >
>> > > > > > > > public void *setSlop*(int s)
>> > > > > > > >
>> > > > > > > > Sets the number of other words permitted between
words in
>> query
>> > > > > phrase.
>> > > > > > > If
>> > > > > > > > zero, then this is an exact phrase search. For
larger values
>> > this
>> > > > > works
>> > > > > > > > like
>> > > > > > > > a WITHIN or NEAR operator.
>> > > > > > > >
>> > > > > > > > The slop is in fact an edit-distance, where the
units
>> > correspond
>> > > to
>> > > > > > moves
>> > > > > > > > of
>> > > > > > > > terms in the query phrase out of position. For
example, to
>> > switch
>> > > > the
>> > > > > > > order
>> > > > > > > > of two words requires two moves (the first move
places the
>> > words
>> > > > atop
>> > > > > > one
>> > > > > > > > another), so to permit re-orderings of phrases,
the slop
>> must
>> > be
>> > > at
>> > > > > > least
>> > > > > > > > two.
>> > > > > > > >
>> > > > > > > > More exact matches are scored higher than sloppier
matches,
>> > thus
>> > > > > search
>> > > > > > > > results are sorted by exactness.
>> > > > > > > >
>> > > > > > > > The slop is zero by default, requiring exact matches.
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Thanks & Regards
>> > > > > > > Tarun Sapra
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Thanks & Regards
>> > > > > Tarun Sapra
>> > > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks & Regards
>> > Tarun Sapra
>> >
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message