lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: PhraseQuery issues - differences with SpanNearQuery
Date Thu, 04 Sep 2008 18:39:13 GMT
Sounds like SpanNearQuery is more in line with what you are looking
for. If I remember correctly, the phrase query factors the edit
distance into scoring, but SpanNearQuery will just use the combined idf
of each of its terms, so distance shouldn't matter with spans (I'm sure
Paul will correct me if I am wrong).
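
To put a number on the edit-distance effect, here is a minimal sketch.
It assumes the classic default similarity, where sloppyFreq(distance) =
1 / (distance + 1) and tf(freq) = sqrt(freq), and that swapping two
adjacent terms counts as an edit distance of 2. The class and method
names are made up for illustration and are not Lucene API. Under those
assumptions the swapped-order document scores sqrt(1/3), roughly 0.577
of the exact match, which is close to the 0.57 reported below.

```java
// Sketch only: re-implements two formulas assumed to match Lucene's
// classic default similarity. Not Lucene code; names are hypothetical.
class SloppyPhraseFactor {

    // Assumed default: a sloppy match at edit distance d contributes 1 / (d + 1).
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Assumed default: term-frequency factor is the square root of the frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Score of a single match at the given distance, relative to an exact match.
    static float relativeScore(int distance) {
        return tf(sloppyFreq(distance)) / tf(sloppyFreq(0));
    }

    public static void main(String[] args) {
        // Distance 0: terms in query order. Distance 2: two adjacent terms swapped.
        System.out.println("exact order:   " + relativeScore(0));
        System.out.println("swapped terms: " + relativeScore(2));
    }
}
```

All other factors (idf, norms) are the same for both documents in the
example, so only this factor separates the two scores.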

- Mark

Yannis Pavlidis wrote:
> Hi,
>
> I am having an issue when using the PhraseQuery which is best illustrated with this example:
>
> I have created 2 documents to emulate URLs. One with a URL of "http://www.airballoon.com" and title "air balloon", and the second one with URL "http://www.balloonair.com" and title "balloon air".
>
> Test1 (PhraseQuery)
> ======
> Now when I use the phrase query with - title: "air balloon" ~2
> I get back:
>
> url: "http://www.airballoon.com" - score: 1.0
> url: "http://www.balloonair.com" - score: 0.57
>
> Test2 (PhraseQuery)
> ======
> Now when I use the phrase query with - title: "balloon air" ~2
> I get back:
> url: "http://www.balloonair.com" - score: 1.0
> url: "http://www.airballoon.com" - score: 0.57
>
> Test3 (PhraseQuery)
> ======
> Now when I use the phrase query with - title: "air balloon" ~2 title: "balloon air" ~2
> I get back:
> url: "http://www.airballoon.com" - score: 1.0
> url: "http://www.balloonair.com" - score: 1.0
>
> Test4 (SpanNearQuery)
> =======
> spanNear([title:air, title:balloon], 2, false)
> I get back:
> url: "http://www.airballoon.com" - score: 1.0
> url: "http://www.balloonair.com" - score: 1.0
>
> I would have expected that Test1 and Test2 would each return both URLs with a score of 1.0, since I am setting the slop to 2. It seems, though, that Lucene really favors an absolute exact match.
>
> Is it safe to assume that for what I am looking for (basically, scoring the docs the same regardless of whether someone is searching for "air balloon" or "balloon air") it would be better to use SpanNearQuery rather than PhraseQuery?
>
> Any input would be appreciated. 
>
> Thanks in advance,
>
> Yannis.
>
>   
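
For the unordered case in Test4, a small sketch of why both titles look
the same to the query. It assumes an unordered near match is measured
by the width of the span covering all terms minus the number of terms,
that each term occupies one position, and that only the first
occurrence of each term is considered. The class and method names are
hypothetical and are not Lucene API.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: a simplified model of unordered near matching.
// Not Lucene code; names and simplifications are assumptions.
class UnorderedNearSketch {

    // Slop needed for the given terms to match in any order,
    // or -1 if a term is missing from the token list.
    static int requiredSlop(List<String> tokens, String... terms) {
        int start = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (String term : terms) {
            int pos = tokens.indexOf(term); // first occurrence only
            if (pos < 0) {
                return -1;
            }
            start = Math.min(start, pos);
            end = Math.max(end, pos + 1);
        }
        // Span width minus the positions taken by the terms themselves.
        return (end - start) - terms.length;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("air", "balloon");
        List<String> second = Arrays.asList("balloon", "air");
        System.out.println(requiredSlop(first, "air", "balloon"));
        System.out.println(requiredSlop(second, "air", "balloon"));
    }
}
```

In this model the order of the terms never enters the calculation, so
both titles need the same slop and whatever the scorer derives from the
span width is identical for the two documents.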

