lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Proximity Searches behavior
Date Wed, 09 Jun 2004 11:34:03 GMT
On Jun 9, 2004, at 4:26 AM, gaudinat wrote:
> What does exactly happen with three words or more when we do a 
> proximity search?
> such as:  "lucene jakarta best"~10
> Is each word can be at a distance of  10 of each others, or is there 
> an other behaviour?

The total number of "hops" to put the words in order is calculated and 
if it is less than or equal to 10 there is a match.

For example, a field was indexed with "the quick brown fox jumps...".

"quick brown" matches
"quick fox"~1 matches (# of hops = 1)
"quick jumps"~1 does not match (# hops = 2)
"the brown jumps"~1 does not match (# hops = 2)
"fox brown"~1 does NOT match (# hops is actually 2)

> By the way, I would like to know  if someone use this lucene feature 
> regularly?

I suspect many people do.  You can also set a default slop factor on 
QueryParser so users don't have to use the ~10 syntax.

> For my part I would like to use this feature to improve the precision 
> of the finding documents by using the word position.

Also have a look at the new (in Lucene 1.4) SpanQuery family.  It has a 
less confusing "slop factor" and you can control whether the terms must 
be in order or not (with SpanNearQuery).  QueryParser does not support 
it currently, but subclassing and overriding getFieldQuery makes it 
possible.

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message