lucene-java-user mailing list archives

From "Jack Krupansky" <>
Subject Re: Multiple PositionIncrement attributes
Date Thu, 25 Apr 2013 11:25:23 GMT
You can use SpanNearQuery to seek matches within a specified distance.

Lucene knows nothing about "sentences". But if you have an analyzer or 
custom code that artificially bumps the position to the next multiple of 
some number like 100 or 1000 when a sentence boundary pattern is 
encountered, you could use that number times n as the distance to match 
within n sentences, roughly, give or take a sentence or two. Nothing causes 
the nearness to be rounded or truncated exactly to one of those boundaries.
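
To make the position bumping concrete, here is a minimal sketch in plain 
Java, with no Lucene dependency. The class and method names 
(SentencePositions, nextSentenceStart, assign) are made up for illustration, 
and it assumes the text has already been split into sentences and tokens. 
In a real analyzer the same arithmetic would presumably live in a custom 
TokenFilter that sets the position increment on the first token of each 
sentence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows only the position arithmetic.
final class SentencePositions {

    /** Returns the smallest multiple of gap that is strictly greater than position. */
    static int nextSentenceStart(int position, int gap) {
        return (position / gap + 1) * gap;
    }

    /**
     * Assigns a position to every token. Tokens inside a sentence get
     * consecutive positions; the first token after a sentence boundary is
     * bumped to the next multiple of gap.
     */
    static List<Integer> assign(List<List<String>> sentences, int gap) {
        List<Integer> positions = new ArrayList<>();
        int next = 0;
        for (List<String> sentence : sentences) {
            for (int i = 0; i < sentence.size(); i++) {
                positions.add(next);
                next++;
            }
            if (!sentence.isEmpty()) {
                // next - 1 is the position of the last token in this sentence.
                next = nextSentenceStart(next - 1, gap);
            }
        }
        return positions;
    }
}
```

With a gap of 100, a three-token sentence followed by a two-token sentence 
gets positions 0, 1, 2, 100, 101.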

Maybe you want two numbers: 1) sentence separation, say 1000, and 2) maximum 
sentence length, say 500. The SpanNearQuery would use n-1 times the sentence 
separation plus the maximum sentence length as its slop. Well, you have to 
adjust that for how you count sentences - is 1 the current sentence or is 
that 0?
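
The arithmetic for that distance can be sketched as follows. The class and 
method names (SentenceSlop, slopFor) are hypothetical, and the sketch assumes 
that n = 1 means "the current sentence" and that the maximum sentence length 
is smaller than the sentence separation. The result would be the value 
passed as the slop of a SpanNearQuery.

```java
// Hypothetical helper, not part of Lucene: computes the distance described above.
final class SentenceSlop {

    /**
     * Distance that spans n sentences, counting the current sentence as 1:
     * (n - 1) full sentence separations plus one maximum sentence length.
     */
    static int slopFor(int n, int separation, int maxSentenceLength) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        return (n - 1) * separation + maxSentenceLength;
    }
}
```

With a separation of 1000 and a maximum length of 500, this gives 500 for 
n = 1 and 2500 for n = 3. If you count the current sentence as 0 instead, 
use n in place of n - 1.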

-- Jack Krupansky

-----Original Message----- 
From: Igor Shalyminov
Sent: Thursday, April 25, 2013 6:54 AM
Subject: Multiple PositionIncrement attributes

Hi all!

I use the PositionIncrement attribute for finding words at some distance 
from each other. And I have two problems with that:
1) I want to search for words within one sentence. A possible solution would 
be to set a PositionIncrement of +INF (like 30 :) ) on the sentence break tag.
2) I want to use in my search both word-distance and sentence-distance 
between words (e.g. find the word "Putin" within 3 sentences after the word 
"Obama" or find the words "cheese" and "bacon" in one sentence within 3 
words of each other).

For the 2nd problem, is there a way of storing multiple position information 
sources in the index and using them for searching? Say, at least choosing 
one of those for a query.

Best Regards,
Igor Shalyminov
