lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <>
Subject Re: Proximity in ranking, summary generation
Date Wed, 01 Dec 2004 18:34:09 GMT

On Dec 1, 2004, at 11:54 AM, Venkatraju wrote:
> This is actually 2 somewhat related questions:
> - In regular multi term queries, does the default ranking function of
> Lucene take into account proximity of the search terms? As far as I
> know, proximity data is used only in phrase searches. Is this correct?


> If so, does someone have pointers/sample implementation of how
> proximity data can be used to supplement tfidf in ranking documents?

Have a look at PhraseQuery and how it does its ranking.

> - Is there a way to get the term offset or byte offset of the best
> match(es) in the document? I am looking to use this information for
> summary generation/highlighting.

Term offset (aka position increments) is stored in the index.  However, 
for highlighting you need byte offsets.  The CVS version of Lucene 
includes an option to capture the byte offsets in the index also.  
Prior to this CVS version, it required re-analyzing the text to be 
highlighted.  Token objects contain the byte offsets that the 
Highlighter package uses.

Have a look at the Highlighter code in the Lucene Sandbox.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message