lucene-java-user mailing list archives

From Grant Ingersoll
Subject Re: Use case for term vector's token position/offset?
Date Tue, 21 Nov 2006 16:13:00 GMT
Hi Jong,

I think these are useful for things like highlighting (I believe
contrib/highlighter can use them) and for other post-processing
algorithms such as question answering or calculating co-occurrences
(e.g., find the 6 terms to the left and right of the term at
position 16). Or perhaps you want to give higher scores to documents
where your terms occur in a certain part of the document (like the
beginning).
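
To make the co-occurrence case concrete, here is a minimal sketch in plain Java. It does not call any Lucene API: it assumes you have already read a field's term vector into two parallel arrays (the terms, and the token positions of each term), which is roughly the shape a term vector stored with positions gives you. The class name CooccurrenceWindow and the method window are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene.
class CooccurrenceWindow {

    // terms[i] occurs at every token position listed in positions[i].
    // Returns the terms found within `radius` positions of `center`,
    // in document order, excluding the term at `center` itself.
    // Assumes radius >= 0.
    static List<String> window(String[] terms, int[][] positions,
                               int center, int radius) {
        // Invert term -> positions into position -> term, sorted by position.
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (int i = 0; i < terms.length; i++) {
            for (int p : positions[i]) {
                byPosition.put(p, terms[i]);
            }
        }

        List<String> neighbours = new ArrayList<>();
        Map<Integer, String> slice =
            byPosition.subMap(center - radius, true, center + radius, true);
        for (Map.Entry<Integer, String> e : slice.entrySet()) {
            if (e.getKey() != center) {
                neighbours.add(e.getValue());
            }
        }
        return neighbours;
    }
}
```

Calling window(terms, positions, 16, 6) would correspond to the "6 terms to the left and right of the term at position 16" example. Without stored positions you would have to re-analyze the original text to get the same answer.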

Really, they help in any application where you need to know the
relationships among the terms in a document, or between the indexed
terms and the original text.
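
For the "document and the original" side, offsets are what let you map a term back to the exact characters it came from, which is the core of highlighting. A rough sketch, again in plain Java with no Lucene calls: it assumes the matching terms' start/end character offsets have already been collected, sorted by start, with no overlaps. OffsetHighlighter and highlight are hypothetical names, and this is not how contrib/highlighter is implemented.

```java
// Hypothetical helper, not part of Lucene.
class OffsetHighlighter {

    // offsets[i] = {startOffset, endOffset} of a matching token in `text`,
    // end exclusive. Assumes the spans are sorted by start and do not overlap.
    static String highlight(String text, int[][] offsets,
                            String open, String close) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (int[] span : offsets) {
            // Copy the untouched text before the match, then the wrapped match.
            out.append(text, cursor, span[0]);
            out.append(open).append(text, span[0], span[1]).append(close);
            cursor = span[1];
        }
        // Copy whatever follows the last match.
        out.append(text, cursor, text.length());
        return out.toString();
    }
}
```

With stored offsets this works directly on the original string; without them, the text has to be run through the analyzer again to find where each token starts and ends.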


On Nov 21, 2006, at 10:36 AM, Jong Kim wrote:

> Hi,
> When I look at org.apache.lucene.document.Field.TermVector,
> it defines the following 5 options for how much detail
> can be stored with term vectors.
> 1. NO
> 5. YES
> It isn't difficult to understand where the basic term vector
> information (i.e., terms and their number of occurrences - option 5)
> might be useful. I believe it can be used to implement features
> like "concept search" or "more like this" functionality.
> However, it isn't clear to me how the other extra info (i.e.,
> token position information and/or token offset information)
> might be used. Can anyone help me understand what kind of
> (advanced) search techniques people use this extra
> information for, or even better, give any pointers to real-world
> examples?
> Thanks
> /Jong
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
