lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: Use case for term vector's token position/offset?
Date Tue, 21 Nov 2006 16:13:00 GMT
Hi Jong,

I think these are useful for things like highlighting (I think  
contrib/highlighter can use them); other post processing algorithms  
such as: question answering, calculating co-occurrences (find the 6  
terms to the left and right of the term at position 16).  Perhaps you  
want to give higher scores to documents where your terms occur in a  
certain part of the document (like the beginning)

Really, any application where you need to know the relationships  
between the terms in a document or the document and the original.

HTH,
Grant

On Nov 21, 2006, at 10:36 AM, Jong Kim wrote:

> Hi,
>
> When I look at org.apache.lucene.document.Field.TermVector,
> it defines the following 5 options as to the detailed info
> that can be stored wrt term vectors.
>
> 1. NO
> 2. WITH_OFFSETS
> 3. WITH_POSITIONS
> 4. WITH_POSITIONS_OFFSETS
> 5. YES
>
> It isn't difficult to understand where the basic term vector
> information (ie, terms and their number of occurences - option 5)
> might be useful. I believe it can be used to implement features
> like "concept search" or "more like this" functionalities.
>
> However, it isn't clear to me how the other extra info (ie,
> token position information and/or token offset information)
> might be used? Can anyone help me understand what kind of
> (advanced) search techniques people use these extra
> information for, or even better, any pointer to real world
> examples?
>
> Thanks
> /Jong
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message