lucene-dev mailing list archives

From Bruce Ritchie <br...@jivesoftware.com>
Subject Re: Dmitry's Term Vector stuff, plus some
Date Tue, 24 Feb 2004 21:20:14 GMT
Doug Cutting wrote:

> Grant Ingersoll wrote:
> 
>> Do you see any reason to write position information at all for the 
>> term vectors?
> 
> 
> It could be useful to some folks.  If, for example, you only want to 
> expand a query with terms that occur near query terms, like automatic 
> phrase identification.  In general, the vector stuff is just a constant 
> factor improvement over re-tokenizing the text of the document, but 
> hopefully a substantial one.  If folks are doing computations which 
> require positional information, but don't require the actual text (e.g., 
> they don't need user-readable fragments) then positions could be handy.
> 
> But, certainly, most applications for term vectors do not need 
> positions, and I would not be upset if these were left out of the first 
> version.
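
To make the proximity use case above concrete, here is a minimal sketch of picking expansion candidates from terms that occur near a query term, using only per-term positions and no document text. It is plain Python, not Lucene's API; the dictionary layout, the sample data, and the name `nearby_terms` are assumptions made for illustration.

```python
from collections import Counter


def nearby_terms(term_positions, query_terms, window=2):
    """Count occurrences of non-query terms within `window` positions of a query term.

    term_positions: hypothetical term vector, mapping term -> list of token positions.
    """
    query_terms = set(query_terms)
    query_positions = [
        pos for term in query_terms for pos in term_positions.get(term, [])
    ]
    counts = Counter()
    for term, positions in term_positions.items():
        if term in query_terms:
            continue
        for pos in positions:
            # A term occurrence qualifies if any query term sits close enough.
            if any(abs(pos - qpos) <= window for qpos in query_positions):
                counts[term] += 1
    return counts


# Made-up term vector for the token stream:
# "the quick brown fox jumps over the lazy dog near the quick river"
vector = {
    "the": [0, 6, 10],
    "quick": [1, 11],
    "brown": [2],
    "fox": [3],
    "jumps": [4],
    "over": [5],
    "lazy": [7],
    "dog": [8],
    "near": [9],
    "river": [12],
}

candidates = nearby_terms(vector, ["quick"], window=1)
```

Without positions, the same computation would require re-tokenizing the stored text of each document.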

Forgive me for being thick, but what position information are we talking
about here? The start and end positions of the token in the source text that
the term came from? If so, I think it would be useful to have them at some
point, as I believe they could be used to optimize the query highlighting
code that Mark Harwood contributed, so that it does not have to reanalyze
the text every time one wants to generate a highlighted search summary.
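
As a rough illustration of what I mean, here is a minimal sketch of highlighting driven by stored start and end offsets instead of reanalysis. It is plain Python and does not reflect the highlighter's actual code or any Lucene API; the offsets dictionary, the sample text, and the name `highlight` are assumptions made for illustration.

```python
def highlight(text, term_offsets, query_terms, pre="<b>", post="</b>"):
    """Wrap every occurrence of the query terms in `pre`/`post` markers.

    term_offsets: hypothetical term vector, mapping term -> list of
    (start, end) character offsets into `text`, with `end` exclusive.
    """
    spans = sorted(
        span for term in query_terms for span in term_offsets.get(term, [])
    )
    out = []
    cursor = 0
    for start, end in spans:
        if start < cursor:
            continue  # skip a span that overlaps one already emitted
        out.append(text[cursor:start])
        out.append(pre + text[start:end] + post)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Term vectors store terms; vectors can store offsets."

# Made-up offsets, as they might have been recorded once at index time.
offsets = {
    "vectors": [(5, 12), (26, 33)],
    "store": [(13, 18), (38, 43)],
}

summary = highlight(text, offsets, ["vectors"])
```

The stored text is still needed to build the fragment, but the analyzer never has to run again at search time.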


Regards,

Bruce Ritchie
http://www.jivesoftware.com/
