lucene-dev mailing list archives

Subject Re: Dmitry's Term Vector stuff, plus some
Date Wed, 25 Feb 2004 00:00:24 GMT
I'm not sure what applications people have in mind for Term Vector support, but I would prefer
to have the original text positions (not term sequence positions) stored so that I can offer the following:
1) Significant terms/phrases identification
Like "Gigabits" on - used to offer choices of (unstemmed) "significant" terms
and phrases for query expansion to the end user.
2) Optimised Highlighting
No more re-tokenizing of the text to find unstemmed forms.
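
To illustrate the highlighting case, here is a minimal sketch of how stored text positions could be used. It assumes the term vector is available as a plain mapping from stemmed term to (start, end) character offsets in the original text. That layout, the `highlight` function, and the `<b>` markers are hypothetical and are not Lucene's actual storage format or API.

```python
from typing import Dict, List, Tuple

# Hypothetical stored term vector: stemmed term -> (start, end) character
# offsets into the original field text. Not Lucene's actual format.
TermOffsets = Dict[str, List[Tuple[int, int]]]


def highlight(text: str, vector: TermOffsets, query_terms: List[str],
              pre: str = "<b>", post: str = "</b>") -> str:
    """Mark query-term occurrences using stored offsets, without re-tokenizing."""
    # Collect the spans of every query term and walk them in text order.
    spans = sorted(
        span for term in query_terms for span in vector.get(term, [])
    )
    out = []
    cursor = 0
    for start, end in spans:
        if start < cursor:
            # Skip a span that overlaps one already emitted.
            continue
        out.append(text[cursor:start])
        # Slice the original text, so the unstemmed form is what gets shown.
        out.append(pre + text[start:end] + post)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Indexing stores offsets for each indexed term"
vector = {
    "index": [(0, 8), (33, 40)],
    "store": [(9, 15)],
    "offset": [(16, 23)],
    "term": [(41, 45)],
}
print(highlight(text, vector, ["index"]))
```

Because the offsets point into the original text, the highlighter recovers "Indexing" and "indexed" from the stemmed term "index" with two string slices. Term sequence positions alone would not allow this without running the analyzer again.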

The current "more like this" query can be optimised if it uses TermVectors too: it simply
takes a document ID and obtains a list of significant terms without the need to re-tokenize
(it doesn't need to know any positions, just term frequencies).
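
As a rough sketch of that idea, the snippet below picks significant terms for one document from its stored term frequencies alone. The `significant_terms` function, the dictionary inputs, and the tf-idf style weighting are assumptions chosen for illustration. They are not the existing "more like this" implementation.

```python
import math
from typing import Dict, List


def significant_terms(term_freqs: Dict[str, int], doc_freqs: Dict[str, int],
                      num_docs: int, top_n: int = 3) -> List[str]:
    """Rank a document's terms by a simple tf-idf score; no positions needed."""

    def score(term: str) -> float:
        tf = term_freqs[term]          # frequency within this document
        df = doc_freqs.get(term, 0)    # number of documents containing the term
        idf = 1.0 + math.log(num_docs / (df + 1))
        return tf * idf

    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(term_freqs, key=lambda t: (-score(t), t))[:top_n]


# Hypothetical term vector for one document, plus index-wide statistics.
term_freqs = {"the": 12, "lucene": 4, "vector": 3, "index": 2}
doc_freqs = {"the": 1000, "lucene": 20, "vector": 5, "index": 300}
print(significant_terms(term_freqs, doc_freqs, num_docs=1000, top_n=2))
```

The only per-document input is the term-to-frequency map, which is what a stored term vector would supply for a given document ID.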

Am I missing something or are there other applications where term sequence position is more
useful than term text position?
