lucene-dev mailing list archives

From markharw...@yahoo.co.uk
Subject Re: Dmitry's Term Vector stuff, plus some
Date Wed, 25 Feb 2004 00:00:24 GMT
I'm not sure what applications people have in mind for Term Vector support, but I would prefer
to have the original text positions (not term sequence positions) stored so that I can offer the following:
1) Significant terms/phrases identification
Like "Gigabits" on gigablast.com - used to offer choices of (unstemmed) "significant" terms
and phrases for query expansion to the end user.
2) Optimised Highlighting
No more re-tokenizing of text to find unstemmed forms.
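
To illustrate both points, here is a minimal sketch of what stored text positions would allow. It is not Lucene code: the `term_offsets` layout (stemmed term mapped to a list of start/end character offsets) and the function names are assumptions made up for this example. The point is only that, with offsets stored per document, the unstemmed forms can be sliced straight out of the original text with no analyzer involved.

```python
from typing import Dict, List, Tuple

# Hypothetical stored term vector for one document:
# stemmed term -> list of (start, end) character offsets into the original text.
TermOffsets = Dict[str, List[Tuple[int, int]]]


def surface_forms(text: str, term_offsets: TermOffsets, term: str) -> List[str]:
    """Return the unstemmed forms of a stemmed term, in document order."""
    return [text[start:end] for start, end in sorted(term_offsets.get(term, []))]


def highlight(text: str, term_offsets: TermOffsets, query_terms: List[str],
              pre: str = "<b>", post: str = "</b>") -> str:
    """Wrap every occurrence of the query terms in markers, using offsets only."""
    spans = sorted(
        span for term in query_terms for span in term_offsets.get(term, [])
    )
    out = []
    cursor = 0
    for start, end in spans:
        if start < cursor:
            continue  # skip a span that overlaps one already emitted
        out.append(text[cursor:start])
        out.append(pre + text[start:end] + post)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


text = "Running shoes for anyone who runs daily"
term_offsets = {
    "run": [(0, 7), (29, 33)],  # "Running", "runs"
    "shoe": [(8, 13)],          # "shoes"
}

print(surface_forms(text, term_offsets, "run"))
print(highlight(text, term_offsets, ["run"]))
```

With term sequence positions alone (0, 1, 2, ...), neither function could be written without running the analyzer over the text again to map positions back to characters.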

The current "more like this" query can be optimised if it uses TermVectors too: it would simply
take a document ID and obtain a list of significant terms without the need to re-tokenize
(it doesn't need to know any positions, just term frequencies).
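
A rough sketch of that idea follows. Again this is not the existing "more like this" code or any Lucene API: the `term_vectors` and `doc_freqs` tables are hypothetical stand-ins for what the index would hold, and the tf-idf style weighting is just one plausible way to rank "significant" terms.

```python
import math
from typing import Dict, List

# Hypothetical index data (assumptions for this sketch):
# doc ID -> {term: frequency within that document}
term_vectors: Dict[int, Dict[str, int]] = {
    7: {"lucene": 4, "the": 9, "vector": 3, "index": 2},
}
# term -> number of documents containing it
doc_freqs: Dict[str, int] = {"lucene": 12, "the": 1000, "vector": 30, "index": 200}
num_docs = 1000


def significant_terms(term_freqs: Dict[str, int], doc_freqs: Dict[str, int],
                      num_docs: int, top_n: int = 5) -> List[str]:
    """Rank a document's terms by a simple tf * idf weight; no positions needed."""
    scored = []
    for term, tf in term_freqs.items():
        df = doc_freqs.get(term, 0)
        idf = math.log((num_docs + 1) / (df + 1)) + 1.0
        scored.append((tf * idf, term))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:top_n]]


def more_like_this_terms(doc_id: int, top_n: int = 5) -> List[str]:
    """Look up the stored term vector by document ID; no re-tokenizing."""
    return significant_terms(term_vectors[doc_id], doc_freqs, num_docs, top_n)


print(more_like_this_terms(7, top_n=2))
```

The returned terms would then be OR-ed together into a query; only the frequency part of the term vector is touched.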

Am I missing something or are there other applications where term sequence position is more
useful than term text position?


