lucene-dev mailing list archives

From "Grant Ingersoll" <>
Subject Re: TermFreqVector
Date Wed, 28 Apr 2004 13:00:45 GMT
This is something I debated when reimplementing Dmitry's change, but I decided on the string
representation, since that is how the terms are stored in the index. Returning Terms from getTerms()
would mean constructing them on every call, and some users may not need them as Terms. The original
actually used an integer to represent the term in the index, but that only worked with optimized indexes.
The strings are stored because they are guaranteed to be unique.

In the Term constructor code, the field name is what is being interned, so performance should
actually be (slightly) improved over time given the number of fields is usually small.
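
To illustrate the point about interning, here is a minimal sketch in plain Java. SimpleTerm is a hypothetical stand-in written for this example, not Lucene's Term class; it only assumes, as described above, that the constructor interns the field name and leaves the term text alone.

```java
public class InternSketch {
    /** Hypothetical stand-in for a Term: interns only the field name, not the text. */
    static final class SimpleTerm {
        final String field;
        final String text;

        SimpleTerm(String field, String text) {
            // intern() returns the canonical instance for this character sequence
            this.field = field.intern();
            this.text = text;
        }
    }

    /** True when both terms hold the very same field-name object. */
    static boolean sameFieldInstance(SimpleTerm a, SimpleTerm b) {
        return a.field == b.field;
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents
        SimpleTerm a = new SimpleTerm(new String("contents"), "lucene");
        SimpleTerm b = new SimpleTerm(new String("contents"), "vector");
        System.out.println(sameFieldInstance(a, b));
    }
}
```

With only a handful of distinct field names, every Term ends up sharing one of a few canonical field strings, so field comparisons can be done by reference.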


>>> 04/25/04 02:55PM >>>
Hi folks,

I started to use the new term vector support. Much more efficient than 
temporarily reindexing documents in a RAMDirectory in order to get their
terms :-)

However, I think it would be more reasonable if the getTerms() method
returned Terms instead of Strings, since that is what I, at least, need in the
subsequent analysis process. Of course it's easy to construct a Term given the
field and the text. However, outside the package only the public constructor of
Term can be called, which does the field.intern(). I don't know how expensive
the call to intern() really is. Maybe my worries are irrelevant.
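
For reference, the conversion described above might look like the following sketch. The Term class here is a hypothetical stand-in so the example runs without Lucene on the classpath; with the real library, the field and the term strings would presumably come from the term vector (its field name and getTerms()), and toTerms is a helper name made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class TermsFromVectorSketch {
    /** Hypothetical stand-in for Lucene's Term; the constructor interns the field. */
    static final class Term {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field.intern();
            this.text = text;
        }
    }

    /** Builds one Term per term string, all for the given field name. */
    static List<Term> toTerms(String field, String[] termTexts) {
        List<Term> terms = new ArrayList<>(termTexts.length);
        for (String text : termTexts) {
            terms.add(new Term(field, text));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Stand-in for the strings a term vector would return
        String[] texts = {"index", "term", "vector"};
        for (Term t : toTerms("contents", texts)) {
            System.out.println(t.field + ":" + t.text);
        }
    }
}
```

Each constructor call interns the same field name again, which is the cost being asked about; the term texts themselves are never interned in this sketch.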

