lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Indexing Term Frequency Vectors
Date Fri, 29 Mar 2013 01:06:16 GMT
Hi,

On Thu, Mar 28, 2013 at 8:25 PM, Sharon Tam <sharontam@gmail.com> wrote:
> I believe that when Lucene indexes documents, it generates counts for a
> term by counting how many times the term appears in a particular document.
> Instead of having Lucene do the counting, I want to do my own counting and
> feed a term-frequency vector representation of a document directly into the
> indexer which will take my counts and proceed to do the other processing
> such as generating inverse document frequency.  My term-frequencies may not
> all be integers.  Is there a way to do this?

You could feed the indexer your own frequencies by creating a
handcrafted TokenStream that repeats each term ${termFreq} times, but
unfortunately, frequencies need to be strictly positive (> 0)
integers, so non-integer term frequencies cannot be indexed this way.
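A minimal sketch of the expansion step, in plain Java with no Lucene dependency: it turns a precomputed term-to-count map into the flat token sequence that a handcrafted TokenStream would emit, so the indexer ends up counting each term exactly termFreq times. The class and method names (TermFreqExpander, expand) are hypothetical, and the actual Lucene wiring (a TokenStream subclass whose incrementToken() sets a CharTermAttribute from the next token) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: expands a term-frequency vector into the repeated
 * token sequence a handcrafted TokenStream would produce.
 */
final class TermFreqExpander {

    private TermFreqExpander() {}

    /**
     * Repeats each term termFreq times, in the map's iteration order.
     * Counts are Integers because the indexer only accepts integer
     * frequencies; non-positive counts are rejected since they must be > 0.
     */
    static List<String> expand(Map<String, Integer> termFreqs) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int freq = e.getValue();
            if (freq <= 0) {
                throw new IllegalArgumentException(
                    "frequency for '" + e.getKey() + "' must be > 0, got " + freq);
            }
            // Emit the same term freq times in a row.
            tokens.addAll(Collections.nCopies(freq, e.getKey()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        tf.put("lucene", 3);
        tf.put("index", 1);
        System.out.println(expand(tf)); // prints [lucene, lucene, lucene, index]
    }
}
```

In a real TokenStream, each call to incrementToken() would clear the attributes, copy the next element of this list into the CharTermAttribute, and return false once the list is exhausted. Materializing the whole list is the simplest option; for very large counts, a stream that tracks the current term and a remaining-repeat counter would avoid the memory cost.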

--
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

