Ahh I see.
Term vectors are actually an inverted index for a single document, and they
also have the same postings API as the whole index (including
TermsEnum.totalTermFreq), but that method likely always returns 1 for term
vectors because it's not implemented? Maybe Lucene's default codec should
be improved to store this; maybe open an issue?
In the meantime you could make your own codec that does store it.
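Once you have each term's in-document frequency, the arithmetic in the original question is simple. The document length is the sum of those frequencies, not the number of distinct terms, and the normalized tf is freq / length. Below is a minimal plain-Java sketch that assumes the frequencies have already been read into a map. In Lucene 6.x they could be collected with `IndexReader.getTermVector(docID, field)`: iterate the returned `Terms` with `TermsEnum.next()`, then read `PostingsEnum.freq()` from `termsEnum.postings(null, PostingsEnum.FREQS)` after a single `nextDoc()` call. That route doesn't rely on any per-field statistic being implemented for term vectors. The class name `NormalizedTf` and the sample document are hypothetical, made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: given the per-term frequencies of ONE document
 * (e.g. read from its term vector), compute the document length and
 * the normalized term frequency of each term.
 */
public class NormalizedTf {

    /** Document length = sum of all term frequencies, so duplicates are counted. */
    public static long totalTerms(Map<String, Integer> termFreqs) {
        long total = 0;
        for (int freq : termFreqs.values()) {
            total += freq;
        }
        return total;
    }

    /** Normalized tf(t, d) = freq(t, d) / totalTerms(d); 0.0 for an empty document. */
    public static Map<String, Double> normalize(Map<String, Integer> termFreqs) {
        long total = totalTerms(termFreqs);
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            result.put(e.getKey(), total == 0 ? 0.0 : e.getValue().doubleValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical term vector for the text "to be or not to be"
        Map<String, Integer> tv = new LinkedHashMap<>();
        tv.put("be", 2);
        tv.put("not", 1);
        tv.put("or", 1);
        tv.put("to", 2);
        System.out.println(tv.size());       // 4: distinct terms only (the "length of the tvf")
        System.out.println(totalTerms(tv));  // 6: every occurrence counted
        System.out.println(normalize(tv));
    }
}
```

The size of the map corresponds to the "length of the tvf" from the original question, which is why it undercounts: it sees "to" and "be" once each instead of twice.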
Mike McCandless
http://blog.mikemccandless.com
On Tue, Apr 18, 2017 at 9:12 AM, Manjula Wijewickrema <manjula53@gmail.com> wrote:
> Hi Mike,
>
> Thanks for the answer. I think this returns the total number of
> occurrences of a specified term across all the documents in the corpus,
> right?
>
> But I need the total number of terms (including multiple occurrences of
> the same term) in each document of the corpus. Any suggestion?
>
> Thanks!
>
> On Tue, Apr 18, 2017 at 2:53 PM, Michael McCandless <lucene@mikemccandless.com> wrote:
>
>> I think you want to use the TermsEnum.totalTermFreq method?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Sun, Apr 16, 2017 at 11:36 AM, Manjula Wijewickrema <manjula53@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is there any way to get the total count of terms in the Term Frequency
>>> Vector (tvf)? I need to calculate the normalized term frequency of each
>>> term in my tvf. I know how to obtain the length of the tvf, but that
>>> doesn't help, since I need to count duplicate occurrences as well.
>>>
>>> I would greatly appreciate your kind response.
>>>
>>
>>
>
