lucene-java-user mailing list archives

From Uwe Schindler <...@thetaphi.de>
Subject Re: get frequency of each term from a document
Date Sun, 20 Sep 2015 15:41:38 GMT
Hi,

For a TermsEnum obtained from term vectors, docFreq() is always 1 and totalTermFreq() is the
term's frequency within the document whose term vectors you retrieved.

Term vectors implement the same interface as the main index, but they can be seen as a small
index per document. They are built this way so that queries can be executed against a single
document, for example for highlighting.
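
To illustrate the "small index per document" idea, here is a minimal toy model in plain Java. It is only a sketch and does not use Lucene: the class TinyIndex and its methods are hypothetical names chosen to mirror docFreq() and totalTermFreq(). It shows why the same two statistics are corpus-level on the main index but document-level on a one-document index.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: plain Java collections, not Lucene classes.
class TinyIndex {
    // term -> (docId -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    void addDocument(int docId, List<String> tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    private Map<Integer, Integer> postingsFor(String term) {
        Map<Integer, Integer> p = postings.get(term);
        return p == null ? Collections.<Integer, Integer>emptyMap() : p;
    }

    // Number of documents in this index that contain the term.
    int docFreq(String term) {
        return postingsFor(term).size();
    }

    // Total occurrences of the term across all documents in this index.
    long totalTermFreq(String term) {
        long sum = 0;
        for (int freq : postingsFor(term).values()) {
            sum += freq;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> doc1 = Arrays.asList("lucene", "term", "vector", "term");
        List<String> doc2 = Arrays.asList("term", "query");

        // The main index covers the whole corpus.
        TinyIndex corpus = new TinyIndex();
        corpus.addDocument(1, doc1);
        corpus.addDocument(2, doc2);

        // A "term vector" modeled as an index holding only document 1.
        TinyIndex vectorOfDoc1 = new TinyIndex();
        vectorOfDoc1.addDocument(1, doc1);

        System.out.println(corpus.docFreq("term"));             // 2
        System.out.println(corpus.totalTermFreq("term"));       // 3
        System.out.println(vectorOfDoc1.docFreq("term"));       // 1
        System.out.println(vectorOfDoc1.totalTermFreq("term")); // 2
    }
}
```

In Lucene 5.x the corresponding loop would start from the Terms returned by IndexReader.getTermVector(docID, field), which is null if no term vectors were stored for that field. Calling iterator() on it gives a TermsEnum; repeated next() calls return each term until null, and totalTermFreq() read after each one is that term's frequency in the document.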

Uwe

On 20 September 2015 16:28:12 CEST, Ziqi Zhang <ziqi.zhang@sheffield.ac.uk> wrote:
>Thanks, but TermsEnum has two methods that return frequency-related 
>info, and both are corpus-level, not document-specific:
>
>-docFreq() Returns the number of documents containing the current term.
>-totalTermFreq() Returns the total number of occurrences of this term 
>across all documents (the sum of the freq() for each doc that has this 
>term).
>
>However I will need document specific frequency, i.e., freq of term A in 
>Doc 1, 2, ... N
>
>Thanks
>
>On 20/09/2015 15:07, Uwe Schindler wrote:
>> Hi,
>>
>> With the terms enum you can iterate over all terms. Each one returns its
>> term frequency. Of course, you need to enable term vectors during indexing.
>> The pattern for using a terms enum can be found in various places in the
>> Lucene source code. It's a very expert API, but it is the way to go here.
>>
>> Uwe
>>
>> On 20 September 2015 15:35:40 CEST, Ziqi Zhang <ziqi.zhang@sheffield.ac.uk> wrote:
>>> Hi
>>>
>>> Is it possible to get a list of terms within a document, and also TF of
>>> each of these terms *in that document only*? (Lucene 5.3)
>>>
>>> IndexReader has a method "Terms getTermVector(int docID, String field)",
>>> which gives me a "Terms" object, on which I can get a TermsEnum. But I
>>> do not know where to go then.
>>>
>>> thanks
>> --
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, 28213 Bremen
>> http://www.thetaphi.de
>
>
>-- 
>Ziqi Zhang
>Research Associate
>Department of Computer Science
>University of Sheffield
>
>

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de