lucene-java-user mailing list archives

From Tri Cao <tm...@me.com>
Subject Re: Calculate Term Frequency
Date Tue, 19 Aug 2014 16:56:58 GMT
Erick, Solr's termfreq implementation also uses DocsEnum, on the assumption that freq() is
called on ascending doc IDs, which is valid when scoring from the hit list. If freq is
requested for an out-of-order doc, a new DocsEnum has to be created.
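
To illustrate the ascending-order point, here is a minimal sketch in plain Java. It does not
use Lucene itself: PostingsCursor is a hypothetical stand-in for a forward-only iterator such
as DocsEnum, and TermFreqLookup.freqs is an invented helper name. The idea is to sort the
requested doc IDs first, so that a single cursor can serve all of them without being recreated.

```java
import java.util.Arrays;

final class TermFreqLookup {

    /** Hypothetical stand-in for a forward-only postings iterator (not Lucene's DocsEnum). */
    static final class PostingsCursor {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docIds; // ascending IDs of the docs that contain the term
        private final int[] freqs;  // freqs[i] is the term frequency in docIds[i]
        private int pos = -1;       // -1 means "not positioned yet"

        PostingsCursor(int[] docIds, int[] freqs) {
            this.docIds = docIds;
            this.freqs = freqs;
        }

        int docID() {
            if (pos < 0) return -1;
            return pos >= docIds.length ? NO_MORE_DOCS : docIds[pos];
        }

        /** Moves forward to the first doc with ID >= target; it can never move backward. */
        int advance(int target) {
            if (pos < 0) pos = 0;
            while (pos < docIds.length && docIds[pos] < target) pos++;
            return docID();
        }

        int freq() {
            return freqs[pos];
        }
    }

    /** Returns the term frequency for each wanted doc, in the order the docs were requested. */
    static int[] freqs(int[] postingDocs, int[] postingFreqs, int[] wantedDocs) {
        // Visit the wanted docs in ascending doc ID order, remembering their original slots.
        Integer[] order = new Integer[wantedDocs.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Integer.compare(wantedDocs[a], wantedDocs[b]));

        PostingsCursor cursor = new PostingsCursor(postingDocs, postingFreqs);
        int[] result = new int[wantedDocs.length];
        for (int idx : order) {
            int doc = wantedDocs[idx];
            // Only advance when the cursor is still behind the wanted doc.
            int current = cursor.docID() < doc ? cursor.advance(doc) : cursor.docID();
            // A doc that is absent from the postings does not contain the term.
            result[idx] = (current == doc) ? cursor.freq() : 0;
        }
        return result;
    }
}
```

With a real DocsEnum the same pattern would presumably use its advance() and freq() methods;
the sorting step is what avoids creating a new enum for every lookup.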

Bianca, can you explain your use case in more detail? What did you mean by having a new
document? Is a new document added to the index? If so, you already have to reopen the
searcher/reader anyway to get a new DocsEnum.

On Aug 19, 2014, at 08:26 AM, Erick Erickson <erickerickson@gmail.com> wrote:

Hmmm, I'm not at all an expert here, but Solr has a function
query "termfreq" that does what you're doing I think? I wonder
if the code for that function query would be a good place to
copy (or even make use of)? See TermFreqValueSource...

Maybe not helpful at all, but...
Erick

On Tue, Aug 19, 2014 at 7:04 AM, Bianca Pereira <aivykarter@gmail.com> wrote:
> Hi everybody,
>
> I would like to know your suggestions to calculate Term Frequency in a
> Lucene document. Currently I am using MultiFields.getTermDocsEnum,
> iterating through the DocsEnum 'de' returned and getting the frequency with
> de.freq() for the desired document.
>
> My solution gives me the result I want but I am having time issues. For
> instance, I want to calculate the term frequency for a given term for N
> documents in a sequence. Then, every time I have a new document I have to
> retrieve exactly the same DocsEnum again and iterate until I find the
> document I want. Of course I cannot cache DocsEnum (yes, I made this huge
> mistake) because it is an iterator.
>
> Do you have any suggestions on how I can get Term Frequency in a fast way?
> The only suggestion I have had up to now was "Do it programmatically, don't
> use Lucene". Should this be the solution?
>
> Thank you.
>
> Regards,
> Bianca Pereira
