lucene-java-user mailing list archives

From "Thomas D'Silva"
Subject Re: Using TermVectorMapper to compute term frequency across documents
Date Thu, 15 Oct 2009 13:57:08 GMT

I have an index whose documents have a text field containing the
document text and a tag field containing the tags associated with the
document. I am trying to calculate the probability that a document
contains a particular word and is tagged with a particular tag.
This is related to a MoreLikeThis extension I was trying to write.

Most of the time is spent in the loop that iterates over the documents
tagged with a particular tag and computes term counts across those
documents. If the index contains millions of documents, it takes a
while to compute the word/tag probabilities.
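
To make the computation concrete, here is a minimal sketch of the
counting loop and the probability estimate. It deliberately uses a
hypothetical in-memory Doc class in place of the index, so none of the
names below (TagTermCounts, Doc, countWordsForTag, jointProbability)
are Lucene APIs. It assumes the probability is estimated as the number
of documents that carry the tag and contain the word, divided by the
total number of documents.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the index; not a Lucene API.
class TagTermCounts {

    /** A document reduced to its distinct words and its tags. */
    static final class Doc {
        final Set<String> words;
        final Set<String> tags;

        Doc(String[] words, String[] tags) {
            this.words = new HashSet<>(Arrays.asList(words));
            this.tags = new HashSet<>(Arrays.asList(tags));
        }
    }

    /** For each word, counts the documents that carry the tag and contain the word. */
    static Map<String, Integer> countWordsForTag(List<Doc> docs, String tag) {
        Map<String, Integer> counts = new HashMap<>();
        for (Doc doc : docs) {
            if (!doc.tags.contains(tag)) {
                continue; // only documents with the requested tag contribute
            }
            // Words are distinct per document, so this is a document count,
            // not a within-document term frequency.
            for (String word : doc.words) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Estimates P(word, tag) as (documents with both) / (all documents). */
    static double jointProbability(List<Doc> docs, String word, String tag) {
        if (docs.isEmpty()) {
            return 0.0;
        }
        int both = countWordsForTag(docs, tag).getOrDefault(word, 0);
        return (double) both / docs.size();
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc(new String[] {"lucene", "index"}, new String[] {"search"}),
            new Doc(new String[] {"lucene", "query"}, new String[] {"search", "java"}),
            new Doc(new String[] {"hadoop", "index"}, new String[] {"java"}),
            new Doc(new String[] {"mahout"}, new String[] {"ml"}));
        System.out.println(countWordsForTag(docs, "search"));
        System.out.println(jointProbability(docs, "lucene", "search"));
    }
}
```

With the four sample documents, two are tagged "search" and contain
"lucene", so the estimate for that pair is 2/4 = 0.5. The inner loop
over every word of every tagged document is the part that gets slow
on a large index.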


On Wed, Oct 14, 2009 at 8:15 AM, Grant Ingersoll wrote:
> On Oct 12, 2009, at 10:46 PM, Thomas D'Silva wrote:
>> Hi,
>> I am trying to compute the counts of terms of the documents returned by
>> running a query using a TermVectorMapper.
>> I was wondering if anyone knew if there was a faster way to do this rather
>> than using a HashMap with a TermVectorMapper to store the counts of the
>> terms and calling getTermFreqVector().
>> I do not require the term frequency within a document.
> I think that is as fast as it's going to get unless you have some other
> restrictions that would allow you to use a FieldCache. Can you describe
> the bigger problem you are trying to solve?
> --------------------------
> Grant Ingersoll
