lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: About counting term hits
Date Fri, 14 Nov 2008 15:49:50 GMT

I think to do this efficiently you'd need to modify Lucene's built-in
query classes (e.g., TermQuery) so that during the scoring process, in
addition to computing its contribution to the document's score, each
query would also record further information, such as the total number
of occurrences of each term, which docs had which terms, etc.

I don't think there's a simple, efficient way to do this with Lucene
today, though if your result sets are small enough, term vectors might
be fine.

Mike
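
A minimal sketch of the three steps described in the quoted message
below, in plain Java. It does not call Lucene: a map from term to
frequency stands in for each document's term vector, and the names
TermHitCount, termVector and countHits are hypothetical, invented only
for illustration. The tokenization is also an assumption (lowercase,
split on non-word characters), not what a Lucene analyzer would do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermHitCount {

    // Hypothetical stand-in for a stored term vector:
    // maps each term in one document to its frequency in that document.
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    // Steps 1-3: a document matches if its vector contains the term;
    // for each matching document, add that term's frequency to the total.
    static long countHits(List<Map<String, Integer>> vectors, String term) {
        long total = 0;
        for (Map<String, Integer> vector : vectors) {
            Integer freq = vector.get(term);
            if (freq != null) {
                total += freq;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> vectors = new ArrayList<>();
        vectors.add(termVector("The house on the hill is an old house."));
        vectors.add(termVector("A boat, not a house."));
        vectors.add(termVector("No match here."));

        // Two matching documents, three occurrences in total.
        System.out.println(countHits(vectors, "house")); // prints 3
    }
}
```

The work grows with the number of matching documents, since each one's
vector has to be consulted. That is consistent with the caveat above
that term vectors are only likely to be fine for small result sets.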

lbarcala@freeresearch.org wrote:

>> Mario,
>>
>> Does this help:
>> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/index/TermFreqVector.html
>>
>> Plus:
>> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/index/IndexReader.html#method_summary
>> (look for "getTerm.Freq...")
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>
> So, if I understand the solution, the main steps to do what I propose
> are:
>
> 1) Obtain the documents that match the query (documents that include
>    the word "house").
> 2) Loop through the matching documents, using the IndexReader to
>    obtain their term frequencies.
> 3) Obtain from the TermFreqVector the frequencies of the term
>    ("house") to calculate the result.
>
> And if it is a very frequent query and there are many documents
> (> 10,000), would Lucene solve it in a reasonable time? A query might
> match several hundred documents.
>
> Thank you,
>
>  Mario Barcala
>
>>>> Hello:
>>>>
>>>> I am new to Lucene and I am testing some things with it. I can
>>>> retrieve the number of documents that satisfy a query, but I can't
>>>> find how to obtain the number of term occurrences that match it.
>>>>
>>>> For example, if I search for the word "house", I want to obtain the
>>>> number of times the word occurs (not the number of documents).
>>>>
>>>> Is it possible to do this in Lucene?
>>>>
>>>> Thanks in advance,
>>>>
>>>> Mario Barcala
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>
>>>
>>> --
>>> ----------------------------------------
>>> "Help Ever Hurt Never"- Baba
>>>