lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki ...@getopt.org>
Subject Re: Search one index but use IDF from another?
Date Thu, 10 Mar 2011 20:36:30 GMT
On 3/10/11 8:32 PM, Felipe Hummel wrote:
> Hi, I'm building a system where I want to show only results indexed in the
> past few days. Furthermore, I don't want to maintain a giant index with
> millions of documents if I only want to return results from a couple of days
> (thousands of documents).
>
> My system heavily relies that the occurrences of terms in documents stored
> in the index have a realistic distribution (consequently: realistic IDF).
>
> That said, I would like to use a small index to return results, but I want
> to compute documents score using a IDF from a much greater Index (or even an
> external source).
>
> The Similarity API doesn't seem to allow me to do this. The *idf* method
> does not receive as parameter the term being used.
>
> Another possibility is to use TrieRangeQuery to make sure the documents
> shown are within the last couple of days. Again, I rather not mantain a
> large index. Also this kind of query is not cheap.
>
> Am I missing something?

Take a look at SOLR-1632. Indeed, it's not possible to do this using 
Similarity alone. You will need something like the DFSource class in 
that patch, i.e. a subclass of IndexSearcher, where you populate 
term->DF map with values obtained from the full index, and then you use 
this map to calculate IDF.

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message