lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Felipe Hummel <felipehum...@gmail.com>
Subject Re: Search one index but use IDF from another?
Date Mon, 14 Mar 2011 02:01:43 GMT
In stackoverflow somebody answer me this:

You should be able to extend IndexReader and override the docFreq() methods
to provide whatever values you'd like. One thing this implementation can do
is open two IndexReader instances -- one for the small index and one for the
large index. All the methods are delegated to the small IndexReader, except
for docFreq(), which is delegated to the large index. You'll need to scale
the value returned, i.e.
int myNewDocFreq = (bigIndexReader.docFreq(t) /
bigIndexReader.maxDoc()) *smallIndexReader
.maxDoc()


It seems all right to me. Is that correct?

On Thu, Mar 10, 2011 at 4:36 PM, Andrzej Bialecki <ab@getopt.org> wrote:

> On 3/10/11 8:32 PM, Felipe Hummel wrote:
>
>> Hi, I'm building a system where I want to show only results indexed in the
>> past few days. Furthermore, I don't want to maintain a giant index with
>> millions of documents if I only want to return results from a couple of
>> days
>> (thousands of documents).
>>
>> My system heavily relies that the occurrences of terms in documents stored
>> in the index have a realistic distribution (consequently: realistic IDF).
>>
>> That said, I would like to use a small index to return results, but I want
>> to compute documents score using a IDF from a much greater Index (or even
>> an
>> external source).
>>
>> The Similarity API doesn't seem to allow me to do this. The *idf* method
>> does not receive as parameter the term being used.
>>
>> Another possibility is to use TrieRangeQuery to make sure the documents
>> shown are within the last couple of days. Again, I rather not mantain a
>> large index. Also this kind of query is not cheap.
>>
>> Am I missing something?
>>
>
> Take a look at SOLR-1632. Indeed, it's not possible to do this using
> Similarity alone. You will need something like the DFSource class in that
> patch, i.e. a subclass of IndexSearcher, where you populate term->DF map
> with values obtained from the full index, and then you use this map to
> calculate IDF.
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message