lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joseph Turian <tur...@gmail.com>
Subject Re: MemoryIndex or RAMDirectory, but score using term statistics from a corpus given during preprocessing?
Date Mon, 01 Nov 2010 20:40:51 GMT
Does this question make sense?

What I want is to compute term statistics over a corpus, and then use these
statistics when doing scoring + retrieval using a MemoryIndex or
RAMDirectory.

How can I do that?

Thanks,
    Joseph

On Thu, Oct 28, 2010 at 8:06 PM, Joseph Turian <turian@gmail.com> wrote:

> How do I use MemoryIndex or RAMDirectory, but score using term statistics
> from a corpus given during preprocessing?
>
> Let's say I want to use a MemoryIndex or RAMDirectory to store a *single*
> document, and then run a query against it, and get the score of the query
> using just this one document.
> I know how to do this. See, for some example code, this blog post on
> persistent search:
> http://www.sajalkayan.com/prospective-search-using-python.html
>
> Now what I want to do is take a *corpus* of K documents, and "index" it
> during preprocessing to calculate the term statistics (e.g. idf).
> I then want to freeze these term statistics, and use them whenever I insert
> and compute the query score of a new document.
> I.e. I want to *quickly* query a new document, using preprocessed term
> statistics in the scoring function.
> How can I do this?
>
> Thanks,
>    Joseph
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message