lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nils Höller <nu...@nhoeller.de>
Subject Re: Incremental Index and Comparing different Scores from different Index
Date Mon, 04 Dec 2006 11:58:28 GMT
Am Freitag, den 01.12.2006, 11:54 -0800 schrieb Chris Hostetter:
....
> the short answer is you can't, not with the DefaultSimilarity, but you
> might be able to write a custom Similarity that makes the scores
> comparable by making the idf function a NOOP (of course, then your scores
> won't be as "good" -- for some definition of good, but that may not matter
> to you)

I thought of making the idf function a NOOP, since this is somehow one
of the Ways Y. Yang wrote about in "A System For New Event Detection".

The final idf estimate can be found by using a training set, which i
can use.

But this is not the best way, as you said, due to the score quality.

The second way is: Doing an incremental idf, by updating the index
everytime a new (day) archive has arrived. Then I use the whole Index
for getting the idf. But for search results I WANT ONLY results of the
current day to arrive.

As I said, this all deals with Continous Queries and Online
classification.

Situation: I ve to find the best document in a period of 60 days.
I m using optimal stopping theory, but this doesnt matter here.
I want to query for  documents for a special day, but have the idf vor
all days.

An Example: 

I ve got an Index of day 1. The idf can be calculated based on it.
My Query System doesn't want to mark a document as relevant.
So now all documents are marked unrelevant and can't be taken as
relevant ones anymore.

Now on the second day ive got a new version of my web archive.
I will "merge" Index of day 1 with the new index/documents to get a
global idf (called incremental idf).
For my classification I'm only allowed to classify documents of day2.

Now how can I query the index (the query is everyday the same) and get
only documents of day 2, but using the score function (with idf and so
number of document = all documents day 1 + 2).

I hope you understand my problem.

Is it possible to search a lucene index and get only documents of a
certain day (or the last added documents) but using a global idf the
global index.

This would be also the solution of my problem.

Thanks Nils

> 
> this FAQ entry, and in particular the thread from teh second link in it
> have a *lot* of more info on this...
> http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-912c1f237bb00259185353182948e5935f0c2f03
> 
> 
> -Hoss
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message