lucene-java-user mailing list archives

From Soeren Pekrul
Subject Re: Incremental Index and Comparing different Scores from different Index
Date Mon, 04 Dec 2006 13:58:41 GMT
Hello Nils,

How about having one index for all documents, with two fields, "date" and
"content"? You can then restrict the search to documents of a specific date,
and the score still uses the global idf computed over all documents.
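
To illustrate the idea, here is a minimal sketch in plain Python. It is not
Lucene code: the class DatedIndex, its methods, and the sample documents are
hypothetical names made up for this example, and the scoring is a simplified
tf-idf that assumes a smoothed idf of the form 1 + ln(N / (df + 1)). The point
it shows is that document frequencies are counted over the whole index, while
the date only decides which documents are returned.

```python
import math
from collections import Counter


class DatedIndex:
    """Toy in-memory index: every document has a date and a content field."""

    def __init__(self):
        self.docs = []             # list of (date, Counter of content terms)
        self.doc_freq = Counter()  # term -> number of documents containing it, all dates

    def add(self, date, content):
        """Add one document; this also updates the global document frequencies."""
        terms = Counter(content.lower().split())
        self.docs.append((date, terms))
        for term in terms:
            self.doc_freq[term] += 1

    def idf(self, term):
        """Global idf: N and df are taken over ALL documents, not only one day."""
        return 1.0 + math.log(len(self.docs) / (self.doc_freq[term] + 1))

    def search(self, query, date):
        """Return (doc_id, score) pairs for documents of `date` only, best first."""
        query_terms = query.lower().split()
        hits = []
        for doc_id, (doc_date, terms) in enumerate(self.docs):
            if doc_date != date:
                continue  # the date restricts the result set, not the statistics
            score = sum(math.sqrt(terms[t]) * self.idf(t)
                        for t in query_terms if t in terms)
            if score > 0:
                hits.append((doc_id, score))
        return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Day 1 documents, then day 2 documents added to the same index.
index = DatedIndex()
index.add("2006-12-03", "lucene index merge")
index.add("2006-12-03", "lucene scoring")
index.add("2006-12-04", "lucene idf scoring")
index.add("2006-12-04", "weather report")

# Only day 2 documents come back, but idf is based on all four documents.
hits = index.search("lucene idf", "2006-12-04")
print(hits)
```

In this sketch the term "lucene" has a document frequency of 3 although only
one document of the second day contains it, so the score of that document
reflects both days. Adding the documents of each new day to the same index
updates the idf incrementally.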


Nils Höller wrote:
> I thought of making the idf function a no-op, since this is more or less
> one of the approaches Y. Yang wrote about in "A System For New Event
> Detection". The final idf estimate can be found by using a training set,
> which I can use.
> But this is not the best way, as you said, because of the score quality.
> The second way is an incremental idf: updating the index every time a
> new (daily) archive arrives. Then I use the whole index to get the idf.
> But as search results I want ONLY documents of the current day.
> As I said, this all deals with continuous queries and online
> classification.
> Situation: I have to find the best document in a period of 60 days.
> I'm using optimal stopping theory, but that doesn't matter here.
> I want to query for documents of a specific day, but have the idf for
> all days.
> An example:
> I have an index of day 1. The idf can be calculated based on it.
> My query system does not mark any document as relevant.
> So now all documents are marked irrelevant and can no longer be taken as
> relevant ones.
> Now on the second day I have a new version of my web archive.
> I will "merge" the index of day 1 with the new index/documents to get a
> global idf (called incremental idf).
> For my classification I'm only allowed to classify documents of day 2.
> Now how can I query the index (the query is the same every day) and get
> only documents of day 2, while the score function uses the global idf
> (so number of documents = all documents of day 1 + day 2)?
> I hope you understand my problem.
> Is it possible to search a Lucene index and get only documents of a
> certain day (or the most recently added documents) while using a global
> idf from the whole index?
> That would also solve my problem.
> Thanks Nils
