lucene-java-user mailing list archives

From Barbara Krausz
Subject Determining the IDF while searching for documents
Date Mon, 13 Jun 2005 19:30:21 GMT
Hi all,

is it possible to determine the IDF (which depends on the number of 
documents in which a term appears) while searching for documents? I 
implemented an index based on trigrams, i.e. the index terms are now 
strings of 3 characters, so that my search engine also finds documents 
containing OCR errors. When I search for the term "rainstorm", for 
example, I split it into the trigrams __r, _ra, rai, ain, ins...
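The decomposition above can be sketched as follows. This is a minimal illustration, not the poster's code: the message shows only the leading padding ("__r, _ra") and elides the rest, so padding the word with two '_' characters on *both* sides is an assumption. It is the padding that makes the 9-letter "rainstorm" yield the 11 trigrams mentioned below. The class name `Trigrams` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class Trigrams {
    /**
     * Splits a word into overlapping 3-character grams after padding it with
     * two '_' characters on each side (assumed), so a word of n letters
     * yields n + 2 trigrams.
     */
    public static List<String> trigrams(String word) {
        String padded = "__" + word + "__";
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            grams.add(padded.substring(i, i + 3));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints [__r, _ra, rai, ain, ins, nst, sto, tor, orm, rm_, m__]
        System.out.println(trigrams("rainstorm"));
    }
}
```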
First I look for documents that contain at least 8 of the 11 trigrams 
of "rainstorm" (the misspelled "ranstorm" contains 8 of the 11 
trigrams); then I check whether those trigrams actually form a term like 
"rainstorm". To compute the TF, I count the occurrences of terms that 
are similar to the search term. But I have trouble computing the IDF, 
because I must know the number of documents in which the term appears 
before searching for the documents (in the method sumOfSquaredWeights() 
of my Weight implementation). During indexing I used hsqldb to store the 
number of documents for each term, but this approach is really slow.
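A rough sketch of the counting step described above, to make the ordering problem concrete. It is a simplification under stated assumptions: documents are plain in-memory token lists rather than a Lucene index, a token counts as "similar" when it shares at least `minShared` trigrams with the search term (8 of 11 here), and the separate document prefilter plus term verification is collapsed into one per-token check. All class and method names are hypothetical. The point it illustrates is that the fuzzy TF and the fuzzy document frequency fall out of the same scan, but only *after* that scan has run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FuzzyCounts {
    /** Padded trigrams ("__" + word + "__"); the padding is an assumption. */
    static List<String> trigrams(String word) {
        String padded = "__" + word + "__";
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            grams.add(padded.substring(i, i + 3));
        }
        return grams;
    }

    /** Counts how many of the query's trigrams also occur in the token. */
    static int sharedTrigrams(List<String> queryGrams, String token) {
        Set<String> tokenGrams = new HashSet<>(trigrams(token));
        int shared = 0;
        for (String g : queryGrams) {
            if (tokenGrams.contains(g)) {
                shared++;
            }
        }
        return shared;
    }

    /** Fuzzy TF per document: tokens sharing at least minShared trigrams with the term. */
    static int[] fuzzyTermFrequencies(String term, List<List<String>> docs, int minShared) {
        List<String> queryGrams = trigrams(term);
        int[] tf = new int[docs.size()];
        for (int d = 0; d < docs.size(); d++) {
            for (String token : docs.get(d)) {
                if (sharedTrigrams(queryGrams, token) >= minShared) {
                    tf[d]++;
                }
            }
        }
        return tf;
    }

    /** Fuzzy document frequency: number of documents with a non-zero fuzzy TF. */
    static int fuzzyDocFreq(int[] tf) {
        int df = 0;
        for (int f : tf) {
            if (f > 0) {
                df++;
            }
        }
        return df;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("heavy", "ranstorm", "last", "night"),
            Arrays.asList("rainstorm", "and", "rainstorm"),
            Arrays.asList("sunny", "weather"));
        int[] tf = fuzzyTermFrequencies("rainstorm", docs, 8);
        System.out.println(Arrays.toString(tf)); // [1, 2, 0]
        System.out.println(fuzzyDocFreq(tf));    // 2
    }
}
```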
My question is the following: when I search for documents that contain 
terms similar to the search term, I do obtain the number of documents 
that contain the term. But I need the IDF before searching these 
documents, for example for BooleanQueries, which need the IDF to 
normalize the query vector. Can I solve this problem, i.e. can I 
determine the IDF afterwards and then normalize the BooleanQuery?
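For reference, the normalization step the question is about can be sketched as standalone Java. It assumes the classic Lucene DefaultSimilarity formulas, idf = ln(numDocs / (docFreq + 1)) + 1 and queryNorm = 1 / sqrt(sumOfSquaredWeights), with a top-level BooleanQuery boost of 1. The class name and the document frequencies are hypothetical, and no Lucene classes are used. The sketch shows why every clause's docFreq is needed up front: the norm is derived from the sum over all clauses. In this formula, the norm is also a single factor shared by all matching documents, so it scales scores without reordering them.

```java
public class QueryNormSketch {
    /** Classic DefaultSimilarity-style idf: ln(numDocs / (docFreq + 1)) + 1. */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Query normalization factor: 1 / sqrt(sum of squared clause weights). */
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    public static void main(String[] args) {
        int numDocs = 1000;
        // Hypothetical document frequencies for a two-clause BooleanQuery;
        // the fuzzy clause's value would have to come from a counting pass.
        int[] docFreqs = {2, 50};
        double[] boosts = {1.0, 1.0};

        double[] idfs = new double[docFreqs.length];
        double sum = 0.0;
        for (int i = 0; i < docFreqs.length; i++) {
            idfs[i] = idf(docFreqs[i], numDocs);
            double queryWeight = idfs[i] * boosts[i]; // the value a clause's sumOfSquaredWeights() squares
            sum += queryWeight * queryWeight;
        }

        double norm = queryNorm(sum);
        for (int i = 0; i < docFreqs.length; i++) {
            // Normalized query weight, multiplied by idf once more as in the
            // classic term weight (assumed here, not taken from the message).
            double weight = idfs[i] * boosts[i] * norm * idfs[i];
            System.out.printf("clause %d: idf=%.3f weight=%.3f%n", i, idfs[i], weight);
        }
        System.out.printf("queryNorm=%.4f%n", norm);
    }
}
```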

