lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Determining the IDF while searching for documents
Date Tue, 14 Jun 2005 06:48:37 GMT

I'm not 100% sure I understand your question, but...

: order to compute the TF I count the occurrences of terms which are
: similar to the term. But I have trouble computing the IDF, because I
: must know the number of documents in which the term appears before
: searching for the documents (in the method sumOfSquaredWeights() in my

...to get the number of docs that contain a specific term, you can use
IndexReader.docFreq(Term)
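
As a rough sketch of how that count could feed an IDF value: the class and method names below are hypothetical, and the formula log(numDocs / (docFreq + 1)) + 1 is assumed to match the default Similarity of Lucene releases from that period, so check it against the version you run. The two arguments stand in for the values you would get from IndexReader.docFreq(Term) and IndexReader.numDocs().

```java
// Hypothetical helper, not part of Lucene.
final class IdfSketch {

    // docFreq: what IndexReader.docFreq(Term) would return for the term.
    // numDocs: what IndexReader.numDocs() would return for the index.
    // Assumed formula: log(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        // A term found in 9 of 100 documents.
        System.out.println(idf(9, 100));
    }
}
```

Since docFreq() reads a statistic stored in the index, it can be called before any documents are scored, e.g. when the weight is constructed.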



: Date: Mon, 13 Jun 2005 21:30:21 +0200
: From: Barbara Krausz <bkrausz@web.de>
: Reply-To: java-user@lucene.apache.org, java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Determining the IDF while searching for documents
:
: Hi all,
:
: is it possible to determine the IDF (which depends on the number of
: documents in which a term appears) while searching for documents? I
: implemented an index based on trigrams, i.e. the index terms are now
: strings of 3 characters, so that my search engine finds documents with
: OCR errors. When I'm searching for
: the term "rainstorm" for example I split it up into the trigrams __r,
: _ra, rai, ain, ins...
: First I look for documents which contain at least 8 of the 11 trigrams
: of "rainstorm" (the misspelled "ranstorm" contains 8 of the 11
: trigrams), then I check if the trigrams form a term like "rainstorm". In
: order to compute the TF I count the occurrences of terms which are
: similar to the term. But I have trouble computing the IDF, because I
: must know the number of documents in which the term appears before
: searching for the documents (in the method sumOfSquaredWeights() in my
: weight). I used hsqldb during indexing and saved the number of documents
: for each term. But it's really slow.
: My question is the following: when I'm searching for documents which
: contain terms similar to the search term, I actually get the number of
: documents that contain the term. But I need the IDF before searching
: these documents, for example for BooleanQueries, which need the IDF to
: normalize the query vector. Can I solve this problem, i.e. can I
: determine the IDF later and normalize the BooleanQuery?
:
: Thanks
: Barbara
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: For additional commands, e-mail: java-user-help@lucene.apache.org
:
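
For reference, the trigram splitting described in the quoted message can be sketched roughly as follows. The class and method names are hypothetical. The leading "__" padding is taken from the example in the message. The trailing "__" padding is an assumption, inferred from the stated count of 11 trigrams for "rainstorm".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the padded-trigram scheme.
final class TrigramSketch {

    // Pads the word with two underscores on each side (trailing padding is
    // an assumption), then slides a 3-character window over the result.
    static List<String> trigrams(String word) {
        String padded = "__" + word + "__";
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            result.add(padded.substring(i, i + 3));
        }
        return result;
    }

    // Counts the distinct trigrams of the query that also occur in the candidate.
    static int sharedTrigrams(String query, String candidate) {
        Set<String> shared = new LinkedHashSet<>(trigrams(query));
        shared.retainAll(trigrams(candidate));
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(trigrams("rainstorm"));
        System.out.println(sharedTrigrams("rainstorm", "ranstorm"));
    }
}
```

Under this padding, "rainstorm" yields 11 trigrams and shares 8 of them with "ranstorm", which agrees with the numbers in the message.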



-Hoss



