lucene-java-user mailing list archives

From "Max Pfingsthorn" <m.pfingsth...@hippo.nl>
Subject RE: calculate wi = tfi * IDFi for each document.
Date Fri, 03 Jun 2005 13:03:01 GMT
Aha :)

So you want to do blind relevance feedback?
I guess the term vectors will be the way to go then. Otherwise, I don't know how to access
the terms of a document. And: Are you sure you need the TF.IDF weights for each term? Maybe
it would be enough to just use TF for sorting, as that is already present in the term vector.
In any case, Similarity knows how to compute IDF for a term.

Bye!
max
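
For readers of the archive, the weighting discussed in this thread can be sketched in plain Java. This is only an illustration under stated assumptions, not Lucene's API and not how DefaultSimilarity computes scores: each document is assumed to be available as a term -> frequency map (for example, filled from its term vector), D is the number of documents in the result set as defined in the original question, and the natural logarithm is assumed because the thread does not name a base. The class and method names (TfIdfSketch, weight, documentFrequencies, topTerms) are made up for this example.

```java
import java.util.*;

public class TfIdfSketch {

    /** wi = tfi * log(D / dfi); natural log is an assumption, the thread names no base. */
    static double weight(int tf, int df, int numDocs) {
        return tf * Math.log((double) numDocs / df);
    }

    /** Counts, for each term, how many of the given documents contain it (dfi). */
    static Map<String, Integer> documentFrequencies(List<Map<String, Integer>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (String term : doc.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Returns up to n terms of one document, highest weight first, ties broken alphabetically. */
    static List<String> topTerms(Map<String, Integer> doc, Map<String, Integer> df,
                                 int numDocs, int n) {
        List<String> terms = new ArrayList<>(doc.keySet());
        terms.sort(Comparator
                .comparingDouble((String t) -> weight(doc.get(t), df.get(t), numDocs))
                .reversed()
                .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    public static void main(String[] args) {
        // Hypothetical result set of three documents, each as term -> frequency.
        Map<String, Integer> doc1 = Map.of("lucene", 3, "search", 1, "the", 5);
        Map<String, Integer> doc2 = Map.of("index", 2, "the", 4);
        Map<String, Integer> doc3 = Map.of("lucene", 1, "the", 2);
        List<Map<String, Integer>> docs = List.of(doc1, doc2, doc3);

        Map<String, Integer> df = documentFrequencies(docs);
        // "the" occurs in every document, so its weight is 5 * log(3/3) = 0.
        System.out.println(topTerms(doc1, df, docs.size(), 2)); // prints [lucene, search]
    }
}
```

Because D and dfi are counted over the result set only, a term that appears in every hit gets weight 0 regardless of how often it occurs, which is why "the" drops out of the top terms above. Sorting by TF alone, as suggested, would rank it first.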

-----Original Message-----
From: Andrew Boyd [mailto:andrew.boyd@mindspring.com]
Sent: Friday, June 03, 2005 14:00
To: java-user@lucene.apache.org
Subject: RE: calculate wi = tfi * IDFi for each document.


Thanks for bearing with me Max.  

I do understand that the hits come back sorted by descending score after their Similarity has
been computed relative to the query vector.  What I was hoping to do was use the built-in
functionality of Lucene to calculate some term weights, specifically wi = tfi * IDFi.

Assuming I had Hits I was hoping to do something like this:

for (int idx = 0; idx < hits.length(); idx++) {
   int id = hits.id(idx);

   TermFreqVector[] termFreqVecs = indexReader.getTermFreqVectors(id);

   // Using the term vectors, calculate the wi for each term in that document.
   for (TermFreqVector termFreqVec : termFreqVecs) {
       // TermWeight and Similarity.wi(...) are wished-for, not existing Lucene API.
       TermWeight wi = Similarity.wi(termFreqVec, termFreqVec.size());

       ...
   }

}

Andrew


-----Original Message-----
From: Max Pfingsthorn <m.pfingsthorn@hippo.nl>
Sent: Jun 3, 2005 4:13 AM
To: java-user@lucene.apache.org
Subject: RE: calculate wi = tfi * IDFi for each document.

Hi,

when IndexSearcher.search gives you a Hits object back, all results are already sorted by
their score, which is computed internally using the Similarity. You can access it via Hits.score(n)
(see http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Hits.html). This is also
shown in the demo in org.apache.lucene.demo.SearchFiles from SVN. (see http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/demo/org/apache/lucene/demo/SearchFiles.java?rev=150739&view=markup).

Hope that helps.
max


-----Original Message-----
From: Andrew Boyd [mailto:andrew.boyd@mindspring.com]
Sent: Thursday, June 02, 2005 21:22
To: java-user@lucene.apache.org
Subject: RE: calculate wi = tfi * IDFi for each document.


Ok.  So if I get 10 Documents back from a search and I want to get the top 5 weighted terms
for each of the 10 documents what API call should I use?  I'm unable to find the connection
between Similarity and a Document.

I know I'm missing the elephant that must be in the middle of the room.  Or maybe it's not
there.
Is what I'm trying to do do-able?

Thanks,

Andrew

-----Original Message-----
From: Max Pfingsthorn <m.pfingsthorn@hippo.nl>
Sent: Jun 2, 2005 5:33 AM
To: java-user@lucene.apache.org
Subject: RE: calculate wi = tfi * IDFi for each document.

Hi,

DefaultSimilarity uses exactly this weighting scheme. Makes sense since it's a pretty standard
relevance measure...

Bye!
max

-----Original Message-----
From: Andrew Boyd [mailto:andrew.boyd@mindspring.com]
Sent: Thursday, June 02, 2005 11:39
To: java-user@lucene.apache.org
Subject: calculate wi = tfi * IDFi for each document.


If I have search results how can I calculate, using lucene's API,  wi = tfi * IDFi for each
document.

wi    = term weight
tfi    = term frequency in a document
IDFi = inverse document frequency = log(D/dfi)
dfi   = document frequency or number of documents containing term i
D    = number of documents in my search result

Thanks,

Andrew

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Andrew Boyd
Software Architect
Sun Certified J2EE Architect
B&B Technical Services Inc.
205.422.2557
