lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Spencer <dave-lucene-u...@tropo.com>
Subject Re: TFIDF Implementation
Date Tue, 14 Dec 2004 19:10:35 GMT
Otis Gospodnetic wrote:

> You can also see 'Books like this' example from here
> https://secure.manning.com/catalog/view.php?book=hatcher2&item=source

Well done, uses a term vector, instead of reparsing the orig doc, to 
form the similarity query. Also I like the way you exclude the source 
doc in the query, I didn't think of doing that in my code.

I don't trust calling vector.size() and vector.getTerms() within the 
loop but I haven't looked at the code to see if it calculates  the 
results each time or caches them...
> 
> Otis
> 
> --- Bruce Ritchie <bruce@jivesoftware.com> wrote:
> 
> 
>>Christoph,
>>
>>I'm not entirely certain if this is what you want, but a while back
>>David Spencer did code up a 'More Like This' class which can be used
>>for generating similarities between documents. I can't seem to find
>>this class in the sandbox so I've attached it here. Just repackage
>>and test.
>>
>>
>>Regards,
>>
>>Bruce Ritchie
>>http://www.jivesoftware.com/   
>>
>>
>>>-----Original Message-----
>>>From: Christoph Kiefer [mailto:kiefer@ifi.unizh.ch] 
>>>Sent: December 14, 2004 11:45 AM
>>>To: Lucene Users List
>>>Subject: TFIDF Implementation
>>>
>>>Hi,
>>>My current task/problem is the following: I need to implement 
>>>TFIDF document term ranking using Jakarta Lucene to compute a 
>>>similarity rank between arbitrary documents in the constructed
>>
>>index.
>>
>>>I saw from the API that there are similar functions already 
>>>implemented in the class Similarity and DefaultSimilarity but 
>>>I don't know exactly how to use them. At the time my index 
>>>has about 25000 (small) documents and there are about 75000 
>>>terms stored in total.
>>>Now, my question is simple. Does anybody has done this before 
>>>or could point me to another location for help?
>>>
>>>Thanks for any help in advance.
>>>Christoph 
>>>
>>>--
>>>Christoph Kiefer
>>>
>>>Department of Informatics, University of Zurich
>>>
>>>Office: Uni Irchel 27-K-32
>>>Phone:  +41 (0) 44 / 635 67 26
>>>Email:  kiefer@ifi.unizh.ch
>>>Web:    http://www.ifi.unizh.ch/ddis/christophkiefer.0.html
>>>
>>>
>>>
>>
>>---------------------------------------------------------------------
>>
>>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>>For additional commands, e-mail:
>>
>>lucene-user-help@jakarta.apache.org
>>
>>>
> ---------------------------------------------------------------------
> 
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message