lucene-java-user mailing list archives

From Matt Chaput
Subject Document comparison
Date Fri, 18 Feb 2005 21:26:03 GMT
Is there a simple, efficient way to compute similarity of documents 
indexed with Lucene?

My first, naive idea is to use the entire contents of one document as a 
query to the second document, and use the score as a similarity 
measurement. But I think I'm probably way off base with that.
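
For comparison, here is a minimal sketch of a simple symmetric baseline: cosine similarity over raw term counts, in plain Java with no Lucene calls. It is not the query-scoring idea above, and the names (DocSimilarity, termFrequencies, cosine) and the crude tokenizer are assumptions made up for illustration. A real index would use its analyzer and would probably weight terms (for example by tf-idf) rather than use raw counts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: compares two plain-text documents.
public class DocSimilarity {

    // Count how often each lowercased alphanumeric token occurs in the text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine of the angle between the two term-count vectors.
    // Returns 0.0 when either document has no tokens.
    static double cosine(String a, String b) {
        Map<String, Integer> ta = termFrequencies(a);
        Map<String, Integer> tb = termFrequencies(b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : ta.entrySet()) {
            int countA = e.getValue();
            normA += (double) countA * countA;
            Integer countB = tb.get(e.getKey());
            if (countB != null) {
                dot += (double) countA * countB;
            }
        }
        for (int countB : tb.values()) {
            normB += (double) countB * countB;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Calling DocSimilarity.cosine(textA, textB) gives a value between 0 and 1, and the result is the same whichever document is passed first, which a one-way query score does not guarantee.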

Can any IR pros set me straight? Thanks very much.


Matt Chaput
Word Monkey
Side Effects Software Inc.

"A goddamned ray of sunshine all the goddamned time"
-- Sparkle Hayter
