lucene-dev mailing list archives

From "Thomas D'Silva" <twdsi...@gmail.com>
Subject MoreLikeThis Extension for documents that have tags
Date Thu, 03 Sep 2009 22:56:24 GMT
Hi,

I would like to contribute a class based on the MoreLikeThis class in
contrib/queries that generates a query based on the tags associated
with a document. The class assumes that documents are tagged with a
set of tags (which are stored in the index in a separate Field). The
class determines the top document terms associated with a given tag
using the information gain metric.
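
For readers unfamiliar with the metric, here is a minimal sketch of how
information gain between a term and a tag can be computed from document
counts. It is not the attached class: the class name TagInformationGain,
the method names, and the idea of passing in four raw counts are
assumptions made for illustration, and it uses plain Java rather than
any Lucene API.

```java
/**
 * Illustrative sketch only (hypothetical names, not the attached class).
 * Scores how much knowing "document contains the term" reduces uncertainty
 * about "document carries the tag", measured in bits.
 */
final class TagInformationGain {
    private TagInformationGain() {}

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Entropy in bits of a yes/no event that occurs with probability p. */
    static double entropy(double p) {
        if (p <= 0.0 || p >= 1.0) {
            return 0.0; // a certain outcome carries no uncertainty
        }
        return -(p * log2(p) + (1.0 - p) * log2(1.0 - p));
    }

    /**
     * @param numDocs      total documents in the index
     * @param docsWithTag  documents carrying the tag
     * @param docsWithTerm documents containing the term
     * @param docsWithBoth documents with both the tag and the term
     */
    static double informationGain(long numDocs, long docsWithTag,
                                  long docsWithTerm, long docsWithBoth) {
        if (numDocs <= 0) {
            return 0.0;
        }
        long docsWithoutTerm = numDocs - docsWithTerm;
        double pTerm = (double) docsWithTerm / numDocs;

        // Uncertainty about the tag before looking at the term.
        double prior = entropy((double) docsWithTag / numDocs);

        // Uncertainty about the tag among documents that contain the term.
        double givenTerm = docsWithTerm == 0
                ? 0.0
                : entropy((double) docsWithBoth / docsWithTerm);

        // Uncertainty about the tag among documents that lack the term.
        double givenNoTerm = docsWithoutTerm == 0
                ? 0.0
                : entropy((double) (docsWithTag - docsWithBoth) / docsWithoutTerm);

        return prior - (pTerm * givenTerm + (1.0 - pTerm) * givenNoTerm);
    }
}
```

Under these assumptions, a term that appears in exactly the documents
carrying a tag scores highest, and a term spread evenly across tagged
and untagged documents scores about zero. Ranking a tag's terms by this
score and keeping the best few gives the "top document terms" for that
tag.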

When generating a MoreLikeThis query for a document, the class uses the
tags associated with that document to determine the terms in the query.
This is useful for finding documents similar to one that has few
relevant terms of its own but has been tagged.

I have attached the class and a test class and would appreciate any feedback.

Thank you,
Thomas
