lucene-java-user mailing list archives

From Maria Vazquez
Subject Text categorization / classification
Date Wed, 27 Oct 2010 20:12:12 GMT
I need to auto-categorize a large number of documents. They are basically news articles from
major news sources (nytimes, npr, abcnews, etc). Any suggestions?
Lucene in Action suggests using a set of documents to build a vector for each category, then comparing
each new document to each of those vectors and assigning it the category of the closest one.
The approach seems pretty simple (judging from other papers I have read on text categorization), but maybe
you guys know of something out there that already does this using Lucene/Solr.
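
To make the category-vector idea concrete, here is a minimal sketch in plain Python. It does not use the Lucene or Solr API and is not the code from Lucene in Action. It assumes raw term counts as vector weights and cosine similarity as the closeness measure; the function names (tokenize, build_category_vectors, classify) and the tiny training set are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_category_vectors(labeled_docs):
    """Sum the term counts of all training documents in each category."""
    vectors = defaultdict(Counter)
    for category, text in labeled_docs:
        vectors[category].update(tokenize(text))
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, category_vectors):
    """Return the category whose vector is closest to the document."""
    doc = Counter(tokenize(text))
    return max(category_vectors, key=lambda c: cosine(doc, category_vectors[c]))


# Hypothetical training data: (category, text) pairs.
training = [
    ("sports", "the team won the match after a late goal"),
    ("sports", "the coach praised the players after the game"),
    ("politics", "the senate passed the budget bill after a long debate"),
    ("politics", "the president signed the election reform bill"),
]

vectors = build_category_vectors(training)
print(classify("players scored a goal in the final game", vectors))  # sports
```

Raw counts let common words such as "the" dominate the vectors, so a real setup would probably remove stop words or weight terms with something like tf-idf before comparing.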
