lucene-java-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Text categorization / classification
Date Wed, 27 Oct 2010 23:59:27 GMT
There are tools for this in the Mahout project. These are oriented
toward large-scale work.

http://mahout.apache.org

There is a steep learning curve, and you will need to learn some Hadoop.

The book 'Programming Collective Intelligence' includes a suite of Python
tools for small-scale experiments.
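
For a small-scale experiment, the category-vector idea from the quoted message (build one vector per category from labeled documents, then assign each new document to the closest vector) can be sketched in plain Python. This is only a minimal sketch under stated assumptions: it does not use Lucene, Solr, or Mahout, the function names and the toy training data are hypothetical, and it uses raw term counts with cosine similarity as the notion of "closest".

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_category_vectors(labeled_docs):
    """Sum the term counts of all training documents in each category."""
    vectors = {}
    for category, text in labeled_docs:
        vectors.setdefault(category, Counter()).update(tokenize(text))
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def classify(text, category_vectors):
    """Return the category whose vector is closest to the document."""
    doc = Counter(tokenize(text))
    return max(category_vectors,
               key=lambda cat: cosine(doc, category_vectors[cat]))


# Hypothetical toy training set: (category, text) pairs.
training = [
    ("sports", "The team won the championship game last night"),
    ("sports", "The coach praised the players after the match"),
    ("politics", "The senate passed the budget bill after a long debate"),
    ("politics", "The president signed the election reform law"),
]

vectors = build_category_vectors(training)
print(classify("Players celebrate after the final game", vectors))
```

With raw counts, common words such as "the" carry a lot of weight. On real news articles you would probably want stop-word removal or TF-IDF weighting, and far more training text per category.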

On Wed, Oct 27, 2010 at 1:12 PM, Maria Vazquez <mvazquez@ova.st> wrote:
> I need to auto-categorize a large number of documents. They are basically news articles
> from major news sources (nytimes, npr, abcnews, etc).
> I'd like to categorize them automatically. Any suggestions?
> Lucene in Action suggests using a set of documents to build category vectors and then
> comparing each document to each of those vectors and get the closest one.
> The approach seems pretty simple (from other papers I read on text categorization) but
> maybe you guys know of something out there that already does this using Lucene/Solr.
> Thanks!
> Maria
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>



-- 
Lance Norskog
goksron@gmail.com
