lucene-java-user mailing list archives

From "mvazquez@ova.st" <mvazq...@ova.st>
Subject Re: Text categorization / classification
Date Thu, 28 Oct 2010 02:13:46 GMT
Thanks a lot!
I was reading about Mahout today.
I'll try that out.
Thanks again
Maria

Sent from my iPhone


On Oct 27, 2010, at 20:59, Lance Norskog <goksron@gmail.com> wrote:

> There are tools for this in the Mahout project. These are oriented
> toward large-scale work.
> 
> http://mahout.apache.org
> 
> There is a big learning curve and you have to learn Hadoop somewhat.
> 
> The book 'Collective Intelligence' includes a suite of Python tools
> for small-scale experiments.
> 
> On Wed, Oct 27, 2010 at 1:12 PM, Maria Vazquez <mvazquez@ova.st> wrote:
>> I need to auto-categorize a large number of documents. They are basically
>> news articles from major news sources (nytimes, npr, abcnews, etc).
>> I'd like to categorize them automatically. Any suggestions?
>> Lucene in Action suggests using a set of documents to build category
>> vectors and then comparing each document to each of those vectors and get
>> the closest one.
>> The approach seems pretty simple (from other papers I read on text
>> categorization) but maybe you guys know of something out there that
>> already does this using Lucene/Solr.
>> Thanks!
>> Maria
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 
>> 
> 
> 
> 
> -- 
> Lance Norskog
> goksron@gmail.com
> 
> 
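
For reference, the category-vector approach described in the quoted message (build one vector per category from labeled documents, then assign each new document to the closest vector) might look roughly like the sketch below. This is only an illustration in plain Python, not Lucene, Solr, or Mahout code. The function names, the tiny training set, the use of raw term counts, and the choice of cosine similarity as the "closeness" measure are all assumptions made for the example; a real system would likely use TF-IDF weights and stop-word filtering.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_vector(text):
    """Represent a document as a bag of raw term counts (an assumption;
    TF-IDF weighting is a common alternative)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_category_vectors(labeled_docs):
    """Sum the term vectors of all training documents in each category."""
    vectors = {}
    for category, text in labeled_docs:
        vectors.setdefault(category, Counter()).update(term_vector(text))
    return vectors


def classify(text, category_vectors):
    """Return the category whose vector is closest to the document."""
    doc = term_vector(text)
    return max(category_vectors,
               key=lambda cat: cosine(doc, category_vectors[cat]))


# Hypothetical training data, invented for this example.
training = [
    ("sports", "The team won the game after a late goal"),
    ("sports", "The coach praised the players after the match"),
    ("politics", "The senate passed the budget bill after a long debate"),
    ("politics", "Voters go to the polls in the election next week"),
]
category_vectors = build_category_vectors(training)

print(classify("The election debate focused on the budget", category_vectors))
```

With only four training sentences the result is fragile, and common words such as "the" dominate the raw counts, so this is best read as a shape of the idea rather than something to run on real news articles.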


