mahout-user mailing list archives

From Jens Grivolla
Subject Re: UIMA
Date Wed, 15 Jan 2014 11:15:57 GMT
Hello Burcu,

UIMA actually has an entirely different purpose: it doesn't do 
classification or clustering itself.  Rather, you would use UIMA to 
enrich each document individually through text analysis, and then use 
the results to build better feature vectors for Solr, Mahout, etc.

We typically use UIMA to do named entity recognition, sentiment 
analysis, chunking, etc., and then index the results in Solr. From there 
you can either use them for retrieval (i.e. use the enriched 
representation to get a better document similarity measure) or extract 
the vectors to use with Mahout/Weka/Cluto/...
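
As a rough illustration of that flow, here is a minimal Python sketch. 
It is not UIMA code: the gazetteer, the word lists and the function 
names (enrich, to_features) are all made up for the example, and a real 
pipeline would use proper UIMA annotators instead of these toy lookups. 
It only shows the idea of enriching each document on its own and then 
turning the annotations into features.

```python
import re
from collections import Counter

# Hypothetical toy resources standing in for real annotators
# (named entity recognition and sentiment analysis).
GAZETTEER = {"apache": "ORG", "berlin": "LOC", "solr": "PRODUCT"}
POSITIVE = {"good", "great", "fast"}
NEGATIVE = {"bad", "slow", "broken"}


def enrich(text):
    """Annotate a single document: tokens, entity labels, sentiment score."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    entities = [(t, GAZETTEER[t]) for t in tokens if t in GAZETTEER]
    # +1 for each positive word, -1 for each negative word.
    sentiment = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return {"tokens": tokens, "entities": entities, "sentiment": sentiment}


def to_features(doc):
    """Turn an enriched document into a sparse feature dict."""
    features = Counter("tok=" + t for t in doc["tokens"])
    features.update("ent=" + label for _, label in doc["entities"])
    features["sentiment"] = doc["sentiment"]
    return dict(features)


if __name__ == "__main__":
    doc = enrich("Solr is fast, Apache Solr is great")
    print(to_features(doc))
```

The resulting dict is what you would then index as extra fields or 
export as a vector for the clustering/classification tool.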


On 14/01/14 16:25, Burcu B wrote:
> Hi,
> I'd like to know why someone should prefer UIMA when developing an
> application for end users to classify and cluster general purpose
> documents?
> I have two options:
> 1- integrating Mahout, Solr, R, Hadoop and other file sources such as
>   document management systems or the file system.
> 2- or doing these using UIMA.
> Intuitively, I think that UIMA should be preferred, but I could not justify
> my feeling. I need a list of pros and cons.
> If you could suggest some resources, it would be great.
> Thank you.
