uima-user mailing list archives

From Jens Grivolla <j+...@grivolla.net>
Subject Re: Clustering, Collapsing
Date Mon, 11 Jun 2012 10:48:37 GMT
This sounds like you are actually looking for the project next door: Mahout.

UIMA really doesn't have much to do with clustering (although you could 
build some of it on top). We do use UIMA for information extraction 
*before* clustering and sending the results to Solr, though, as a sort of 
preprocessing to get relevant features from unstructured text. But it 
doesn't sound like that's what you're trying to do.

HTH,
Jens

On 06/08/2012 05:44 PM, Deejay wrote:
> Hi all,
>
> I recently discovered Apache UIMA, and it looks like a very large project! I
> was hoping that someone more experienced with it than I could comment on
> whether there are parts of the project that could help with my problem.
>
> I need to go over many millions of objects (Protocol Buffers in HBase, as it
> happens), and cluster them according to their similarity. Once each cluster is
> formed, I need to 'collapse' each property of the objects to find the most
> prevalent value. After this, the collapsed object will be added to a Solr
> index.
>
> Would any part of Apache UIMA be useful for the clustering or collapsing, or
> have I misunderstood the nature of the project?
>
>
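For illustration only, the 'collapse' step described in the quoted message
(taking the most prevalent value of each property within a cluster) could be
sketched roughly as below. This is not from either poster and uses neither
UIMA nor Mahout. It assumes each object has already been decoded into a plain
dict with hashable values and that clustering has been done elsewhere. The
function name collapse_cluster and the sample data are hypothetical.

```python
from collections import Counter


def collapse_cluster(cluster):
    """Return one dict holding the most prevalent value of each property.

    `cluster` is assumed to be an iterable of dicts (one per object in the
    cluster); values must be hashable so they can be counted.
    """
    counts = {}
    for obj in cluster:
        for key, value in obj.items():
            # One Counter per property, tallying how often each value occurs.
            counts.setdefault(key, Counter())[value] += 1
    # Keep the single most common value for every property.
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


# Hypothetical cluster of three near-duplicate records.
cluster = [
    {"name": "Acme Ltd", "city": "London"},
    {"name": "Acme Ltd", "city": "Londn"},
    {"name": "ACME", "city": "London"},
]
collapsed = collapse_cluster(cluster)
```

The resulting dict is what would then be sent to the Solr index. How ties
between equally frequent values should be broken is left open here and would
need to be decided for real data.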


