uima-user mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: Clustering, Collapsing
Date Mon, 11 Jun 2012 06:59:48 GMT
Hi Deejay,

2012/6/8 Deejay <deejay@binarytweed.com>

> Hi all,
> I recently discovered Apache UIMA, and it looks like a very large project!
> I was hoping that someone more experienced with it than I could comment on
> whether there are parts of the project that could help with my problem.
> I need to go over many millions of objects (Protocol Buffers in HBase, as
> it happens), and cluster them according to their similarity. Once each
> cluster is formed, I need to 'collapse' each property of the objects to
> find the most prevalent value. After this, the collapsed object will be
> added to a Solr index.

I think you could take advantage of the UIMA Collection Processing Engine [1],
particularly with a UIMA-AS based architecture [2], since it looks like you
are handling huge collections.
The specific clustering / collapsing algorithms would determine how the UIMA
pipeline is implemented and configured; apart from that, you could use
SolrCas [3] to write the resulting data to the index at the end.
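
To make the 'collapsing' step from the question concrete, here is a minimal sketch in plain Python. It is not a UIMA API and not something from the project; the function name `collapse_cluster`, the dict-based records, and the sample data are all hypothetical. It assumes each cluster is already formed, that records are simple mappings of property name to a hashable value, and that ties may go to whichever value was seen first.

```python
from collections import Counter


def collapse_cluster(records):
    """Return one record holding the most common value of each property.

    `records` is a list of dicts (hypothetical stand-ins for the real
    objects); property values must be hashable.
    """
    collapsed = {}
    # Collect every property name seen in any record of the cluster.
    keys = sorted({key for record in records for key in record})
    for key in keys:
        # Count only the records that actually carry this property.
        counts = Counter(record[key] for record in records if key in record)
        # most_common(1) yields a one-element list of (value, count).
        collapsed[key] = counts.most_common(1)[0][0]
    return collapsed


# Hypothetical cluster of three similar records.
cluster = [
    {"name": "Acme Ltd", "city": "London"},
    {"name": "Acme Ltd", "city": "Leeds"},
    {"name": "ACME", "city": "London"},
]
merged = collapse_cluster(cluster)
```

In a UIMA pipeline, logic of this kind would presumably live in whatever component runs once a cluster is complete, with the collapsed record then handed on to the indexing step.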

> Would any part of Apache UIMA be useful for the clustering or collapsing,
> or have I misunderstood the nature of the project?

[1] :
[2] : http://uima.apache.org/doc-uimaas-what.html
[3] : http://uima.apache.org/sandbox.html#solrcas.consumer
