To: user@uima.apache.org
From: Jens Grivolla
Subject: Re: Clustering, Collapsing
Date: Mon, 11 Jun 2012 12:48:37 +0200

This sounds like you are actually looking for the project next door: Mahout. UIMA really doesn't have a lot to do with clustering (although you could do some things).

We do use UIMA for information extraction *before* clustering and sending it to Solr, though, as a sort of preprocessing to get relevant features from unstructured text. But it doesn't sound like that's what you're trying to do.

HTH,
Jens

On 06/08/2012 05:44 PM, Deejay wrote:
> Hi all,
>
> I recently discovered Apache UIMA, and it looks like a very large project! I
> was hoping that someone more experienced with it than I could comment on
> whether there are parts of the project that could help with my problem.
>
> I need to go over many millions of objects (Protocol Buffers in HBase, as it
> happens), and cluster them according to their similarity. Once each cluster is
> formed, I need to 'collapse' each property of the objects to find the most
> prevalent value. After this, the collapsed object will be added to a Solr
> index.
>
> Would any part of Apache UIMA be useful for the clustering or collapsing, or
> have I misunderstood the nature of the project?
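
The 'collapse' step described in the quoted question (picking the most prevalent value of each property within a cluster) can be sketched in a few lines. This is only an illustration under assumptions: the function name `collapse`, the dict-based records, and the sample data are hypothetical, it does not use any UIMA, Mahout, HBase, or Solr API, and it assumes property values are hashable. Ties go to the value seen first.

```python
from collections import Counter


def collapse(cluster):
    """Return one record holding the most common value of each property.

    `cluster` is a list of dicts (hypothetical stand-ins for the clustered
    objects). Properties missing from a record are simply skipped.
    """
    keys = {key for record in cluster for key in record}
    collapsed = {}
    for key in sorted(keys):
        values = [record[key] for record in cluster if key in record]
        # most_common(1) returns [(value, count)] for the top value
        collapsed[key] = Counter(values).most_common(1)[0][0]
    return collapsed


# Hypothetical cluster of three similar records
cluster = [
    {"name": "ACME Corp", "city": "Berlin"},
    {"name": "ACME Corp", "city": "Munich"},
    {"name": "Acme", "city": "Berlin"},
]
print(collapse(cluster))
```

At the scale mentioned in the question (many millions of objects) the same per-property counting would presumably run as a distributed job after the clustering step rather than in memory like this.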