From: Eric Riebling
Date: Fri, 27 Apr 2012 15:59:56 -0400
To: user@uima.apache.org
Subject: Re: Running UIMA on a cluster
Message-ID: <4F9AFABC.4020503@cs.cmu.edu>

We've had success
deploying annotators on cluster nodes (using UIMA-AS deployment
descriptors) registered to a UIMA-AS broker running on the head node.
If the cluster uses shared data folders, you only need to put the code
in one place for it to 'appear' on all nodes.

Then we run a collection reader and CAS consumer on the head node, with
the amount of scale-out specified on the command line of
runRemoteAsyncAE.sh, something like this:

    $UIMA_HOME/bin/runRemoteAsyncAE.sh \
        -c (path.to)XmiCollectionReader.xml \
        tcp://localhost:61616 (name of deployed service) \
        -p (number of nodes) -o output_foldername

With enough scale-out, the limiting factor becomes the speed of the CR
and CC on the head node.

This is the briefest explanation I can give; I'm not sure it's a 'best
practice', but it works. :)

On 4/27/2012 3:35 PM, John David Osborne wrote:
> Hello,
>
> Is there any best practice documentation out there for running
> UIMA/UIMA-AS on a cluster? I have only run single machine instances of
> UIMA (mostly through Eclipse) and have not investigated the ability to
> perform multiple simultaneous analyses in order to process large document
> collections.
>
> It's not clear to me how UIMA would operate in a cluster environment, do
> people really do message passing using JMI? I'm guessing this is the case
> as I seeing references to MPICH, SGE or other things I am more used to.
> I've looked through some of the documentation (including all the Overview
> & SDK setup) but am not finding anything helpful. I've also tried googling
> but I am not getting much except this:
> http://comments.gmane.org/gmane.comp.apache.uima.general/2131 which makes
> me think it is possible.
>
> Currently with my level of confusion I think it may be best to have
> multiple instances of UIMA on a cluster and just submit jobs processing
> discrete document sets to our SGE cluster and ignore whatever scaling
> features are actually present in UIMA since the document processing I plan
> to do is data parallel.
>
> -John
>

-- 
Eric Riebling
Senior Systems Programmer
http://ericriebling.com
CMU Language Technologies Institute