Message-ID: <54AFEABA.101@uni-wuerzburg.de>
Date: Fri, 09 Jan 2015 15:50:34 +0100
From: Peter Klügl
To: user@uima.apache.org
Subject: Re: Ruta parallel execution
References: <5491B9B8.5010504@uni-wuerzburg.de> <5494582E.4030805@uni-wuerzburg.de> <54A3ECAD.9060303@uni-wuerzburg.de> <54AFE960.8030900@uni-wuerzburg.de>
In-Reply-To: <54AFE960.8030900@uni-wuerzburg.de>

As for reusing the tokenization, maybe we should add something like this
logic (reusing as default):

for each seeder
  if there are annotations of my seeding types
    if new config param is true   // for partial or corrupt tokenizations
      remove all seeding annotations and generate them anew
    else
      do nothing
  else   // no tokenization yet
    generate seeding annotations

Best,

Peter

On 09.01.2015 at 15:44, Peter Klügl wrote:
> Hi,
>
> On 09.01.2015 at 15:28, Silvestre Losada wrote:
>> Hi Peter
>>
>> I missed this email. I see your point about analysis engines
>> arbitrarily changing the annotations. However, that can already occur
>> now if a script uses the EXEC action to execute an external analysis
>> engine. I think an extra parameter could be added to Ruta to specify
>> whether the Ruta tokenization, RutaAnnotations and RutaStream can be
>> reused. I think it may be possible to reuse the Ruta tokenization
>> (annotation stream) across the same CAS.
> Yes, this should be possible, or let me put it this way: the
> tokenization of one seeder should be reused in any case. Other scripts
> may apply additional seeders, but that probably won't be the common
> case. Reusing RutaStream will be complicated, especially for
> multi-view/cas-multiplier pipelines. I think the best way is to share
> and update the RutaBasics.
>
> There are many options to improve the performance when applying several
> analysis engines in a normal UIMA pipeline. Especially the internal
> indexing should be improved. The main reason why these improvements are
> not yet implemented can probably be found in our use cases (no parallel
> execution, applying one complex script, no need for high performance).
>
> I am open to all improvements. In my opinion, we should create a test
> pipeline as a unit test and then optimize all aspects.
>
> Best,
>
> Peter
>
>> Best Silvestre.
>>
>> On 31 December 2014 at 13:31, Peter Klügl wrote:
>>
>>> On 29.12.2014 at 16:24, Silvestre Losada wrote:
>>>
>>>> Thanks for your answer. I was working in this way and it seems to be
>>>> the best approach. The problem here is that I need to set up several
>>>> RutaEngines in the pipeline. It would be nice if the RutaStream, or at
>>>> least the generated Ruta annotations, could be reused from one
>>>> RutaEngine to another RutaEngine in the same pipeline, to avoid
>>>> duplicated information. If you wish, I can implement it and submit a
>>>> patch to you.
>>>>
>>> Oh yes, this causes a real slowdown when applying several scripts
>>> within a pipeline. All help is welcome :-)
>>>
>>> The main problem is that Ruta requires additional indexing information
>>> for conditions like PARTOF (which otherwise would be terribly slow). I
>>> don't think that reusing the RutaStream would help because there could
>>> be an arbitrary analysis engine changing arbitrary annotations. The
>>> RutaBasic annotations are already reused to some extent, but the
>>> indexing is done again. My first guess would be that we add another
>>> configuration parameter with a list of all types that analysis engines
>>> applied after the last Ruta engine may have changed. Some helper
>>> methods could set these values automatically given a pipeline. We
>>> could also use the capabilities of the engines, but I am not sure that
>>> they are always correctly set.
>>>
>>> What do you think?
>>>
>>> Best,
>>>
>>> Peter
>>>
>>>> Kind regards.
>>>>
>>>> On 19 December 2014 at 17:54, Peter Klügl wrote:
>>>>
>>>>> On 19.12.2014 at 15:10, Silvestre Losada wrote:
>>>>>> Hi Jens,
>>>>>>
>>>>>> First of all, thanks for your detailed answer. UIMA Ruta has an
>>>>>> option to execute an analysis engine from a Ruta script, as
>>>>>> described here. So inside the script you can execute the analysis
>>>>>> engine and then apply some rules to the annotations created by the
>>>>>> analysis engine. What I want is to have the option to execute the
>>>>>> analysis engines in parallel to save time. Would it be possible?
>>>>>>
>>>>> That's not possible in the sense that you would use more or different
>>>>> processes for the contained analysis engine than for the Ruta
>>>>> script. The analysis engine and the rules can be parallelized
>>>>> together as one analysis engine, namely that of the script.
>>>>>
>>>>> You should probably extract the analysis engine into a pipeline,
>>>>> which applies the analysis engine and then the script (resp. its
>>>>> analysis engine). Then, the normal UIMA-AS setting applies.
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>>> Kind regards
>>>>>>
>>>>>> On 19 December 2014 at 12:35, Jens Grivolla wrote:
>>>>>>
>>>>>>> Hi Silvestre,
>>>>>>>
>>>>>>> there doesn't seem to be anything RUTA-specific in your question.
>>>>>>> In principle, UIMA-AS allows parallel scaleout and merges the
>>>>>>> results (though I personally have never used it this way), but
>>>>>>> there are of course a few things to take into account.
>>>>>>>
>>>>>>> First, you will of course need to properly define the dependencies
>>>>>>> between your different analysis engines to ensure you always have
>>>>>>> all the necessary information available, meaning that you can only
>>>>>>> run things in parallel that are independent of one another.
>>>>>>> And then you will have to see if the overhead from distributing
>>>>>>> your CAS to several engines running in parallel and then merging
>>>>>>> the results is not greater than just having it in one colocated
>>>>>>> pipeline that can pass the information more efficiently. I guess
>>>>>>> you'll have to benchmark your specific application, but maybe
>>>>>>> somebody with more experience can give you some general
>>>>>>> directions...
>>>>>>>
>>>>>>> Best,
>>>>>>> Jens
>>>>>>>
>>>>>>> On Thu, Dec 18, 2014 at 12:26 PM, Silvestre Losada
>>>>>>> <silvestre.losada@gmail.com> wrote:
>>>>>>>
>>>>>>>> Well, let me explain.
>>>>>>>>
>>>>>>>> Ruta scripts are really good for working on the output of
>>>>>>>> analysis engines: each analysis engine does some atomic work, and
>>>>>>>> using Ruta rules you can easily work over the generated
>>>>>>>> annotations, combine them, remove them... What I need is to
>>>>>>>> execute several analysis engines in parallel to improve the
>>>>>>>> response time. Right now the analysis engines are executed
>>>>>>>> sequentially, and I want to execute them in parallel, then take
>>>>>>>> the output of all of them and apply some Ruta rules to the
>>>>>>>> output. Would that be possible?
>>>>>>>>
>>>>>>>> On 17 December 2014 at 18:13, Peter Klügl wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I haven't used UIMA-AS (with Ruta) in a real application yet,
>>>>>>>>> but I tested it once for an rc. Did you face any problems?
>>>>>>>>>
>>>>>>>>> Best
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On 17.12.2014 at 14:34, Silvestre Losada wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> Is there any way to execute Ruta scripts in parallel, using the
>>>>>>>>>> UIMA-AS approach? If yes, could you provide me an example?
>>>>>>>>>>
>>>>>>>>>> Kind regards.
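
For illustration, the seeder reuse logic proposed at the top of this message could be sketched roughly as follows. This is only a minimal, self-contained model in plain Java and does not use the UIMA or Ruta APIs: the names Ann, Seeder, WordSeeder, applySeeders and the reseedExisting flag are all hypothetical stand-ins (the flag plays the role of the "new config param"), and the annotation index is just a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SeederReuseSketch {

    /** Hypothetical stand-in for an annotation: a type name plus a text span. */
    static final class Ann {
        final String type;
        final int begin;
        final int end;

        Ann(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Hypothetical seeder contract; NOT the actual Ruta seeder interface. */
    interface Seeder {
        Set<String> seedingTypes();

        List<Ann> seed(String text);
    }

    /** Toy seeder: one "W" annotation per whitespace-separated token. */
    static final class WordSeeder implements Seeder {
        @Override
        public Set<String> seedingTypes() {
            return Set.of("W");
        }

        @Override
        public List<Ann> seed(String text) {
            List<Ann> out = new ArrayList<>();
            int i = 0;
            while (i < text.length()) {
                // Skip whitespace, then consume one token.
                while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                    i++;
                }
                int start = i;
                while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                    i++;
                }
                if (i > start) {
                    out.add(new Ann("W", start, i));
                }
            }
            return out;
        }
    }

    /**
     * Applies the proposed logic: reuse existing seeding annotations by
     * default, regenerate them only if reseedExisting is true, and seed
     * from scratch if nothing of the seeder's types is present yet.
     */
    static void applySeeders(String text, List<Ann> index,
                             List<Seeder> seeders, boolean reseedExisting) {
        for (Seeder seeder : seeders) {
            Set<String> types = seeder.seedingTypes();
            boolean present = index.stream().anyMatch(a -> types.contains(a.type));
            if (present) {
                if (reseedExisting) {
                    // For partial or corrupt tokenizations: drop and regenerate.
                    index.removeIf(a -> types.contains(a.type));
                    index.addAll(seeder.seed(text));
                }
                // Otherwise do nothing: the existing tokenization is reused.
            } else {
                // No tokenization yet.
                index.addAll(seeder.seed(text));
            }
        }
    }

    public static void main(String[] args) {
        String text = "Ruta reuses tokens";
        List<Ann> index = new ArrayList<>();
        List<Seeder> seeders = List.of(new WordSeeder());

        applySeeders(text, index, seeders, false); // first engine seeds
        System.out.println("after first engine: " + index.size());

        applySeeders(text, index, seeders, false); // second engine reuses
        System.out.println("after second engine: " + index.size());
    }
}
```

In this sketch the default path never touches an existing tokenization, so a second engine in the same pipeline pays only for the presence check. Whether a real implementation can detect presence that cheaply depends on the actual index structures, which are not modeled here.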