uima-user mailing list archives

From Peter Klügl <pklu...@uni-wuerzburg.de>
Subject Re: Ruta parallel execution
Date Wed, 31 Dec 2014 12:31:41 GMT
On 29.12.2014 at 16:24, Silvestre Losada wrote:
> Thanks for your answer. I was already working this way, and it seems to be
> the best approach. The problem is that I need to set up several RutaEngines
> in the pipeline. It would be nice if the RutaStream, or at least the
> generated Ruta annotations, could be reused from one RutaEngine to another
> in the same pipeline to avoid duplicated information. If you wish, I can
> implement it and submit a patch to you.

Oh yes, this causes a real slowdown when applying several scripts within 
a pipeline. All help is welcome :-)

The main problem is that ruta requires additional indexing information 
for conditions like PARTOF (which otherwise would be terribly slow). I 
don't think that reusing the RutaStream would help because there could 
be an arbitrary analysis engine changing arbitrary annotations. The 
RutaBasic annotations are already reused to some extent, but the 
indexing is done again. My first guess would be that we add another 
configuration parameter with a list of all types that analysis engines 
applied after the last ruta engine may have changed. Some helper methods 
could set these values automatically given a pipeline. We could also use 
the capabilities of the engines, but I am not sure that they are always 
correctly set.
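
A rough sketch of what such a helper might look like, in plain Java without any UIMA classes. Everything here is hypothetical: the Engine class is only a stand-in for an engine description with its declared output types, and typesToReindex is not an existing Ruta method. It assumes the declared outputs are trustworthy, which is exactly the open question with the capabilities.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReindexPlanner {

    /** Hypothetical stand-in for an engine description (not a UIMA class). */
    public static final class Engine {
        final String name;
        final boolean isRuta;
        final Set<String> outputTypes;

        public Engine(String name, boolean isRuta, String... outputTypes) {
            this.name = name;
            this.isRuta = isRuta;
            this.outputTypes = new LinkedHashSet<>(List.of(outputTypes));
        }
    }

    /**
     * Returns one set per Ruta engine, in pipeline order. Each set holds the
     * types that non-Ruta engines may have changed since the previous Ruta
     * engine (or since the start of the pipeline for the first one).
     */
    public static List<Set<String>> typesToReindex(List<Engine> pipeline) {
        List<Set<String>> result = new ArrayList<>();
        Set<String> changed = new LinkedHashSet<>();
        for (Engine engine : pipeline) {
            if (engine.isRuta) {
                // Hand the collected types to this Ruta engine and start over.
                result.add(changed);
                changed = new LinkedHashSet<>();
            } else {
                changed.addAll(engine.outputTypes);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Engine> pipeline = List.of(
                new Engine("tokenizer", false, "Token"),
                new Engine("ruta1", true),
                new Engine("ner", false, "Person", "Location"),
                new Engine("ruta2", true));
        System.out.println(typesToReindex(pipeline));
    }
}
```

Each resulting set would then be the value of the new configuration parameter for the corresponding Ruta engine. The first Ruta engine has no earlier index to build on, so it would probably have to index everything regardless of its set.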

What do you think?



> Kind regards.
> On 19 December 2014 at 17:54, Peter Klügl <pkluegl@uni-wuerzburg.de> wrote:
>> On 19.12.2014 at 15:10, Silvestre Losada wrote:
>>> Hi Jens,
>>> First of all, thanks for your detailed answer. UIMA Ruta has an option to
>>> execute an analysis engine from a Ruta script, described here:
>>> <http://goo.gl/ekbhv8>. So inside the script you can execute the
>>> analysis engine and then apply some rules to the annotations created by
>>> the analysis engine. What I want is the option to execute the analysis
>>> engines in parallel to save time. Would that be possible?
>> That is not possible in the sense of using more or different processes for
>> the contained analysis engine than for the Ruta script. The analysis
>> engine and the rules can only be parallelized together as one analysis
>> engine, namely that of the script.
>> You should probably extract the analysis engine into a pipeline that
>> applies the analysis engine first and then the script (or rather its
>> analysis engine). Then the normal UIMA-AS setup applies.
>> Best,
>> Peter
>>> Kind regards
>>> On 19 December 2014 at 12:35, Jens Grivolla <j+asf@grivolla.net> wrote:
>>>> Hi Silvestre,
>>>> there doesn't seem to be anything Ruta-specific in your question. In
>>>> principle, UIMA-AS allows parallel scaleout and merges the results
>>>> (though I personally have never used it this way), but there are of
>>>> course a few things to take into account.
>>>> First, you will need to properly define the dependencies between your
>>>> different analysis engines to ensure you always have all the necessary
>>>> information available, meaning that you can only run things in parallel
>>>> that are independent of one another. And then you will have to see
>>>> whether the overhead of distributing your CAS to several engines running
>>>> in parallel and then merging the results is greater than just having it
>>>> in one colocated pipeline that can pass the information more efficiently.
>>>> I guess you'll have to benchmark your specific application, but maybe
>>>> somebody with more experience can give you some general directions...
>>>> Best,
>>>> Jens
>>>> On Thu, Dec 18, 2014 at 12:26 PM, Silvestre Losada <
>>>> silvestre.losada@gmail.com> wrote:
>>>>> Well, let me explain.
>>>>> Ruta scripts are really good for working on the output of analysis
>>>>> engines: each analysis engine does some atomic work, and with Ruta rules
>>>>> you can easily work on the generated annotations, combine them, remove
>>>>> them... What I need is to execute several analysis engines in parallel
>>>>> to improve the response time. Right now the analysis engines are
>>>>> executed sequentially; I want to execute them in parallel, then take
>>>>> the output of all of them and apply some Ruta rules to that output.
>>>>> Would that be possible?
>>>>> On 17 December 2014 at 18:13, Peter Klügl <pkluegl@uni-wuerzburg.de>
>>>>> wrote:
>>>>>> Hi,
>>>>>> I haven't used UIMA-AS (with Ruta) in a real application yet, but I
>>>>>> tested it once for a release candidate. Did you face any problems?
>>>>>> Best
>>>>>> Peter
>>>>>> On 17.12.2014 at 14:34, Silvestre Losada wrote:
>>>>>>> Hi All,
>>>>>>> Is there any way to execute Ruta scripts in parallel using the
>>>>>>> UIMA-AS approach? If so, could you provide me with an example?
>>>>>>> Kind regards.
