uima-user mailing list archives

From Peter Klügl <pklu...@uni-wuerzburg.de>
Subject Re: Ruta parallel execution
Date Fri, 09 Jan 2015 14:44:48 GMT

On 09.01.2015 at 15:28, Silvestre Losada wrote:
> Hi Peter,
> I missed this email. I see your point about analysis engines arbitrarily
> changing the annotations. However, that can already happen now if a script
> uses the EXEC action to execute an external analysis engine. I think an extra
> parameter could be added to ruta to specify whether the ruta tokenization,
> RutaAnnotations and RutaStream can be reused. I think it may be
> possible to reuse the ruta tokenization (annotation stream) within the same CAS.

Yes, this should be possible, or let me put it this way: the
tokenization of one seeder should be reused in any case. Other scripts
may apply additional seeders, but that probably won't be the common
case. Reusing the RutaStream will be complicated, especially for
multi-view/CAS-multiplier pipelines. I think the best way is to share
and update the RutaBasics.
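
A minimal sketch of what sharing and updating the basics could look like. It is written in Python rather than Ruta's Java, and every name in it (`SharedBasics`, `seed`, `reindex`, `seed_runs`) is hypothetical; it is not how Ruta is implemented. It only illustrates two ideas from this thread: the seeder's tokenization is computed once per CAS and reused, and a later engine re-indexes only the types that may have changed in between.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SharedBasics:
    """Hypothetical per-CAS store shared by all script engines in a pipeline."""
    text: str
    tokens: Optional[list] = None               # seeder output, computed once
    index: dict = field(default_factory=dict)   # type name -> sorted annotation spans
    seed_runs: int = 0                          # how often the seeder actually ran

    def seed(self):
        """Tokenize on first use; later engines get the cached result."""
        if self.tokens is None:
            self.tokens = self.text.split()     # stand-in for a real seeder
            self.seed_runs += 1
        return self.tokens

    def reindex(self, annotations, changed_types):
        """Rebuild the index only for the types listed as possibly changed."""
        for type_name in changed_types:
            self.index[type_name] = sorted(annotations.get(type_name, []))
        return self.index
```

Two engines that both call `seed()` on the same `SharedBasics` instance would trigger the tokenization only once. The list passed to `reindex` plays the role of a configuration parameter naming the types that other engines may have touched.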

There are many options to improve the performance when applying several
analysis engines in a normal UIMA pipeline. Especially the internal
indexing should be improved. The main reason why these improvements are
not yet implemented can probably be found in our use cases (no parallel
execution, applying one complex script, no need for high performance).

I am open to all improvements. In my opinion, we should create a test
pipeline as a unit test and then optimize all aspects.
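
As a rough illustration of such a test pipeline, here is a small Python sketch with stand-in engines. The names `run_pipeline`, `tokenizer`, `script` and `test_pipeline` are made up for this example, and the dictionary only mimics a CAS; a real test would run actual analysis engines and ruta scripts. The point is simply to check the result and record per-engine timings, so that later optimizations can be compared against a baseline.

```python
import time


def run_pipeline(engines, cas):
    """Apply each (name, engine) pair to the shared CAS in order.

    Returns a list of (name, elapsed_seconds) tuples, one per engine.
    """
    timings = []
    for name, engine in engines:
        start = time.perf_counter()
        engine(cas)
        timings.append((name, time.perf_counter() - start))
    return timings


# Stand-in engines; a real test would plug in actual analysis engines here.
def tokenizer(cas):
    cas["Token"] = cas["text"].split()


def script(cas):
    # A toy "rule": collect the tokens that start with an uppercase letter.
    cas["Capitalized"] = [t for t in cas["Token"] if t[:1].isupper()]


def test_pipeline():
    cas = {"text": "Peter tested Ruta once"}
    timings = run_pipeline([("tokenizer", tokenizer), ("script", script)], cas)
    assert cas["Capitalized"] == ["Peter", "Ruta"]
    assert [name for name, _ in timings] == ["tokenizer", "script"]
    return timings
```

The timings are returned rather than asserted against a threshold, since wall-clock limits tend to make unit tests flaky.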



> Best Silvestre.
> On 31 December 2014 at 13:31, Peter Klügl <pkluegl@uni-wuerzburg.de> wrote:
>> On 29.12.2014 at 16:24, Silvestre Losada wrote:
>>> Thanks for your answer. I was working this way and it seems to be the best
>>> approach. The problem here is that I need to set up several RutaEngines in
>>> the pipeline. It would be nice if the RutaStream, or at least the generated
>>> ruta annotations, could be reused from one RutaEngine to another in the same
>>> pipeline, to avoid duplicated information. If you wish, I can implement it and
>>> submit a patch to you.
>> Oh yes, this causes a real slowdown when applying several scripts within a
>> pipeline. All help is welcome :-)
>> The main problem is that ruta requires additional indexing information for
>> conditions like PARTOF (which otherwise would be terribly slow). I don't
>> think that reusing the RutaStream would help because there could be an
>> arbitrary analysis engine changing arbitrary annotations. The RutaBasic
>> annotations are already reused to some extent, but the indexing is done
>> again. My first guess would be that we add another configuration parameter
>> with a list of all types that analysis engines applied after the last ruta
>> engine may have changed. Some helper methods could set these values
>> automatically given a pipeline. We could also use the capabilities of the
>> engines, but I am not sure that they are always correctly set.
>> What do you think?
>> Best,
>> Peter
>>> Kind regards.
>>> On 19 December 2014 at 17:54, Peter Klügl <pkluegl@uni-wuerzburg.de>
>>> wrote:
>>>  On 19.12.2014 at 15:10, Silvestre Losada wrote:
>>>>> Hi Jens,
>>>>> First of all, thanks for your detailed answer. UIMA ruta has an option
>>>>> to execute an analysis engine from a ruta script, as described here:
>>>>> <http://goo.gl/ekbhv8>. So inside the script you can execute
>>>>> the analysis engine and then apply some rules to the annotations created by
>>>>> the analysis engine. What I want is to have the option to execute the
>>>>> analysis engines in parallel to save time. Would it be possible?
>>>> That's not possible in the sense of using more or different processes for
>>>> the contained analysis engine than for the ruta script. The analysis
>>>> engine and the rules can be parallelized together as one analysis engine,
>>>> namely that of the script.
>>>> You should probably extract the analysis engine into a pipeline, which
>>>> applies the analysis engine and then the script (resp. its analysis
>>>> engine). Then, the normal UIMA-AS setting applies.
>>>> Best,
>>>> Peter
>>>>> Kind regards
>>>>> On 19 December 2014 at 12:35, Jens Grivolla <j+asf@grivolla.net> wrote:
>>>>>> Hi Silvestre,
>>>>>> there doesn't seem to be anything RUTA-specific in your question. In
>>>>>> principle, UIMA-AS allows parallel scaleout and merges the results (though
>>>>>> I personally have never used it this way), but there are of course a few
>>>>>> things to take into account.
>>>>>> First, you will of course need to properly define the dependencies between
>>>>>> your different analysis engines to ensure you always have all the
>>>>>> necessary information available, meaning that you can only run things in
>>>>>> parallel that are independent of one another. And then you will have to see
>>>>>> if the overhead from distributing your CAS to several engines running in
>>>>>> parallel and then merging the results is not greater than just having it in
>>>>>> one colocated pipeline that can pass the information more efficiently. I
>>>>>> guess you'll have to benchmark your specific application, but maybe
>>>>>> somebody with more experience can give you some general directions...
>>>>>> Best,
>>>>>> Jens
>>>>>> On Thu, Dec 18, 2014 at 12:26 PM, Silvestre Losada <
>>>>>> silvestre.losada@gmail.com> wrote:
>>>>>>> Well, let me explain.
>>>>>>> Ruta scripts are really good for working over the output of analysis
>>>>>>> engines: each analysis engine does some atomic work, and using ruta rules
>>>>>>> you can easily work over the generated annotations, combine them, remove
>>>>>>> them... What I need is to execute several analysis engines in parallel to
>>>>>>> improve response time. Right now the analysis engines are executed
>>>>>>> sequentially, and I want to execute them in parallel, then take the output
>>>>>>> of all of them and apply some ruta rules to the output.
>>>>>>> Would it be possible?
>>>>>>> On 17 December 2014 at 18:13, Peter Klügl <pkluegl@uni-wuerzburg.de>
>>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>> I haven't used UIMA-AS (with ruta) in a real application yet, but I
>>>>>>>> tested it once for an rc. Did you face any problems?
>>>>>>>> Best
>>>>>>>> Peter
>>>>>>>> On 17.12.2014 at 14:34, Silvestre Losada wrote:
>>>>>>>>> Hi All,
>>>>>>>>> Is there any way to execute ruta scripts in parallel, using the UIMA-AS
>>>>>>>>> approach? If so, could you provide me with an example?
>>>>>>>>> Kind regards.
