uima-user mailing list archives

From Peter Klügl <pklu...@uni-wuerzburg.de>
Subject Re: Ruta parallel execution
Date Fri, 09 Jan 2015 14:50:34 GMT
As for reusing the tokenization, maybe we should add logic like this
(reusing as the default):

for each seeder
  if there are annotations of my seeding types
    if new config param is true // for partial or corrupt tokenizations
      remove all seeding annotations and generate them anew
    else // default: reuse the existing tokenization
      do nothing
  else // no tokenization yet
    generate seeding annotations
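
The seeding logic proposed above can be sketched in Java roughly as follows. This is a minimal illustration under assumptions, not Ruta code: `ToyCas`, `Seeder`, `WhitespaceSeeder`, `SeedingPolicy` and the `regenerateExisting` flag (standing in for the proposed new configuration parameter) are all hypothetical names, and a real implementation would operate on the CAS and Ruta's own seeder classes instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a CAS: document text plus an index from type name to spans. */
final class ToyCas {
    final String text;
    final Map<String, List<int[]>> index = new HashMap<>();

    ToyCas(String text) { this.text = text; }

    boolean hasAnnotationsOf(String type) {
        List<int[]> spans = index.get(type);
        return spans != null && !spans.isEmpty();
    }

    void removeAll(String type) { index.remove(type); }

    void add(String type, int begin, int end) {
        index.computeIfAbsent(type, k -> new ArrayList<>()).add(new int[] {begin, end});
    }
}

/** Hypothetical seeder contract (not the real Ruta seeder interface). */
interface Seeder {
    String seedingType();
    void seed(ToyCas cas);
}

/** Toy seeder: one annotation per whitespace-separated token. */
final class WhitespaceSeeder implements Seeder {
    private final String type;

    WhitespaceSeeder(String type) { this.type = type; }

    @Override public String seedingType() { return type; }

    @Override public void seed(ToyCas cas) {
        String t = cas.text;
        int i = 0;
        while (i < t.length()) {
            while (i < t.length() && Character.isWhitespace(t.charAt(i))) i++;
            int start = i;
            while (i < t.length() && !Character.isWhitespace(t.charAt(i))) i++;
            if (i > start) cas.add(type, start, i);
        }
    }
}

final class SeedingPolicy {
    enum Action { REUSED, REGENERATED, GENERATED }

    /** regenerateExisting plays the role of the proposed (hypothetical) config parameter. */
    static List<Action> apply(List<Seeder> seeders, ToyCas cas, boolean regenerateExisting) {
        List<Action> actions = new ArrayList<>();
        for (Seeder seeder : seeders) {
            String type = seeder.seedingType();
            if (cas.hasAnnotationsOf(type)) {
                if (regenerateExisting) {
                    // partial or corrupt tokenization: drop it and seed again
                    cas.removeAll(type);
                    seeder.seed(cas);
                    actions.add(Action.REGENERATED);
                } else {
                    // default: reuse the existing tokenization, i.e. do nothing
                    actions.add(Action.REUSED);
                }
            } else {
                // no tokenization yet
                seeder.seed(cas);
                actions.add(Action.GENERATED);
            }
        }
        return actions;
    }
}
```

With `regenerateExisting` set to false (the proposed default), a second engine running over the same document reuses the tokens the first one created; setting it to true discards and rebuilds them, which covers partial or corrupt tokenizations.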



On 09.01.2015 at 15:44, Peter Klügl wrote:
> Hi,
> On 09.01.2015 at 15:28, Silvestre Losada wrote:
>> Hi Peter
>> I missed this email. I see your point about analysis engines changing
>> arbitrary annotations; however, that can already happen now if a script
>> uses the EXEC action to execute an external analysis engine. I think an extra
>> parameter could be added to Ruta to specify whether the Ruta tokenization,
>> RutaAnnotations and RutaStream can be reused. I think it may be
>> possible to reuse the Ruta tokenization (annotation stream) within the same CAS.
> Yes, this should be possible, or let me put it this way: the
> tokenization of one seeder should be reused in any case. Other scripts
> may apply additional seeders, but that will probably not be the common
> case. Reusing the RutaStream will be complicated, especially for
> multi-view/CAS-multiplier pipelines. I think the best way is to share
> and update the RutaBasics.
> There are many options to improve the performance when applying several
> analysis engines in a normal UIMA pipeline. The internal indexing in
> particular should be improved. The main reason why these improvements are
> not yet implemented can probably be found in our use cases (no parallel
> execution, applying one complex script, no need for high performance).
> I am open to all improvements. In my opinion, we should create a test
> pipeline as a unit test and then optimize all aspects.
> Best,
> Peter
>> Best Silvestre.
>> On 31 December 2014 at 13:31, Peter Klügl <pkluegl@uni-wuerzburg.de> wrote:
>>> On 29.12.2014 at 16:24, Silvestre Losada wrote:
>>>> Thanks for your answer. I was working this way and it seems to be the best
>>>> approach. The problem is that I need to set up several RutaEngines in
>>>> the pipeline. It would be nice if the RutaStream, or at least the generated
>>>> Ruta annotations, could be reused from one RutaEngine to another in the same
>>>> pipeline, to avoid duplicated information. If you wish, I can implement it and
>>>> submit a patch.
>>> Oh yes, this causes a real slowdown when applying several scripts within a
>>> pipeline. All help is welcome :-)
>>> The main problem is that ruta requires additional indexing information for
>>> conditions like PARTOF (which otherwise would be terribly slow). I don't
>>> think that reusing the RutaStream would help because there could be an
>>> arbitrary analysis engine changing arbitrary annotations. The RutaBasic
>>> annotations are already reused to some extent, but the indexing is done
>>> again. My first guess would be that we add another configuration parameter
>>> with a list of all types that analysis engines applied after the last ruta
>>> engine may have changed. Some helper methods could set these values
>>> automatically given a pipeline. We could also use the capabilities of the
>>> engines, but I am not sure that they are always correctly set.
>>> What do you think?
>>> Best,
>>> Peter
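
A helper like the one proposed above, which derives the list of possibly changed types from the pipeline, might look roughly like this. It is a sketch under assumptions: `EngineInfo` and `ChangedTypesHelper` are hypothetical names, the pipeline is treated as a flat sequential list, each non-Ruta engine's declared output types are trusted (which, as noted above, may not hold if capabilities are not set correctly), and each Ruta engine is assumed to keep the shared indexing information up to date for its own changes.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified view of one engine in a sequential pipeline. */
final class EngineInfo {
    final String name;
    final boolean isRuta;
    final Set<String> outputTypes; // e.g. taken from the engine's declared capabilities

    EngineInfo(String name, boolean isRuta, Set<String> outputTypes) {
        this.name = name;
        this.isRuta = isRuta;
        this.outputTypes = outputTypes;
    }
}

final class ChangedTypesHelper {
    /**
     * For each Ruta engine preceded by another Ruta engine, collects the types that the
     * non-Ruta engines in between declare as outputs. Only these would need re-indexing;
     * the first Ruta engine in the pipeline has nothing to reuse and gets no entry.
     */
    static Map<String, Set<String>> changedTypesPerRutaEngine(List<EngineInfo> pipeline) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        Set<String> changedSinceLastRuta = null; // stays null until the first Ruta engine
        for (EngineInfo engine : pipeline) {
            if (engine.isRuta) {
                if (changedSinceLastRuta != null) {
                    result.put(engine.name, changedSinceLastRuta);
                }
                changedSinceLastRuta = new LinkedHashSet<>();
            } else if (changedSinceLastRuta != null) {
                changedSinceLastRuta.addAll(engine.outputTypes);
            }
        }
        return result;
    }
}
```

For a pipeline `ScriptA, PosTagger, Ner, ScriptB`, the helper would assign `ScriptB` the output types of `PosTagger` and `Ner`; the indexing of all other types could then, in principle, be reused.
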
>>>> Kind regards.
>>>> On 19 December 2014 at 17:54, Peter Klügl <pkluegl@uni-wuerzburg.de> wrote:
>>>>> On 19.12.2014 15:10, Silvestre Losada wrote:
>>>>>> Hi Jens,
>>>>>> First of all, thanks for your detailed answer. UIMA Ruta has an option
>>>>>> to execute an analysis engine from a Ruta script, described here:
>>>>>> <http://goo.gl/ekbhv8>. So inside the script you execute
>>>>>> the analysis engine and then apply some rules to the annotations created by
>>>>>> the analysis engine. What I want is to have the option to execute
>>>>>> analysis engines in parallel to save time. Would that be possible?
>>>>> That's not possible in the sense that you use more or different processes
>>>>> for the contained analysis engine than for the Ruta script. The analysis
>>>>> engine and the rules can only be parallelized together as one analysis
>>>>> engine, namely the one of the script.
>>>>> You should probably extract the analysis engine into a pipeline, which
>>>>> applies the analysis engine and then the script (or rather its analysis
>>>>> engine). Then the normal UIMA-AS setting applies.
>>>>> Best,
>>>>> Peter
>>>>>> Kind regards
>>>>>> On 19 December 2014 at 12:35, Jens Grivolla <j+asf@grivolla.net> wrote:
>>>>>>> Hi Silvestre,
>>>>>>> there doesn't seem to be anything RUTA-specific in your question. In
>>>>>>> principle, UIMA-AS allows parallel scaleout and merges the results
>>>>>>> (though I personally have never used it this way), but there are of
>>>>>>> course a few things to take into account.
>>>>>>> First, you will of course need to properly define the dependencies
>>>>>>> between your different analysis engines to ensure you always have all
>>>>>>> the necessary information available, meaning that you can only run
>>>>>>> engines in parallel that are independent of one another. And then you
>>>>>>> will have to see whether the overhead of distributing your CAS to several
>>>>>>> engines running in parallel and then merging the results is not greater
>>>>>>> than just running it in one colocated pipeline that can pass the
>>>>>>> information more efficiently.
>>>>>>> I guess you'll have to benchmark your specific application, but
>>>>>>> somebody with more experience can give you some general directions...
>>>>>>> Best,
>>>>>>> Jens
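
The dependency point can be made concrete with a small Java sketch that groups engines into stages based on the annotation types they read and write; engines in the same stage are independent of one another and are the only candidates for parallel execution. `EngineDeps` and `ParallelStages` are hypothetical names, not UIMA-AS API, and the sketch assumes that every engine in a stage sees the CAS as it was after the previous stage and that the outputs of a stage are merged afterwards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical description of an engine: the annotation types it reads and writes. */
final class EngineDeps {
    final String name;
    final Set<String> inputs;
    final Set<String> outputs;

    EngineDeps(String name, Set<String> inputs, Set<String> outputs) {
        this.name = name;
        this.inputs = inputs;
        this.outputs = outputs;
    }
}

final class ParallelStages {
    /**
     * Assigns each engine (given in a valid sequential order) to the earliest stage
     * that preserves the sequential result under the stage/merge assumption above.
     */
    static List<List<String>> stages(List<EngineDeps> engines) {
        Map<String, Integer> lastWrite = new HashMap<>(); // type -> last stage writing it
        Map<String, Integer> lastRead = new HashMap<>();  // type -> last stage reading it
        List<List<String>> stages = new ArrayList<>();
        for (EngineDeps e : engines) {
            int stage = 0;
            for (String t : e.inputs) {
                // read after write: run after the producer of every input type
                stage = Math.max(stage, lastWrite.getOrDefault(t, -1) + 1);
            }
            for (String t : e.outputs) {
                // write after write: keep writers of the same type in their original order
                stage = Math.max(stage, lastWrite.getOrDefault(t, -1) + 1);
                // write after read: never move ahead of an earlier reader of this type
                stage = Math.max(stage, lastRead.getOrDefault(t, 0));
            }
            for (String t : e.inputs) lastRead.merge(t, stage, Math::max);
            for (String t : e.outputs) lastWrite.merge(t, stage, Math::max);
            while (stages.size() <= stage) stages.add(new ArrayList<>());
            stages.get(stage).add(e.name);
        }
        return stages;
    }
}
```

For a tokenizer (writes `Token`), a POS tagger and an NER engine (both read `Token`), and a Ruta script reading their output, this yields three stages with the tagger and the NER engine together in the middle one. Whether running that middle stage in parallel actually pays off still has to be benchmarked.
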
>>>>>>> On Thu, Dec 18, 2014 at 12:26 PM, Silvestre Losada <
>>>>>>> silvestre.losada@gmail.com> wrote:
>>>>>>>> Well, let me explain.
>>>>>>>> Ruta scripts are really good for working on the output of analysis
>>>>>>>> engines: each analysis engine does some atomic work, and using Ruta
>>>>>>>> rules you can easily work on the generated annotations, combine them,
>>>>>>>> remove ...
>>>>>>>> What I need is to execute several analysis engines in parallel to
>>>>>>>> improve the response time. Right now the analysis engines are executed
>>>>>>>> sequentially, and I want to execute them in parallel, then take the
>>>>>>>> output of all of them and apply some Ruta rules to the output.
>>>>>>>> Would that be possible?
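
The flow described here (independent engines in parallel, then rules over their merged output) looks roughly like the plain-Java sketch below. It only illustrates the fan-out/merge idea with `CompletableFuture`; it is not UIMA-AS, the engines and the rule step are hypothetical functions over a toy `Ann` type rather than real analysis engines or a Ruta script, and it assumes the engines only read the shared text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical annotation: a type name and a character span. */
final class Ann {
    final String type;
    final int begin;
    final int end;

    Ann(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return type + "[" + begin + "," + end + ")";
    }
}

final class FanOutThenRules {
    /**
     * Runs independent "engines" concurrently on the same read-only text, merges their
     * annotations, then applies a rule step (standing in for a Ruta script) sequentially.
     */
    static List<Ann> run(String text,
                         List<Function<String, List<Ann>>> engines,
                         Function<List<Ann>, List<Ann>> rules) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, engines.size()));
        try {
            List<CompletableFuture<List<Ann>>> futures = new ArrayList<>();
            for (Function<String, List<Ann>> engine : engines) {
                futures.add(CompletableFuture.supplyAsync(() -> engine.apply(text), pool));
            }
            List<Ann> merged = new ArrayList<>();
            for (CompletableFuture<List<Ann>> future : futures) {
                // join in submission order, so the merged list is deterministic
                merged.addAll(future.join());
            }
            return rules.apply(merged);
        } finally {
            pool.shutdown();
        }
    }
}
```

Joining the futures in submission order keeps the merged result independent of which engine finishes first. In a real deployment, distribution and merging would be left to UIMA-AS, and the overhead of doing so is exactly what needs to be measured.
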
>>>>>>>> On 17 December 2014 at 18:13, Peter Klügl <pkluegl@uni-wuerzburg.de>
>>>>>>>> wrote:
>>>>>>>>> Hi,
>>>>>>>>> I haven't used UIMA-AS (with Ruta) in a real application yet, but I
>>>>>>>>> tested it once for a release candidate. Did you face any problems?
>>>>>>>>> Best
>>>>>>>>> Peter
>>>>>>>>> On 17.12.2014 14:34, Silvestre Losada wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>> Is there any way to execute Ruta scripts in parallel using the
>>>>>>>>>> UIMA-AS approach? If so, could you provide me with an example?
>>>>>>>>>> Kind regards.
