uima-user mailing list archives

From Silvestre Losada <silvestre.los...@gmail.com>
Subject Re: Ruta parallel execution
Date Fri, 09 Jan 2015 14:59:21 GMT
Thanks, Peter.

I will implement it and submit a patch.

Best

On Fri 09/01/2015, 15:51, Peter Klügl <pkluegl@uni-wuerzburg.de> wrote:

> As for reusing the tokenization, maybe we should add something like this
> logic (reusing by default):
>
> for each seeder
>   if there are annotations of my seeding types
>     if new config param is true // for partial or corrupt tokenizations
>       remove all seeding annotations and generate them anew
>     else
>       do nothing
>   else // no tokenization yet
>     generate seeding annotations
>
> Best,
>
> Peter
>
>
> On 09.01.2015 at 15:44, Peter Klügl wrote:
> > Hi,
> >
> > On 09.01.2015 at 15:28, Silvestre Losada wrote:
> >> Hi Peter
> >>
> >> I missed this email. I see your point about analysis engines changing
> >> annotations arbitrarily. However, that can already happen now if a script
> >> uses the EXEC action to execute an external analysis engine. I think an
> >> extra parameter could be added to Ruta to specify whether the Ruta
> >> tokenization, RutaAnnotations and RutaStream can be reused. I think it
> >> may be possible to reuse the Ruta tokenization (annotation stream) across
> >> the same CAS.
> > Yes, this should be possible, or let me put it this way: the
> > tokenization of one seeder should be reused in any case. Other scripts
> > may apply additional seeders, but that probably won't be the common
> > case. Reusing RutaStream will be complicated, especially for
> > multi-view/cas-multiplier pipelines. I think the best way is to share
> > and update the RutaBasics.
> >
> > There are many options to improve the performance when applying several
> > analysis engines in a normal UIMA pipeline. Especially the internal
> > indexing should be improved. The main reason why these improvements are
> > not yet implemented can probably be found in our use cases (no parallel
> > execution, applying one complex script, no need for high performance).
> >
> > I am open to all improvements. In my opinion, we should create a test
> > pipeline as a unit test and then optimize all aspects.
> >
> > Best,
> >
> > Peter
> >
> >
> >> Best Silvestre.
> >>
> >> On 31 December 2014 at 13:31, Peter Klügl <pkluegl@uni-wuerzburg.de> wrote:
> >>
> >>> On 29.12.2014 at 16:24, Silvestre Losada wrote:
> >>>
> >>>> Thanks for your answer. I was already working this way, and it seems to
> >>>> be the best approach. The problem is that I need to set up several
> >>>> RutaEngines in the pipeline. It would be nice if the RutaStream, or at
> >>>> least the generated Ruta annotations, could be reused from one
> >>>> RutaEngine to another in the same pipeline, to avoid duplicated
> >>>> information. If you wish, I can implement it and submit a patch to you.
> >>>>
> >>> Oh yes, this causes a real slowdown when applying several scripts within
> >>> a pipeline. All help is welcome :-)
> >>>
> >>> The main problem is that Ruta requires additional indexing information
> >>> for conditions like PARTOF (which would otherwise be terribly slow). I
> >>> don't think that reusing the RutaStream would help, because an arbitrary
> >>> analysis engine could be changing arbitrary annotations. The RutaBasic
> >>> annotations are already reused to some extent, but the indexing is done
> >>> again. My first guess would be to add another configuration parameter
> >>> with a list of all types that analysis engines applied after the last
> >>> Ruta engine may have changed. Some helper methods could set these values
> >>> automatically given a pipeline. We could also use the capabilities of
> >>> the engines, but I am not sure that they are always set correctly.
> >>>
> >>> What do you think?
> >>>
> >>> Best,
> >>>
> >>> Peter
> >>>
> >>>
> >>>
> >>>> Kind regards.
> >>>>
> >>>> On 19 December 2014 at 17:54, Peter Klügl <pkluegl@uni-wuerzburg.de>
> >>>> wrote:
> >>>>
> >>>>> On 19.12.2014 at 15:10, Silvestre Losada wrote:
> >>>>>> Hi Jens,
> >>>>>>
> >>>>>> First of all, thanks for your detailed answer. UIMA Ruta has an option
> >>>>>> to execute an analysis engine from a Ruta script, described here:
> >>>>>> <http://goo.gl/ekbhv8>. So inside the script you can execute the
> >>>>>> analysis engine and then apply some rules to the annotations it
> >>>>>> created. What I want is the option to execute the analysis engines in
> >>>>>> parallel to save time. Would that be possible?
> >>>>>>
> >>>>> That's not possible in the sense of using more or different processes
> >>>>> for the contained analysis engine than for the Ruta script. The
> >>>>> analysis engine and the rules can be parallelized together as one
> >>>>> analysis engine, namely that of the script.
> >>>>>
> >>>>> You should probably extract the analysis engine into a pipeline that
> >>>>> applies the analysis engine and then the script (or rather, its
> >>>>> analysis engine). Then the normal UIMA-AS setting applies.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Peter
> >>>>>
> >>>>>
> >>>>>> Kind regards
> >>>>>>
> >>>>>> On 19 December 2014 at 12:35, Jens Grivolla <j+asf@grivolla.net> wrote:
> >>>>>>
> >>>>>>> Hi Silvestre,
> >>>>>>>
> >>>>>>> there doesn't seem to be anything Ruta-specific in your question. In
> >>>>>>> principle, UIMA-AS allows parallel scaleout and merges the results
> >>>>>>> (though I personally have never used it this way), but there are of
> >>>>>>> course a few things to take into account.
> >>>>>>>
> >>>>>>> First, you will need to properly define the dependencies between your
> >>>>>>> different analysis engines to ensure you always have all the
> >>>>>>> necessary information available, meaning that you can only run things
> >>>>>>> in parallel that are independent of one another. Then you will have
> >>>>>>> to see whether the overhead of distributing your CAS to several
> >>>>>>> engines running in parallel and then merging the results is greater
> >>>>>>> than just having it in one colocated pipeline that can pass the
> >>>>>>> information more efficiently. I guess you'll have to benchmark your
> >>>>>>> specific application, but maybe somebody with more experience can
> >>>>>>> give you some general directions...
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jens
> >>>>>>>
> >>>>>>> On Thu, Dec 18, 2014 at 12:26 PM, Silvestre Losada <
> >>>>>>> silvestre.losada@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Well, let me explain.
> >>>>>>>>
> >>>>>>>> Ruta scripts are really good for working on the output of analysis
> >>>>>>>> engines: each analysis engine does some atomic piece of work, and
> >>>>>>>> with Ruta rules you can easily work on the generated annotations
> >>>>>>>> (combine them, remove them, ...). What I need is to execute several
> >>>>>>>> analysis engines in parallel to improve the response time. Right now
> >>>>>>>> the analysis engines are executed sequentially; I want to execute
> >>>>>>>> them in parallel, then take the output of all of them and apply some
> >>>>>>>> Ruta rules to it. Would that be possible?
> >>>>>>>>
> >>>>>>>> On 17 December 2014 at 18:13, Peter Klügl <pkluegl@uni-wuerzburg.de> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I haven't used UIMA-AS (with Ruta) in a real application yet, but I
> >>>>>>>>> tested it once for a release candidate. Did you face any problems?
> >>>>>>>>>
> >>>>>>>>> Best
> >>>>>>>>>
> >>>>>>>>> Peter
> >>>>>>>>>
> >>>>>>>>> On 17.12.2014 at 14:34, Silvestre Losada wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi All,
> >>>>>>>>>>
> >>>>>>>>>> Is there any way to execute Ruta scripts in parallel using the
> >>>>>>>>>> UIMA-AS approach? If so, could you provide an example?
> >>>>>>>>>>
> >>>>>>>>>> Kind regards.
> >>>>>>>>>>
> >>>>>>>>>>
>
>
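
The seeder logic proposed in the quoted message (reuse existing seeding annotations by default, regenerate them only when a new flag is set) can be modeled in a few lines. This is a minimal sketch in Python rather than Ruta's Java API: Cas, WhitespaceSeeder, apply_seeders and the reseed flag are hypothetical names chosen for illustration, not part of UIMA or Ruta, and the real implementation would work on RutaBasic annotations in a CAS.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Cas:
    """Toy stand-in for a UIMA CAS: text plus annotations grouped by type name."""
    text: str
    annotations: dict = field(default_factory=dict)  # type name -> [(begin, end)]


class WhitespaceSeeder:
    """Hypothetical seeder that creates 'Token' annotations for non-space runs."""
    seeding_types = ("Token",)

    def seed(self, cas):
        cas.annotations["Token"] = [m.span() for m in re.finditer(r"\S+", cas.text)]


def apply_seeders(cas, seeders, reseed=False):
    """Apply the proposed logic; return the names of the seeders that actually ran."""
    ran = []
    for seeder in seeders:
        has_existing = any(cas.annotations.get(t) for t in seeder.seeding_types)
        if has_existing and not reseed:
            continue  # default: reuse the existing tokenization, do nothing
        if has_existing:
            # Flag set: treat the tokenization as partial or corrupt and drop it.
            for t in seeder.seeding_types:
                cas.annotations.pop(t, None)
        seeder.seed(cas)  # either no tokenization yet, or regenerating it
        ran.append(type(seeder).__name__)
    return ran
```

With this model, a second engine in the same pipeline that calls apply_seeders on an already tokenized Cas does no seeding work unless reseed=True is passed, which is the reuse-by-default behavior discussed in the thread.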
