uima-user mailing list archives

From Donatas Remeika <donatas.reme...@gmail.com>
Subject Re: New dictionary annotator
Date Wed, 10 May 2017 05:42:53 GMT
Hi Daniel,

The dictionary annotator is definitely faster than Concept Mapper, but it has much
less functionality. It supports only the first-match strategy.
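A small, hypothetical Java sketch can make the difference between the first-match strategy mentioned here and the "match all" and "skip tokens" strategies requested in the quoted message clearer. It is not taken from the dictionary-annotator code. It assumes that "first match" means leftmost-longest, non-overlapping matching over whitespace tokens, and the names `MatchStrategySketch`, `firstMatch`, `matchAll` and `maxSkip` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the dictionary-annotator's code: contrasts an assumed
// leftmost-longest "first match" with "match all" plus optional token skipping.
public class MatchStrategySketch {

    // A matched dictionary entry covering input tokens [begin, end).
    public record Match(String entry, int begin, int end) {}

    // Assumption: dictionary entries and input are tokenized on whitespace.
    public static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Tries to match `entry` starting at input[start]. Up to `maxSkip` input tokens
    // may be skipped before each subsequent entry token (0 = contiguous match).
    // Greedy: each entry token is taken at its earliest occurrence in the window.
    // Returns the exclusive end index of the match, or -1 if there is none.
    static int matchAt(List<String> input, List<String> entry, int start, int maxSkip) {
        if (!input.get(start).equals(entry.get(0))) {
            return -1;
        }
        int pos = start + 1;
        for (int i = 1; i < entry.size(); i++) {
            int skipped = 0;
            while (pos < input.size() && !input.get(pos).equals(entry.get(i))) {
                if (++skipped > maxSkip) {
                    return -1;
                }
                pos++;
            }
            if (pos >= input.size()) {
                return -1;
            }
            pos++; // consume the matched entry token
        }
        return pos;
    }

    // Assumed "first match": at each position keep only the longest contiguous
    // entry, then continue after it, so matches never overlap.
    public static List<Match> firstMatch(List<String> input, List<String> dict) {
        List<Match> out = new ArrayList<>();
        int start = 0;
        while (start < input.size()) {
            Match best = null;
            for (String e : dict) {
                int end = matchAt(input, tokenize(e), start, 0);
                if (end != -1 && (best == null || end > best.end())) {
                    best = new Match(e, start, end);
                }
            }
            if (best != null) {
                out.add(best);
                start = best.end();
            } else {
                start++;
            }
        }
        return out;
    }

    // "Match all": report every entry matching at every start position,
    // including overlapping and nested matches.
    public static List<Match> matchAll(List<String> input, List<String> dict, int maxSkip) {
        List<Match> out = new ArrayList<>();
        for (int start = 0; start < input.size(); start++) {
            for (String e : dict) {
                int end = matchAt(input, tokenize(e), start, maxSkip);
                if (end != -1) {
                    out.add(new Match(e, start, end));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dict = List.of("a b c", "a b", "b c");
        List<String> input = tokenize("a b c");
        System.out.println("first match: " + firstMatch(input, dict));
        System.out.println("match all:   " + matchAll(input, dict, 0));
        System.out.println("skip tokens: " + matchAll(tokenize("a b c d"), List.of("a c d"), 1));
    }
}
```

On Dan's example, `firstMatch` returns only "a b c", while `matchAll` also returns "a b" and "b c". With `maxSkip = 1`, the entry "a c d" matches the input "a b c d". Because each entry token is taken at its earliest occurrence, the skip variant can miss some valid alignments. For example, with `maxSkip = 1` the entry "a b c" is not found in "a b b x c", so a real matcher would need backtracking.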

Regards,
Donatas

On Wed, May 10, 2017 at 12:19 AM Daniel Heinze <dheinze@gnoetics.com> wrote:

> Hi... I just pulled and compiled the dictionary-annotator and am looking
> through the code. I'm looking for something that is faster than UIMA
> Concept-Mapper. I don't need all the functionality of Concept-Mapper, but
> do need the following:
> * match all, e.g. if dict entries are "a b c", "a b" and "b c" and input
> is "a b c", I need to match "a b c", "a b" and "b c"
> * skip tokens, e.g. if the dict entry is "a c d", it should match on input
> "a b c d"
> Can someone familiar with the new dictionary annotator save me some time
> and say if it supports these matching strategies?
> Also, any sense of how the system scales?
> Thanks / Dan
>
> -----Original Message-----
> From: Peter Klügl [mailto:peter.kluegl@averbis.com]
> Sent: Tuesday, March 14, 2017 12:52 AM
> To: user@uima.apache.org
> Subject: Re: New dictionary annotator
>
> Hi,
>
>
> it's now March and I have not yet found the time to compare the different
> annotators in your benchmark.
>
>
> I just wanted to mention that I have not forgotten about this and that it is
> still on my to-do list. However, it could easily be April before I find the
> time.
>
>
> Best,
>
>
> Peter
>
>
> On 08.12.2016 at 10:43, Donatas Remeika wrote:
> > Hi,
> >
> > Peter, I ran some benchmarks on the 20 Newsgroups texts. The results can be
> > found here: https://github.com/tokenmill/dictionary-annotator
> > I didn't measure memory usage; I just compared how fast the different
> > annotators do the job.
> >
> > Best regards,
> > Donatas
> >
> > On Mon, Dec 5, 2016 at 2:35 PM Peter Klügl <peter.kluegl@averbis.com> wrote:
> >
> >> Hi,
> >>
> >>
> >> for the UIMA Ruta paper, I used the Enron email dataset [1], but it
> >> is probably not optimal here.
> >>
> >>
> >> I think we can find a standard scenario (data + terminology), maybe
> >> something like GENIA with MeSH or Wikipedia with GeoNames. Just a
> >> quick guess. I can help set something up, but probably not before
> >> February.
> >>
> >>
> >> Best,
> >>
> >>
> >> Peter
> >>
> >>
> >> [1] https://www.cs.cmu.edu/~enron/
> >>
> >> On 05.12.2016 at 12:56, Donatas Remeika wrote:
> >>> Hi,
> >>>
> >>> Thanks for the feedback.
> >>> Yes, it would be interesting to see benchmark results. Maybe you know
> >>> where I could find examples and data for doing benchmarks in UIMA?
> >>>
> >>> Best regards,
> >>> Donatas
> >>>
> >>>
> >>> On Mon, Dec 5, 2016 at 10:52 AM Peter Klügl <peter.kluegl@averbis.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>>
> >>>> a very nice annotator, thank you.
> >>>>
> >>>>
> >>>> Do you have figures on how the annotator compares to the others with
> >>>> respect to speed and memory usage?
> >>>>
> >>>> Storing the complete tokens may pose challenges in scenarios with
> >>>> parallelization if the dictionary is not shared between annotators.
> >>>>
> >>>> Would you be interested in setting up a benchmark?
> >>>>
> >>>>
> >>>> Because of the limitations of the dictionaries in Ruta, I also
> >>>> created a new simple dictionary annotator, but it now lives in our
> >>>> own components repository. Maybe I'll contribute it to Ruta sometime,
> >>>> since it provides exactly the functionality the Ruta dictionaries miss.
> >>>>
> >>>>
> >>>> Best,
> >>>>
> >>>>
> >>>> Peter
> >>>>
> >>>>
> >>>> On 30.11.2016 at 15:38, Donatas Remeika wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Just wanted to let you know that we created a new (probably one
> >>>>> more) dictionary annotator.
> >>>>>
> >>>>> The reasons for creating it were:
> >>>>>  - Quite often we used Ruta in our pipelines only because of its
> >>>>>    MARKTABLE action, which is able to set several features on an annotation
> >>>>>  - Sometimes dictionaries contain duplicate entries with different
> >>>>>    features, and we need to create an annotation for each entry
> >>>>>  - Possibility to use a custom tokenizer for dictionary entries (the
> >>>>>    default is a whitespace tokenizer)
> >>>>>
> >>>>> It was inspired by both the DKPro dictionary-annotator and Ruta MARKTABLE.
> >>>>> Big thanks to their developers!
> >>>>>
> >>>>> Code with examples can be found at
> >>>>> https://github.com/tokenmill/dictionary-annotator
> >>>>>
> >>>>> BTW, maybe someone knows of a Concept Mapper alternative that is more
> >>>>> uimaFIT-friendly?
> >>>>>
> >>>>> Best regards,
> >>>>> Donatas
> >>>>>
> >>
>
>
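The 30 Nov 2016 announcement quoted above lists two dictionary features besides MARKTABLE-style feature setting. First, duplicate entries with different features each yield their own annotation. Second, dictionary entries can use a pluggable tokenizer, with whitespace tokenization as the default. The hypothetical Java sketch below shows one way a dictionary structure could support both. The names `DuplicateEntryDictionarySketch`, `add`, `lookup` and `WHITESPACE` are invented, and entry features are modeled as plain `Map<String, String>` values rather than UIMA feature structures. It is not the project's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a dictionary keyed by the tokenized entry text, where each
// key keeps the features of every entry with that text (duplicates are not merged),
// and the entry tokenizer is supplied by the caller.
public class DuplicateEntryDictionarySketch {

    // Default tokenizer (assumed behavior): split on runs of whitespace.
    public static final Function<String, List<String>> WHITESPACE =
            s -> List.of(s.trim().split("\\s+"));

    private final Function<String, List<String>> tokenizer;
    // token sequence -> one feature map per dictionary entry with that text
    private final Map<List<String>, List<Map<String, String>>> entries = new HashMap<>();

    public DuplicateEntryDictionarySketch(Function<String, List<String>> tokenizer) {
        this.tokenizer = tokenizer;
    }

    // Adds an entry; a second entry with the same tokens is appended, not overwritten.
    public void add(String text, Map<String, String> features) {
        entries.computeIfAbsent(tokenizer.apply(text), k -> new ArrayList<>()).add(features);
    }

    // Returns one feature map per annotation to create for the given token span.
    public List<Map<String, String>> lookup(List<String> tokens) {
        return entries.getOrDefault(tokens, List.of());
    }

    public static void main(String[] args) {
        var dict = new DuplicateEntryDictionarySketch(WHITESPACE);
        dict.add("Paris", Map.of("type", "City", "country", "FR"));
        dict.add("Paris", Map.of("type", "Person"));
        // Both entries share the text "Paris", so two annotations would be created.
        System.out.println(dict.lookup(List.of("Paris")));
    }
}
```

Keying the map by the token list instead of the raw string means that entries differing only in spacing (for example "New York" and "New  York") end up under the same key. Swapping in another tokenizer, such as a lowercasing one, changes that normalization without touching the lookup code.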
