Hi Daniel,
The dictionary annotator is definitely faster than Concept Mapper, but it has
much less functionality. It supports only the first-match strategy.
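
To make the difference between the strategies concrete, here is a minimal
sketch in plain Python. It is not the annotator's code: the function names are
made up for illustration, and "first match" is assumed to mean "take the first
dictionary entry that matches at a position, then continue after it". The
skip-token variant assumes a limit on how many tokens may be skipped between
two consecutive entry words.

```python
def match_all(tokens, entries):
    """Return every (start, end, entry) span where an entry matches contiguous tokens."""
    spans = []
    for start in range(len(tokens)):
        for entry in entries:
            words = entry.split()
            if words and tokens[start:start + len(words)] == words:
                spans.append((start, start + len(words), entry))
    return spans


def first_match(tokens, entries):
    """Assumed 'first match': keep the first entry (in dictionary order)
    that matches at a position, then continue after the matched span."""
    spans = []
    start = 0
    while start < len(tokens):
        for entry in entries:
            words = entry.split()
            if words and tokens[start:start + len(words)] == words:
                spans.append((start, start + len(words), entry))
                start += len(words)
                break
        else:
            # No entry matched here, so move on by one token.
            start += 1
    return spans


def _end_offsets(tokens, words, pos, max_skip):
    """Yield end offsets of all ways to align `words` after position `pos`,
    skipping at most `max_skip` tokens before each word."""
    if not words:
        yield pos + 1
        return
    for nxt in range(pos + 1, min(len(tokens), pos + 2 + max_skip)):
        if tokens[nxt] == words[0]:
            yield from _end_offsets(tokens, words[1:], nxt, max_skip)


def match_with_skips(tokens, entry, max_skip=1):
    """Return the shortest (start, end) span per start position where the
    entry's words occur in order, allowing skipped tokens in between."""
    words = entry.split()
    spans = []
    if not words:
        return spans
    for start, token in enumerate(tokens):
        if token != words[0]:
            continue
        ends = list(_end_offsets(tokens, words[1:], start, max_skip))
        if ends:
            spans.append((start, min(ends)))
    return spans
```

With the entries "a b c", "a b" and "b c" on the input "a b c", match_all
reports all three entries, while first_match reports only "a b c". With the
entry "a c d" on the input "a b c d", match_with_skips finds one span covering
the whole input when one skipped token is allowed, and nothing when
max_skip is 0.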
Regards,
Donatas
On Wed, May 10, 2017 at 12:19 AM Daniel Heinze <dheinze@gnoetics.com> wrote:
> Hi... I just pulled and compiled the dictionaryannotator and am looking
> through the code. I'm looking for something that is faster than UIMA
> Concept-Mapper. I don't need all the functionality of Concept-Mapper, but
> do need the following:
> * match all, e.g. if dict entries are "a b c", "a b" and "b c" and input
> is "a b c", I need to match "a b c", "a b" and "b c"
> * skip tokens, e.g. if dict entry is "a c d", it should match on input "a
> b c d"
> Can someone familiar with the new dictionary annotator save me some time
> and say if it supports these matching strategies?
> Also, any sense of how the system scales?
> Thanks / Dan
>
> -----Original Message-----
> From: Peter Klügl [mailto:peter.kluegl@averbis.com]
> Sent: Tuesday, March 14, 2017 12:52 AM
> To: user@uima.apache.org
> Subject: Re: New dictionary annotator
>
> Hi,
>
>
> It's now March and I have not yet found the time to compare the different
> annotators in your benchmark.
>
>
> I just wanted to mention that I did not forget about this and that this is
> still on my todo list. However, it could easily be April before I find the
> time.
>
>
> Best,
>
>
> Peter
>
>
> Am 08.12.2016 um 10:43 schrieb Donatas Remeika:
> > Hi,
> >
> > Peter, I ran a benchmark on the 20 Newsgroups texts. The results can be
> > found here: https://github.com/tokenmill/dictionary-annotator
> > I didn't measure memory usage, just compared how fast different
> > annotators do the job.
> >
> > Best regards,
> > Donatas
> >
> > On Mon, Dec 5, 2016 at 2:35 PM Peter Klügl <peter.kluegl@averbis.com> wrote:
> >
> >> Hi,
> >>
> >>
> >> For the UIMA Ruta paper, I used the Enron email dataset [1], but it
> >> is probably not optimal here.
> >>
> >>
> >> I think we can find a standard scenario (data+terminology), maybe
> >> something like Genia with MeSH or Wikipedia with GeoNames. Just a
> >> quick guess. I can help set something up, but probably not before
> >> February.
> >>
> >>
> >> Best,
> >>
> >>
> >> Peter
> >>
> >>
> >> [1] https://www.cs.cmu.edu/~enron/
> >>
> >> Am 05.12.2016 um 12:56 schrieb Donatas Remeika:
> >>> Hi,
> >>>
> >>> Thanks for the feedback.
> >>> Yes, it would be interesting to see benchmark results. Maybe you know
> >>> where I could find examples and data for doing benchmarks in UIMA?
> >>>
> >>> Best regards,
> >>> Donatas
> >>>
> >>>
> >>> On Mon, Dec 5, 2016 at 10:52 AM Peter Klügl <peter.kluegl@averbis.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>>
> >>>> A very nice annotator, thank you.
> >>>>
> >>>>
> >>>> Do you have figures on how the annotator compares to the others with
> >>>> respect to speed and memory usage?
> >>>>
> >>>> Storing the complete tokens may pose challenges in scenarios with
> >>>> parallelization if the dictionary is not shared between annotators.
> >>>>
> >>>> Would you be interested in setting up a benchmark?
> >>>>
> >>>>
> >>>> Because of the limitations of the dictionaries in Ruta, I also
> >>>> created a new simple dictionary annotator, but it currently lives in
> >>>> our own components repository. Maybe I'll contribute it to Ruta
> >>>> sometime, since it provides exactly the functionality the Ruta
> >>>> dictionaries lack.
> >>>>
> >>>>
> >>>> Best,
> >>>>
> >>>>
> >>>> Peter
> >>>>
> >>>>
> >>>> Am 30.11.2016 um 15:38 schrieb Donatas Remeika:
> >>>>> Hi,
> >>>>>
> >>>>> Just wanted to let you know that we created a new (probably one
> >>>>> more) dictionary annotator.
> >>>>>
> >>>>> The reasons for creating it were:
> >>>>> - Quite often we used Ruta in our pipelines only because of its
> >>>>> MARKTABLE action, which can set several features on an annotation
> >>>>> - Sometimes dictionaries contain duplicate entries with different
> >>>>> features, and we need to create an annotation for each entry
> >>>>> - The ability to use a custom tokenizer for dictionary entries (the
> >>>>> default is a whitespace tokenizer)
> >>>>>
> >>>>> It was inspired by both the DKPro dictionary-annotator and Ruta
> >>>>> MARKTABLE. Big thanks to their developers!
> >>>>>
> >>>>> Code with examples can be found at
> >>>>> https://github.com/tokenmill/dictionary-annotator
> >>>>>
> >>>>> BTW, maybe someone knows of a Concept Mapper alternative that is more
> >>>>> uimaFIT friendly?
> >>>>>
> >>>>> Best regards,
> >>>>> Donatas
> >>>>>
> >>
>
>