uima-user mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: New dictionary annotator
Date Mon, 05 Dec 2016 12:35:11 GMT
Hi,


For the UIMA Ruta paper, I used the Enron email dataset [1], but it is
probably not optimal here.


I think we can find a standard scenario (data + terminology), maybe
something like GENIA with MeSH or Wikipedia with GeoNames. Just a quick
guess. I can help set something up, but probably not before February.


Best,


Peter


[1] https://www.cs.cmu.edu/~enron/

On 05.12.2016 at 12:56, Donatas Remeika wrote:
> Hi,
>
> Thanks for the feedback.
> Yes, it would be interesting to see benchmark results. Do you know where
> I could find examples and data for benchmarking in UIMA?
>
> Best regards,
> Donatas
>
>
> On Mon, Dec 5, 2016 at 10:52 AM Peter Klügl <peter.kluegl@averbis.com>
> wrote:
>
>> Hi,
>>
>>
>> A very nice annotator, thank you.
>>
>>
>> Do you have figures on how the annotator compares to the others with
>> respect to speed and memory usage?
>>
>> Storing the complete tokens may cause problems in parallelized scenarios
>> if the dictionary is not shared between annotators.
>>
>> Would you be interested in setting up a benchmark?
>>
>>
>> Because of the limitations of the dictionaries in Ruta, I also created a
>> new, simple dictionary annotator, but it currently lives in our own
>> components repository. Maybe I'll contribute it to Ruta at some point,
>> since it provides exactly the functionality the Ruta dictionaries lack.
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>> On 30.11.2016 at 15:38, Donatas Remeika wrote:
>>> Hi,
>>>
>>> Just wanted to let you know that we created a new (yet another)
>>> dictionary annotator.
>>>
>>> The reasons for creating it were:
>>>  - Quite often we used Ruta in our pipelines only for its MARKTABLE
>>>    action, which can set several features on an annotation
>>>  - Sometimes dictionaries contain duplicate entries with different
>>>    features, and we need to create an annotation for each entry
>>>  - The ability to use a custom tokenizer for dictionary entries (the
>>>    default is a whitespace tokenizer)
>>>
>>> It was inspired by both DKPro dictionary-annotator and Ruta MARKTABLE.
>>> Big thanks to their developers!
>>>
>>> Code with examples can be found at
>>> https://github.com/tokenmill/dictionary-annotator
>>>
>>> BTW, does anyone know of a ConceptMapper alternative that is more
>>> uimaFIT-friendly?
>>>
>>> Best regards,
>>> Donatas
>>>
>>
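
The three requirements named in the original announcement (several features
per annotation, one annotation per duplicate entry, a replaceable entry
tokenizer) can be illustrated with a small sketch. This is not the code of the
tokenmill dictionary-annotator and it uses no UIMA or Ruta API; the class
DictionaryMatcher, its methods, and the sample entries are hypothetical names
chosen for illustration, and matches are reported as token offsets rather than
character offsets to keep the example short.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Tokenizer = Callable[[str], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Default tokenizer: split on runs of whitespace."""
    return text.split()


class DictionaryMatcher:
    """Hypothetical token-based dictionary lookup (illustration only)."""

    def __init__(self, tokenizer: Tokenizer = whitespace_tokenizer):
        self.tokenizer = tokenizer
        # Token sequence -> list of feature maps. Duplicate phrases are
        # appended, not overwritten, so each entry yields its own match.
        self.entries: Dict[Tuple[str, ...], List[Dict[str, str]]] = defaultdict(list)
        self.max_len = 0  # longest entry, in tokens

    def add(self, phrase: str, features: Dict[str, str]) -> None:
        key = tuple(self.tokenizer(phrase))
        if not key:
            return  # ignore entries that tokenize to nothing
        self.entries[key].append(features)
        self.max_len = max(self.max_len, len(key))

    def match(self, text: str) -> List[Tuple[int, int, Dict[str, str]]]:
        """Return (start_token, end_token, features) for every entry hit."""
        tokens = self.tokenizer(text)
        matches = []
        for start in range(len(tokens)):
            longest = min(self.max_len, len(tokens) - start)
            for length in range(1, longest + 1):
                key = tuple(tokens[start:start + length])
                for features in self.entries.get(key, ()):
                    matches.append((start, start + length, features))
        return matches


matcher = DictionaryMatcher()
matcher.add("New York", {"type": "city", "id": "C1"})
matcher.add("New York", {"type": "state", "id": "S1"})  # duplicate phrase
matcher.add("York", {"type": "city", "id": "C2"})
matches = matcher.match("She moved to New York")

# A custom tokenizer is just another callable, here a lowercasing one.
lower_matcher = DictionaryMatcher(tokenizer=lambda s: s.lower().split())
lower_matcher.add("new york", {"type": "city"})
lower_matches = lower_matcher.match("NEW YORK")
```

Because the entry table in this sketch is read-only after loading, one loaded
instance could in principle be shared by several parallel annotators instead
of each holding its own copy of the tokens, which is the memory concern raised
in the reply above.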

