uima-user mailing list archives

From "Daniel Heinze" <dhei...@gnoetics.com>
Subject RE: New dictionary annotator
Date Tue, 09 May 2017 21:19:35 GMT
Hi, I just pulled and compiled the dictionary-annotator and am looking through the code.
I'm looking for something that is faster than UIMA Concept-Mapper. I don't need all the functionality
of Concept-Mapper, but I do need the following:
* match all: e.g. if the dictionary entries are "a b c", "a b" and "b c" and the input is "a b c",
I need to match "a b c", "a b" and "b c"
* skip tokens: e.g. if a dictionary entry is "a c d", it should match the input "a b c d"
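
To make the two requirements above concrete, here is a minimal Python sketch of the intended behaviour. It is only an illustration under stated assumptions: the names find_matches, _match_from and max_skip are hypothetical, tokenization is plain whitespace splitting, and none of this is taken from the dictionary-annotator or Concept-Mapper code.

```python
def _match_from(tokens, start, entry_tokens, max_skip):
    """Try to match entry_tokens beginning at tokens[start].

    Up to max_skip input tokens may be skipped before each entry token
    after the first. Returns the exclusive end index, or None.
    """
    if tokens[start] != entry_tokens[0]:
        return None
    pos = start + 1
    for wanted in entry_tokens[1:]:
        skipped = 0
        # Skip non-matching input tokens, but no more than max_skip of them.
        while pos < len(tokens) and tokens[pos] != wanted and skipped < max_skip:
            pos += 1
            skipped += 1
        if pos >= len(tokens) or tokens[pos] != wanted:
            return None
        pos += 1
    return pos


def find_matches(tokens, entries, max_skip=0):
    """Return (start, end, entry) for every entry at every start position.

    "Match all" means overlapping and nested matches are all reported,
    not just the longest one. This naive scan checks every entry at every
    position; it shows the semantics only and is not meant to scale.
    """
    matches = []
    for entry in entries:
        entry_tokens = entry.split()  # assumed whitespace tokenizer
        if not entry_tokens:
            continue
        for start in range(len(tokens)):
            end = _match_from(tokens, start, entry_tokens, max_skip)
            if end is not None:
                matches.append((start, end, entry))
    return matches


# Match all: nested and overlapping entries are all reported.
print(find_matches("a b c".split(), ["a b c", "a b", "b c"]))

# Skip tokens: "a c d" matches "a b c d" when one token may be skipped.
print(find_matches("a b c d".split(), ["a c d"], max_skip=1))
```

With max_skip=0 the second call finds nothing, which is the strict contiguous behaviour.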
Can someone familiar with the new dictionary annotator save me some time and say if it supports
these matching strategies?
Also, any sense of how the system scales? 
Thanks / Dan
 
-----Original Message-----
From: Peter Klügl [mailto:peter.kluegl@averbis.com] 
Sent: Tuesday, March 14, 2017 12:52 AM
To: user@uima.apache.org
Subject: Re: New dictionary annotator

Hi,


it's now March and I have not yet found the time to compare the different annotators in your
benchmark.


I just wanted to mention that I did not forget about this and that this is still on my todo
list. However, it could easily be April before I find the time.


Best,


Peter


Am 08.12.2016 um 10:43 schrieb Donatas Remeika:
> Hi,
>
> Peter, I ran a benchmark on the 20 Newsgroups texts. The results can be
> found here: https://github.com/tokenmill/dictionary-annotator
> I didn't measure memory usage, I just compared how fast the different
> annotators do the job.
>
> Best regards,
> Donatas
>
> On Mon, Dec 5, 2016 at 2:35 PM Peter Klügl <peter.kluegl@averbis.com> wrote:
>
>> Hi,
>>
>>
>> for the UIMA Ruta paper, I used the enron email dataset [1], but it 
>> is probably not optimal here.
>>
>>
>> I think we can find a standard scenario (data+terminology), maybe
>> something like Genia with MeSH or Wikipedia with GeoNames. Just a
>> quick guess. I can help set something up, but probably not before February.
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>> [1] https://www.cs.cmu.edu/~enron/
>>
>> Am 05.12.2016 um 12:56 schrieb Donatas Remeika:
>>> Hi,
>>>
>>> Thanks for the feedback.
>>> Yes, it would be interesting to see benchmark results. Maybe you know
>>> where I could find examples and data for doing benchmarks in UIMA?
>>>
>>> Best regards,
>>> Donatas
>>>
>>>
>>> On Mon, Dec 5, 2016 at 10:52 AM Peter Klügl <peter.kluegl@averbis.com> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> a very nice annotator, thank you.
>>>>
>>>>
>>>> Do you have figures on how the annotator compares to the others with
>>>> respect to speed and memory usage?
>>>>
>>>> Storing the complete tokens may pose challenges in scenarios with
>>>> parallelization if the dictionary is not shared between annotators.
>>>>
>>>> Would you be interested in setting up a benchmark?
>>>>
>>>>
>>>> Because of the limitations of the dictionaries in Ruta, I also
>>>> created a new simple dictionary annotator, but it currently lives in our
>>>> own components repository. Maybe I'll contribute it to Ruta sometime,
>>>> since it provides exactly the functionality the Ruta dictionaries lack.
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>> Am 30.11.2016 um 15:38 schrieb Donatas Remeika:
>>>>> Hi,
>>>>>
>>>>> Just wanted to let you know that we created a new (probably one 
>>>>> more) dictionary annotator.
>>>>>
>>>>> Reasons for creating it were:
>>>>>  - Quite often we used Ruta in our pipelines only because of its MARKTABLE
>>>>>    action, which can set several features on an annotation
>>>>>  - Sometimes dictionaries contain duplicate entries with different features,
>>>>>    and we need to create an annotation for each entry
>>>>>  - The possibility to use a custom tokenizer for dictionary entries (the
>>>>>    default is a whitespace tokenizer)
>>>>>
>>>>> It was inspired by both the DKPro dictionary-annotator and Ruta MARKTABLE.
>>>>> Big thanks to their developers!
>>>>>
>>>>> Code with examples can be found at
>>>>> https://github.com/tokenmill/dictionary-annotator
>>>>>
>>>>> BTW, maybe someone knows of a Concept Mapper alternative that is more
>>>>> uimaFIT friendly?
>>>>>
>>>>> Best regards,
>>>>> Donatas
>>>>>
>>

