uima-user mailing list archives

From Michael Tanenblatt <sloth...@park-slope.net>
Subject Re: ConceptMapper: Performance Matrix
Date Mon, 23 Jun 2008 19:06:02 GMT
Yes, CompileDictionary.java will do it. But if dictionary loading time
is not the problem, I wouldn't bother doing that, as it will not buy
you much. Combining the dictionaries, for now, should make the biggest
difference.
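Merging the 13 dictionary files into one can be done with a small standalone program. The sketch below is hypothetical: `MergeDictionaries` is not part of the ConceptMapper package, and it assumes each dictionary is an XML file whose root element (for example `<synonym>`) holds one `<token>` element per entry. It copies every top-level entry from each input into a single root element, keeps duplicate entries as they are, and writes the result to a new file.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of ConceptMapper): concatenates XML dictionaries into one file. */
public class MergeDictionaries {

    /**
     * Copies every top-level element (assumed to be a <token> entry) of each input
     * dictionary under one new root element and writes it to outFile.
     * Returns the number of entries written.
     */
    public static int merge(List<File> inputs, File outFile) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document merged = builder.newDocument();
        Element root = null;
        int count = 0;
        for (File in : inputs) {
            Element inRoot = builder.parse(in).getDocumentElement();
            if (root == null) {
                // Reuse the first file's root element name (e.g. <synonym>).
                root = merged.createElement(inRoot.getTagName());
                merged.appendChild(root);
            }
            NodeList children = inRoot.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    // Deep import copies the entry together with its nested <variant> elements.
                    root.appendChild(merged.importNode(child, true));
                    count++;
                }
            }
        }
        if (root == null) {
            throw new IllegalArgumentException("no input dictionaries given");
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.transform(new DOMSource(merged), new StreamResult(outFile));
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java MergeDictionaries <merged.xml> <dict1.xml> [dict2.xml ...]");
            System.exit(1);
        }
        List<File> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(new File(args[i]));
        }
        int n = merge(inputs, new File(args[0]));
        System.out.println("Wrote " + n + " entries to " + args[0]);
    }
}
```

Usage, with hypothetical file names: `java MergeDictionaries merged.xml dict1.xml dict2.xml ...`. A single ConceptMapper descriptor can then point at the merged file, which can also be compiled first if loading time turns out to matter.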

On Jun 23, 2008, at 3:02 PM, Ahmed Abdeen Hamed wrote:

> Thanks Michael. Dictionary processing time is reasonable. It's the
> document analyzer execution time that is the bottleneck. I will merge the
> dictionaries and compile them as you suggested. However, I am not sure which
> command line tool you are referring to. Do you mean
> org.apache.uima.conceptMapper.dictionaryCompiler.CompileDictionary.java?
> Thanks for the vacation heads up.
> Ahmed
>
> On Mon, Jun 23, 2008 at 2:37 PM, Michael Tanenblatt <slothrop@park-slope.net> wrote:
>
>> The short answer is "no". Not yet, anyway.
>>
>> But, here are some things that might help. First, if dictionary loading
>> times are long, you can use the command line tool supplied in the package to
>> compile the dictionary, and use the compiled dictionary. If you do this,
>> remember that you will need to change the AE descriptors to use the correct
>> implementation of the dictionary loader, e.g.:
>>
>> <externalResource>
>>       ...
>>
>>       <implementationName>org.apache.uima.conceptMapper.support.dictionaryResource.CompiledDictionaryResource_impl</implementationName>
>>       ...
>> </externalResource>
>>
>> That said, if you are using 13 dictionaries, that means you are running 13
>> copies of ConceptMapper in your pipeline, which means that you are
>> traversing each file's text 13 times just for your ConceptMapper
>> invocations. If you could merge the dictionaries into one, you should see a
>> marked speedup. Clearly, a near-term enhancement of ConceptMapper would
>> be to enable the loading of multiple dictionaries, which get merged at
>> initialization time.
>>
>> One side note: I am going to be on vacation starting on June 25 and will
>> only have occasional access to email until I return on July 12. I will try
>> to answer questions during that time when I do have access, but I really
>> have no idea how often that will be.
>>
>>
>>
>> On Jun 23, 2008, at 2:19 PM, Ahmed Abdeen Hamed wrote:
>>
>>> Hello UIMA members,
>>> I am using the document analyzer example to analyze large files against
>>> multiple dictionaries. One of the raw files is 7.5MB. There are 13
>>> dictionaries, each about 1MB in size. Is there some sort of a matrix that
>>> you can use to predict the execution time? Has anyone written a paper on
>>> the performance analysis of ConceptMapper?
>>> Please let me know if you can.
>>> Best wishes,
>>> --------------------------------------------------------
>>> Ahmed Abdeen Hamed
>>> Scientific Informatics Project Leader
>>> MBLWHOI Library
>>> Marine Biological Laboratory
>>> 7 MBL Street Woods Hole, MA 02543 USA
>>> +1 508 289 7676
>>> --
>>> email: abdeen@mbl.edu
>>> --
>>>
>>
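Michael's point above is that 13 ConceptMapper instances each scan every document, and that merging the dictionaries at initialization time would replace those 13 scans with one. The following Java sketch illustrates that idea in isolation. It is hypothetical and far simpler than ConceptMapper: `MergedDictionaryLookup` is not a UIMA class, dictionaries are plain term-to-canonical-form maps, text is split on whitespace, and only single-token terms are matched. Each merged entry keeps the name of the dictionary it came from, so matches can still be traced back to their source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch, not ConceptMapper's API: merge dictionaries once, then scan text once. */
public class MergedDictionaryLookup {

    /** One dictionary entry, tagged with the name of the dictionary it came from. */
    public record DictEntry(String canonical, String source) {}

    // Lower-cased term -> every entry for that term, from any of the dictionaries.
    private final Map<String, List<DictEntry>> merged = new HashMap<>();

    /** Initialization: fold every named dictionary (term -> canonical form) into one table. */
    public MergedDictionaryLookup(Map<String, Map<String, String>> dictionaries) {
        for (Map.Entry<String, Map<String, String>> dict : dictionaries.entrySet()) {
            for (Map.Entry<String, String> term : dict.getValue().entrySet()) {
                merged.computeIfAbsent(term.getKey().toLowerCase(Locale.ROOT), k -> new ArrayList<>())
                      .add(new DictEntry(term.getValue(), dict.getKey()));
            }
        }
    }

    /** One pass over the text: each token is looked up once, however many dictionaries were merged. */
    public List<String> annotate(String text) {
        List<String> hits = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            List<DictEntry> entries = merged.get(token.toLowerCase(Locale.ROOT));
            if (entries != null) {
                for (DictEntry e : entries) {
                    hits.add(e.source() + ":" + e.canonical());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps dictionary order, so a term found in several
        // dictionaries reports its entries in that order.
        Map<String, Map<String, String>> dictionaries = new LinkedHashMap<>();
        dictionaries.put("species", Map.of("cod", "Gadus morhua"));
        dictionaries.put("measurements", Map.of("salinity", "salinity"));

        MergedDictionaryLookup lookup = new MergedDictionaryLookup(dictionaries);
        // prints [species:Gadus morhua, measurements:salinity]
        System.out.println(lookup.annotate("Cod abundance and salinity were recorded"));
    }
}
```

With 13 separate lookups the text would be tokenized and scanned 13 times; here the per-dictionary cost is paid once, at construction, and each document is scanned a single time.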

