uima-user mailing list archives

From "Ahmed Abdeen Hamed" <ahmed.elma...@gmail.com>
Subject Re: ConceptMapper: Performance Matrix
Date Mon, 23 Jun 2008 19:11:03 GMT
Great, I will combine the dictionaries then. It's also good to know that
compiled dictionaries make a difference if dictionary loading is a
bottleneck. Have a good vacation.
Ahmed
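
A minimal sketch of the merge step discussed in this thread, assuming every dictionary is an XML file with a single root element whose child elements are independent entries. The class name MergeDictionaries and its merge method are hypothetical and not part of ConceptMapper. The sketch only concatenates entries; it does not deduplicate them, and it does not record which source dictionary an entry came from.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of ConceptMapper): merges XML dictionaries that share a root element. */
public class MergeDictionaries {

    /** Copies every child element of each input's root under one new root; returns the entry count. */
    public static int merge(List<File> inputs, File output) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document merged = builder.newDocument();
        Element root = null;
        int entries = 0;

        for (File input : inputs) {
            Element inRoot = builder.parse(input).getDocumentElement();
            if (root == null) {
                // Reuse the first file's root element name for the merged file.
                root = merged.createElement(inRoot.getTagName());
                merged.appendChild(root);
            } else if (!root.getTagName().equals(inRoot.getTagName())) {
                throw new IllegalArgumentException("Root element mismatch in " + input);
            }
            NodeList children = inRoot.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    // Deep-copy the entry (attributes and nested elements) into the merged document.
                    root.appendChild(merged.importNode(child, true));
                    entries++;
                }
            }
        }
        if (root == null) {
            throw new IllegalArgumentException("No input dictionaries given");
        }

        // Serialize the merged DOM tree as indented UTF-8 XML.
        Transformer writer = TransformerFactory.newInstance().newTransformer();
        writer.setOutputProperty(OutputKeys.INDENT, "yes");
        writer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        writer.transform(new DOMSource(merged), new StreamResult(output));
        return entries;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java MergeDictionaries merged.xml dict1.xml dict2.xml ...
        List<File> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(new File(args[i]));
        }
        System.out.println(merge(inputs, new File(args[0])) + " entries written to " + args[0]);
    }
}
```

With 13 input files this would produce one dictionary for a single ConceptMapper instance, so each document's text is traversed once instead of 13 times. If the 13 dictionaries currently map to different annotation types or features, the entries would likely need an attribute that distinguishes their source before merging.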


On Mon, Jun 23, 2008 at 3:06 PM, Michael Tanenblatt <slothrop@park-slope.net>
wrote:

> Yes, CompileDictionary.java will do it. But if dictionary loading time is
> not the problem, I wouldn't bother doing that as it will not buy you much.
> Combining the dictionaries, for now, should make the biggest difference.
>
>
> On Jun 23, 2008, at 3:02 PM, Ahmed Abdeen Hamed wrote:
>
>> Thanks Michael. Dictionary processing time is reasonable. It's the
>> document analyzer execution time that is the bottleneck. I will merge the
>> dictionaries and compile them as you suggested. However, I am not sure which
>> command line tool you are referring to. Do you mean
>> org.apache.uima.conceptMapper.dictionaryCompiler.CompileDictionary.java?
>> Thanks for the vacation heads up.
>> Ahmed
>>
>> On Mon, Jun 23, 2008 at 2:37 PM, Michael Tanenblatt <
>> slothrop@park-slope.net>
>> wrote:
>>
>>> The short answer is "no". Not yet, anyway.
>>>
>>> But, here are some things that might help. First, if dictionary loading
>>> times are long, you can use the command line tool supplied in the package
>>> to compile the dictionary, and use the compiled dictionary. If you do this,
>>> remember that you will need to change the AE descriptors to use the
>>> correct implementation of the dictionary loader, e.g.:
>>>
>>> <externalResource>
>>>      ...
>>>      <implementationName>org.apache.uima.conceptMapper.support.dictionaryResource.CompiledDictionaryResource_impl</implementationName>
>>>      ...
>>> </externalResource>
>>>
>>> That said, if you are using 13 dictionaries, that means you are running 13
>>> copies of ConceptMapper in your pipeline, which means that you are
>>> traversing each file's text 13 times just for your ConceptMapper
>>> invocations. If you could merge the dictionaries into one, you should see a
>>> marked speedup. Clearly, a near-term enhancement of ConceptMapper would be
>>> to enable the loading of multiple dictionaries, which get merged at
>>> initialization time.
>>>
>>> One side note: I am going to be on vacation starting on June 25 and will
>>> only have occasional access to email until I return on July 12. I will try
>>> to answer questions during that time when I do have access, but I really
>>> have no idea how often that will be.
>>>
>>>
>>>
>>> On Jun 23, 2008, at 2:19 PM, Ahmed Abdeen Hamed wrote:
>>>
>>>> Hello UIMA members,
>>>>
>>>> I am using the document analyzer example to analyze large files with
>>>> multiple dictionaries. One of the raw files is 7.5MB. There are 13
>>>> dictionaries, each 1MB in size. Is there some sort of a matrix that you
>>>> can use to predict the execution time? Has anyone written a paper on the
>>>> performance analysis of ConceptMapper?
>>>> Please let me know if you can.
>>>> Best wishes,
>>>> --------------------------------------------------------
>>>> Ahmed Abdeen Hamed
>>>> Scientific Informatics Project Leader
>>>> MBLWHOI Library
>>>> Marine Biological Laboratory
>>>> 7 MBL Street Woods Hole, MA 02543 USA
>>>> +1 508 289 7676
>>>> --
>>>> email: abdeen@mbl.edu
>>>> --
>>>>
>>>>
>>>
>
