uima-user mailing list archives

From "Ahmed Abdeen Hamed" <ahmed.elma...@gmail.com>
Subject Re: ConceptMapper: Performance Matrix
Date Mon, 23 Jun 2008 19:02:09 GMT
Thanks, Michael. Dictionary processing time is reasonable. It's the
document analyzer execution time that is the bottleneck. I will merge the
dictionaries and compile them as you suggested. However, I am not sure which
command line tool you are referring to. Do you mean:
org.apache.uima.conceptMapper.dictionaryCompiler.CompileDictionary.java?
Thanks for the vacation heads up.
Ahmed

On Mon, Jun 23, 2008 at 2:37 PM, Michael Tanenblatt <slothrop@park-slope.net>
wrote:

> The short answer is "no". Not yet, anyway.
>
> But, here are some things that might help. First, if dictionary loading
> times are long, you can use the command line tool supplied in the package to
> compile the dictionary, and use the compiled dictionary. If you do this,
> remember that you will need to change the AE descriptors to use the correct
> implementation of the dictionary loader, e.g.:
>
> <externalResource>
>        ...
>        <implementationName>org.apache.uima.conceptMapper.support.dictionaryResource.CompiledDictionaryResource_impl</implementationName>
>        ...
> </externalResource>
>
> That said, if you are using 13 dictionaries, that means you are running 13
> copies of ConceptMapper in your pipeline, which means that you are
> traversing each file's text 13 times just for your ConceptMapper
> invocations. If you could merge the dictionaries into one, you should see a
> marked speedup. Clearly, a near-term enhancement of ConceptMapper would
> be to enable the loading of multiple dictionaries, which get merged at
> initialization time.
>
> One side note: I am going to be on vacation starting on June 25 and will
> only have occasional access to email until I return on July 12. I will try
> to answer questions during that time when I do have access, but I really
> have no idea how often that will be.
>
>
>
> On Jun 23, 2008, at 2:19 PM, Ahmed Abdeen Hamed wrote:
>
>> Hello UIMA members, I am using the document analyzer example to analyze
>> large files from multiple dictionaries. One of the raw files is 7.5MB.
>> There are 13 dictionaries of 1MB each. Is there some sort of a matrix
>> that you can use to predict the execution time? Has anyone written a
>> paper on the performance analysis of ConceptMapper?
>> Please let me know if you can.
>> Best wishes,
>> --------------------------------------------------------
>> Ahmed Abdeen Hamed
>> Scientific Informatics Project Leader
>> MBLWHOI Library
>> Marine Biological Laboratory
>> 7 MBL Street Woods Hole, MA 02543 USA
>> +1 508 289 7676
>> --
>> email: abdeen@mbl.edu
>> --
>>
>
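
The dictionary merge suggested in the thread could be sketched roughly as follows. This is a minimal sketch, not part of ConceptMapper: the class name MergeDictionaries and its merge method are hypothetical, and it assumes each dictionary is an XML file with a single root element whose child elements are the entries, all sharing the same root element name. It copies entries as they are and does not deduplicate them or record which source dictionary an entry came from.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper for illustration; not shipped with ConceptMapper.
class MergeDictionaries {

    // Builds a new document whose root is named after the first input's root
    // and which contains a deep copy of every child element of every input root.
    static Document merge(List<Document> inputs) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document merged = builder.newDocument();
        String rootName = inputs.get(0).getDocumentElement().getTagName();
        Element root = merged.createElement(rootName);
        merged.appendChild(root);

        for (Document input : inputs) {
            Element inputRoot = input.getDocumentElement();
            // Assumption: all dictionaries use the same root element name.
            if (!rootName.equals(inputRoot.getTagName())) {
                throw new IllegalArgumentException(
                        "Root element mismatch: " + inputRoot.getTagName());
            }
            for (Node child = inputRoot.getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                // Skip whitespace text nodes and comments; copy entries only.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    root.appendChild(merged.importNode(child, true));
                }
            }
        }
        return merged;
    }

    // Usage: java MergeDictionaries merged.xml dict1.xml dict2.xml ...
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: MergeDictionaries <output.xml> <input.xml>...");
            return;
        }
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<Document> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(builder.parse(new File(args[i])));
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.transform(new DOMSource(merge(inputs)), new StreamResult(new File(args[0])));
    }
}
```

The merged file can then be compiled and referenced from a single ConceptMapper descriptor, so the text is traversed once instead of once per dictionary. If the annotations need to stay distinguishable by source dictionary, that information would have to be carried on the entries themselves before merging.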
