Yes, CompileDictionary.java will do it. But if dictionary loading time
is not the problem, I wouldn't bother doing that as it will not buy
you much. Combining the dictionaries, for now, should make the biggest
difference.
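
For the merge itself, here is a minimal sketch, not something that ships with ConceptMapper. It assumes every dictionary is an XML file whose entries are direct children of a common root element (I use <synonym> as the root name here; adjust it if your files differ). The class name MergeDictionaries and its methods are hypothetical. It copies entries as-is and does not remove duplicates.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical helper (not part of ConceptMapper) that concatenates XML dictionaries. */
public class MergeDictionaries {

    /** Copies every child element of each input's root element under one new root. */
    public static Document merge(List<File> inputs, String rootName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document merged = builder.newDocument();
        Element root = merged.createElement(rootName);
        merged.appendChild(root);
        for (File input : inputs) {
            Document doc = builder.parse(input);
            Node child = doc.getDocumentElement().getFirstChild();
            for (; child != null; child = child.getNextSibling()) {
                // Skip whitespace and comments; keep only the entry elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    // importNode makes a deep copy owned by the merged document.
                    root.appendChild(merged.importNode(child, true));
                }
            }
        }
        return merged;
    }

    /** Serializes the merged document to a file. */
    public static void write(Document doc, File out) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.transform(new DOMSource(doc), new StreamResult(out));
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java MergeDictionaries merged.xml dict1.xml [dict2.xml ...]");
            return;
        }
        List<File> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(new File(args[i]));
        }
        // Assumption: "synonym" is the root element name of the input dictionaries.
        write(merge(inputs, "synonym"), new File(args[0]));
    }
}
```

You would then point a single ConceptMapper instance at the merged file instead of running 13 instances. If you need to know which original dictionary an entry came from, you would have to carry that in an attribute on each entry before merging.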
On Jun 23, 2008, at 3:02 PM, Ahmed Abdeen Hamed wrote:
> Thanks Michael. Dictionary processing time is reasonable. It's the
> document analyzer execution time that is the bottleneck. I will merge the
> dictionaries and compile them as you suggested. However, I am not sure
> which command line tool you are referring to. Do you mean
> org.apache.uima.conceptMapper.dictionaryCompiler.CompileDictionary.java?
> Thanks for the vacation heads up.
> Ahmed
>
> On Mon, Jun 23, 2008 at 2:37 PM, Michael Tanenblatt
> <slothrop@park-slope.net> wrote:
>
>> The short answer is "no". Not yet, anyway.
>>
>> But, here are some things that might help. First, if dictionary loading
>> times are long, you can use the command line tool supplied in the package
>> to compile the dictionary, and use the compiled dictionary. If you do
>> this, remember that you will need to change the AE descriptors to use the
>> correct implementation of the dictionary loader, e.g.:
>>
>> <externalResource>
>>   ...
>>   <implementationName>org.apache.uima.conceptMapper.support.dictionaryResource.CompiledDictionaryResource_impl</implementationName>
>>   ...
>> </externalResource>
>>
>> That said, if you are using 13 dictionaries, that means you are running
>> 13 copies of ConceptMapper in your pipeline, which means that you are
>> traversing each file's text 13 times just for your ConceptMapper
>> invocations. If you could merge the dictionaries into one, you should see
>> a marked speedup. Clearly, a near-term enhancement of ConceptMapper would
>> be to enable the loading of multiple dictionaries, which get merged at
>> initialization time.
>>
>> One side note: I am going to be on vacation starting on June 25 and will
>> only have occasional access to email until I return on July 12. I will
>> try to answer questions during that time when I do have access, but I
>> really have no idea how often that will be.
>>
>>
>>
>> On Jun 23, 2008, at 2:19 PM, Ahmed Abdeen Hamed wrote:
>>
>>> Hello UIMA members,
>>> I am using the document analyzer example to analyze large files against
>>> multiple dictionaries. One of the raw files is 7.5MB. There are 13
>>> dictionaries, each 1MB in size. Is there some sort of a matrix that you
>>> can use to predict the execution time? Has anyone written a paper on the
>>> performance analysis of ConceptMapper?
>>> Please let me know if you can.
>>> Best wishes,
>>> --------------------------------------------------------
>>> Ahmed Abdeen Hamed
>>> Scientific Informatics Project Leader
>>> MBLWHOI Library
>>> Marine Biological Laboratory
>>> 7 MBL Street Woods Hole, MA 02543 USA
>>> +1 508 289 7676
>>> --
>>> email: abdeen@mbl.edu
>>> --
>>>
>>