ctakes-dev mailing list archives

From Maite Meseure Hugues <meseure.ma...@gmail.com>
Subject Re: Questions about dictionary-lookup and dictionary-lookup-fast
Date Tue, 10 Mar 2015 17:00:26 GMT
Thank you, Sean, for your complete reply; it's helpful.

On Tue, Mar 10, 2015 at 11:53 AM, Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Maite,
>
> > Does anyone know why it [UmlsDictionaryLookupAnnotator] is so slow?
> The top 5 reasons (1-3 are 90% of the problem):
> 1.  The dictionary database is bloated with unwanted entries
> 2.  The dictionary database indexing is sub-optimal
> 3.  The second drug lookup with orangebook filtering takes extra time
> 4.  The matching algorithm does a little more work than is necessary
> 5.  There is some redundancy
>
> > my interest is to be able to create my own HsqlDb-based dictionary
> If you want to build a database using a subset of UMLS, check out the
> Dictionary Tool in Sandbox.  It can build custom hsqldb dictionaries in
> both the new (-fast) and old formats using the sources, TUIs, filters, etc.
> that you specify in plaintext parameter files.  Several default setups
> are already available.  It is fully functional, but it has been a
> work in progress during my off-hours, so functionality still changes and
> documentation is lacking.  There is, however, a howto.txt in the
> dictionarytool/doc/ directory.
>
> *NOTE: if your custom dictionaries are small (~1000 entries?) then it
> would probably be easier to just throw them into a bar-separated value
> (bsv) file.  There are examples in the dictionary-fast-res example/bsv/
> directory.
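
As a rough illustration of the bar-separated value idea in the note above, the following Java sketch loads a few entries into an in-memory map. The column order (CUI|TUI|Text), the class and method names, and the sample rows are all assumptions made for illustration; the files in the example/bsv/ directory are the authority on the actual layout.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not cTAKES code: the CUI|TUI|Text column order is an
// assumption, and the CUIs in main() are made-up placeholders.
final class BsvDictionarySketch {

    /** One dictionary row: concept id, semantic type id, and surface text. */
    static final class Entry {
        final String cui;
        final String tui;
        final String text;

        Entry(String cui, String tui, String text) {
            this.cui = cui;
            this.tui = tui;
            this.text = text;
        }
    }

    /** Parses bar-separated lines, skipping blank lines and rows with fewer than 3 columns. */
    static Map<String, Entry> parse(List<String> lines) {
        Map<String, Entry> byText = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // The bar is a regex metacharacter, so it must be escaped.
            String[] cols = trimmed.split("\\|", -1);
            if (cols.length < 3) {
                continue;
            }
            Entry entry = new Entry(cols[0].trim(), cols[1].trim(), cols[2].trim());
            // Key on lowercased text so lookups are case-insensitive.
            byText.put(entry.text.toLowerCase(Locale.ROOT), entry);
        }
        return byText;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "C0000001|T047|example disorder",
            "C0000002|T121|example drug");
        Map<String, Entry> dictionary = parse(lines);
        System.out.println(dictionary.get("example drug").cui);
    }
}
```

A flat file like this avoids a database build entirely, which is why it suits small dictionaries; for larger ones the hsqldb route described above is likely the better fit.
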
>
> Sean
>
> -----Original Message-----
> From: Maite Meseure Hugues [mailto:meseure.maite@gmail.com]
> Sent: Tuesday, March 10, 2015 12:35 PM
> To: dev@ctakes.apache.org
> Subject: Questions about dictionary-lookup and dictionary-lookup-fast
>
> Hi everyone,
>
> 1) I am currently working on BagOfCuisGenerator.java with the analysis
> engine 'AggregatePlaintextUMLSProcessor.xml', but processing is very slow
> at this step:
>
> INFO UmlsDictionaryLookupAnnotator - process(JCas)
>
> Does anyone know why it is so slow?
>
> 2) I also tried 'AggregatePlaintextFastUMLSProcessor.xml', and it is
> actually pretty fast, as its name suggests. However, I would like to be
> able to create my own HsqlDb-based dictionary, as we can with a Lucene
> index, and integrate it into the process. Is that possible with the fast
> version? Do you have any pointers that could help me do that?
>
> Thank you very much for your time.
>
> --
> --
>  Maïté Meseure Hugues
>



-- 
--
 Maïté Meseure Hugues
