ctakes-dev mailing list archives

From britt fitch <britt.fi...@wiredinformatics.com>
Subject Re: The fast dictionary pipeline vs. the regular one
Date Mon, 22 Jun 2015 13:23:13 GMT
Regarding the miss on “cm” in #2, you might want to check the dictionary XML descriptor
or uimaFIT wiring, depending on which you are using, for the parameter “minimumSpan”.
If I recall correctly, the default minimum span is 3 characters, but you can reduce it
to 2 if desired.
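
To illustrate why a 2-character token such as "cm" would be skipped, here is a minimal, hypothetical Java sketch of a minimum-span filter. It is not the cTAKES implementation. The class and method names are made up, and splitting on whitespace is a simplification of real tokenization. Only the idea that candidates shorter than "minimumSpan" characters are never looked up comes from the note above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a minimum-span filter; not the cTAKES code. */
public class MinimumSpanDemo {

    /** Keeps only whitespace-separated tokens at least minimumSpan characters long. */
    static List<String> lookupCandidates(String text, int minimumSpan) {
        List<String> candidates = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            // Tokens shorter than the minimum span are never sent to the dictionary.
            if (token.length() >= minimumSpan) {
                candidates.add(token);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        String text = "5.5 cm X 2.6 cm";
        System.out.println("minimumSpan=3: " + lookupCandidates(text, 3));
        System.out.println("minimumSpan=2: " + lookupCandidates(text, 2));
    }
}
```

Run as-is, the first line lists only "5.5" and "2.6", while the second also includes both "cm" tokens. "X" is dropped in both cases because it is a single character.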

Cheers,

Britt


Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
Britt.Fitch@wiredinformatics.com

> On Jun 21, 2015, at 2:45 PM, Miller, Timothy <Timothy.Miller@childrens.harvard.edu> wrote:
> 
> Sean wrote the fast version and may be able to answer your specific questions. In
> general, though, the fast dictionary does not reproduce the output of the regular one
> exactly: it does not implement an equivalent search, and it uses different indexing
> methods. We are happy to receive reports of what seem like bugs; any new software is
> likely to have some. What I will say is that Sean has run some (as yet unpublished)
> experiments, and we believe that in aggregate the new system's output is at least as high
> quality as the older one's.
> Tim
> 
> 
> ________________________________________
> From: Oranit Dror [oranit@algotec.co.il]
> Sent: Sunday, June 21, 2015 4:37 AM
> To: dev@ctakes.apache.org
> Subject: The fast dictionary pipeline vs. the regular one
> 
> Hello,
> 
> I am using cTAKES 3.2.2 with the regular pipeline. Recently, I tested the fast dictionary
> pipeline, and it is indeed much faster.
> However, I have encountered several quality differences in the returned annotations.
> For example:
> 
> 
> 1. With the fast pipeline, the term "GBM" is annotated as "glioblastoma multiforme",
> while in the regular pipeline it is annotated as "glioblastoma".
> Note that according to the UMLS DB, the concept of "GBM" is "glioblastoma", and
> "glioblastoma multiforme" is mapped to a narrower concept.
> 
> 
> 2. The word "cm" in a phrase like "5.5 cm X 2.6 cm" is annotated by the regular
> pipeline as "Cutaneous Mastocytosis", while in the fast pipeline it is not annotated as a
> medical term (as expected, and as in UMLS).
> 
> 
> Any explanation for the differences?
> 
> Thank you,
> Oranit.
> 
> 
> 

