ctakes-user mailing list archives

From "Murphy, Sean P. [RO BIT]" <Murphy.S...@mayo.edu>
Subject RE: DrugAggregateUMLSPlainTextProcessor related question
Date Mon, 29 Apr 2013 20:11:26 GMT
cTAKES\resources\drugnerresources\lookup\LookupDesc_DrugNER.xml (or a similar file) has the
maxPermutationLevel setting under the lookup bindings. As shown here, the value used is '3':

<lookupInitializer className="edu.mayo.bmi.uima.lookup.ae.FirstTokenPermLookupInitializerImpl">
    <property key="textMetaFields" value="preferred_designation|other_designation"/>
    <property key="maxPermutationLevel" value="3"/>
    <property key="windowAnnotations" value="edu.mayo.bmi.uima.lookup.type.DrugLookupWindowAnnotation"/>
    <property key="exclusionTags" value="CC,CD,DT,EX,LS,MD,PDT,POS,PP,PP$,PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"/>
<!-- ohnlp ID# 3378705 -->
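
For anyone who wants to check the value in their own copy of the descriptor, here is a minimal Python sketch that lists the key/value settings. The embedded XML is an assumed, trimmed-down reconstruction of the fragment above (the `<properties>` wrapper and the `read_properties` helper are illustrative, not taken from cTAKES), so the real file may be nested differently. The code only looks for `property` elements, wherever they sit.

```python
import xml.etree.ElementTree as ET

# Assumed, shortened stand-in for the lookup descriptor; the real file
# contains more elements and its exact nesting may differ.
SAMPLE = """
<lookupInitializer className="edu.mayo.bmi.uima.lookup.ae.FirstTokenPermLookupInitializerImpl">
  <properties>
    <property key="textMetaFields" value="preferred_designation|other_designation"/>
    <property key="maxPermutationLevel" value="3"/>
  </properties>
</lookupInitializer>
"""

def read_properties(xml_text):
    """Collect every <property key=... value=.../> pair, regardless of nesting."""
    root = ET.fromstring(xml_text)
    return {
        prop.get("key"): prop.get("value")
        for prop in root.iter("property")
        if prop.get("key") is not None
    }

props = read_properties(SAMPLE)
print(props["maxPermutationLevel"])
```

To run it against a real descriptor, read the file's text and pass it to `read_properties` in place of `SAMPLE`.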

From: user-return-175-Murphy.Sean=mayo.edu@ctakes.apache.org [mailto:user-return-175-Murphy.Sean=mayo.edu@ctakes.apache.org]
On Behalf Of Kannan Thiagarajan
Sent: Monday, April 29, 2013 3:00 PM
To: user@ctakes.apache.org
Subject: Re: DrugAggregateUMLSPlainTextProcessor related question

Hello Sean,

Thanks for the response.

Just for my own understanding, do you know how many permutations it's currently limited to,
and where I might see that in the code?

Best Regards
On Mon, Apr 29, 2013 at 9:04 AM, Murphy, Sean P. [RO BIT] <Murphy.Sean@mayo.edu> wrote:
Hello Kannan,
The issue is mainly due to how cTAKES handles permutations. The overhead required to
handle, say, 7 or more permutations was not found to give a good return, even if there was
a corresponding RXCONSO entry.
Additionally, unless the extracted text matches the normalized form according to RxNorm,
the resulting named entity would be missed.

So for the example below, if Lexapro had a corresponding entry for 'Lexapro 10 MG', then
the pipeline would have discovered the entity.
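
The effect of a permutation cap can be illustrated with a toy lookup. This is a simplified Python sketch, not the cTAKES implementation. It assumes the level limits how many tokens after the first one may take part in a match, and it uses an order-insensitive comparison as a stand-in for trying permutations. The dictionary holds only the entries and codes quoted in this thread, and `find_codes` is a hypothetical helper.

```python
# Toy dictionary: lower-cased token tuples mapped to the RxNorm codes
# mentioned in this thread. Not a real RXCONSO extract.
DICTIONARY = {
    ("lexapro",): "352741",
    ("lexapro", "10", "mg", "oral", "tablet"): "352272",
    ("aspirin",): "1191",
    ("aspirin", "325", "mg"): "317300",
}

def find_codes(text, max_permutation_level=3):
    """Return the sorted codes of dictionary terms found in the text."""
    tokens = text.lower().split()
    found = []
    for start, first in enumerate(tokens):
        for term, code in DICTIONARY.items():
            if term[0] != first:
                continue
            # Assumption: the level caps the number of tokens after the first.
            if len(term) - 1 > max_permutation_level:
                continue
            span = tokens[start:start + len(term)]
            # Order-insensitive comparison stands in for trying permutations.
            if sorted(span[1:]) == sorted(term[1:]):
                found.append(code)
    return sorted(found)

print(find_codes("Lexapro 10 mg oral tablet 3 times a day"))
print(find_codes("Aspirin 325 mg two times a day"))
```

Under these assumptions, the five-token Lexapro entry is skipped at level 3 while the three-token Aspirin entry is still found, which mirrors the behaviour described in the question. Raising `max_permutation_level` to 4 lets the longer Lexapro entry match in this toy model.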

From: user-return-173-Murphy.Sean=mayo.edu@ctakes.apache.org
On Behalf Of Kannan Thiagarajan
Sent: Monday, April 29, 2013 7:52 AM
To: user@ctakes.apache.org
Subject: DrugAggregateUMLSPlainTextProcessor related question


I'm trying to understand the named entity recognition aspect of cTAKES.

If I pass in text such as the following:

Lexapro 10 mg oral tablet 3 times a day

cTAKES finds a single MedicationEventMention with RxNorm code 352741. However, looking
in the RXCONSO database, I see that there is a specific entry for the 10 mg tablet:

352272|ENG||||||1937400|1937400|352272||RXNORM|SY|352272|Lexapro 10 MG Oral Tablet||N|4096|

But, cTAKES always resorts to finding the first entry (without 10 mg).

I did, however, notice that in certain cases it finds two annotations. For example:

Aspirin 325 mg two times a day

This comes up with two annotations: Aspirin 325 mg (code 317300) and Aspirin (code 1191).

317300|ENG||||||1481682|1481682|317300||RXNORM|SCDC|317300|Aspirin 325 MG||N|4096|
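
For readers unfamiliar with the row layout, the pipe-delimited lines above can be split into named fields. The Python sketch below assumes the rows follow the standard RXNCONSO.RRF column order from the RxNorm release files. The `parse_row` helper is illustrative only.

```python
# Column order of RXNCONSO.RRF (assumed to apply to the rows quoted here).
RXNCONSO_COLUMNS = [
    "RXCUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "RXAUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]

def parse_row(line):
    """Map one pipe-delimited row onto the column names."""
    fields = line.rstrip("\n").split("|")
    # Each row ends with a trailing "|", which produces one empty extra field.
    if fields and fields[-1] == "":
        fields = fields[:-1]
    return dict(zip(RXNCONSO_COLUMNS, fields))

row = parse_row(
    "317300|ENG||||||1481682|1481682|317300||RXNORM|SCDC|317300|Aspirin 325 MG||N|4096|"
)
print(row["RXCUI"], row["TTY"], row["STR"])
```

Read this way, the string that a dictionary lookup has to match is the STR field, and TTY gives the term type of the entry.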

Any thoughts as to why there might be a difference in the lookup?


Best Regards
Kannan Thiagarajan

