From: Kannan Thiagarajan
To: user@ctakes.apache.org
Date: Mon, 29 Apr 2013 14:59:40 -0500
Subject: Re: DrugAggregateUMLSPlainTextProcessor related question

Hello Sean,

Thanks for the response.

Just for my own understanding, do you know how many permutations it's currently limited to, and where I might see that in the code?

Best Regards
Kannan

On Mon, Apr 29, 2013 at 9:04 AM, Murphy, Sean P. [RO BIT] <Murphy.Sean@mayo.edu> wrote:

Hello Kannan,

The issue is mainly due to how cTAKES is handling permutations. The overhead required to handle, say 7 or more permutations, was not found to have a good return even if there was a corresponding RXCONSO entry.

Additionally, unless the text extracted represented the normalized form, according to RxNorm, the resulting named entity would be missed.

So for the example below, if Lexapro had a corresponding entry for ‘Lexapro 10 MG’ then the pipeline would have discovered the entity.

Thanks,

~Sean
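
To make the overhead Sean mentions concrete, here is a minimal sketch of how quickly the number of token orderings grows. It is not cTAKES code: the `orderings` helper and the sample tokens are hypothetical, and it assumes the simplest case in which every ordering of the tokens in a span becomes a separate lookup candidate.

```python
from itertools import permutations
from math import factorial


def orderings(tokens):
    """Yield every ordering of the tokens as a candidate lookup string."""
    for perm in permutations(tokens):
        yield " ".join(perm)


# Hypothetical three-token span taken from the example text in this thread.
tokens = ["lexapro", "10", "mg"]
candidates = list(orderings(tokens))
print(len(candidates))  # 3 tokens give 3! = 6 candidate strings

# The candidate count is n!, so it grows very quickly with span length.
for n in range(1, 8):
    print(n, factorial(n))  # the last line printed is: 7 5040
```

Under that assumption a 7-token span already needs 5040 dictionary probes, which matches the intuition that long spans give a poor return.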

From: user-return-173-Murphy.Sean=mayo.edu@ctakes.apache.org [mailto:user-return-173-Murphy.Sean=mayo.edu@ctakes.apache.org] On Behalf Of Kannan Thiagarajan
Sent: Monday, April 29, 2013 7:52 AM
To: user@ctakes.apache.org
Subject: DrugAggregateUMLSPlainTextProcessor related question

Hello,

I'm trying to understand the named entity recognition aspect of cTAKES.

If I pass in text such as the following:

Lexapro 10 mg oral tablet 3 times a day

cTAKES finds a single MedicationEventMention with the RxNorm code = 352741. However, looking in the RXCONSO database, I see that there is one specific entry for the 10 mg.

352741|ENG||||||1551887|1551887|352741||RXNORM|BN|352741|Lexapro||N|4096|

352272|ENG||||||1937400|1937400|352272||RXNORM|SY|352272|Lexapro 10 MG Oral Tablet||N|4096|
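
For readers unfamiliar with these rows, the sketch below splits them into named fields. It assumes the rows follow the column order of the RxNorm RXNCONSO.RRF file (RXCUI, LAT, ..., STR, SRL, SUPPRESS, CVF). The `parse_row` helper is hypothetical and is not part of cTAKES.

```python
# Column order assumed from the RxNorm RXNCONSO.RRF layout.
COLUMNS = [
    "RXCUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "RXAUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_row(line):
    """Split one pipe-delimited row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("|")
    # Each row ends with a trailing '|', which yields one extra empty field.
    if fields and fields[-1] == "":
        fields = fields[:-1]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


# The two Lexapro rows quoted above.
rows = [
    "352741|ENG||||||1551887|1551887|352741||RXNORM|BN|352741|Lexapro||N|4096|",
    "352272|ENG||||||1937400|1937400|352272||RXNORM|SY|352272|Lexapro 10 MG Oral Tablet||N|4096|",
]

for row in rows:
    rec = parse_row(row)
    print(rec["RXCUI"], rec["SAB"], rec["TTY"], rec["STR"])
```

Read this way, the first row is the brand name entry (TTY = BN, string "Lexapro") and the second is a synonym entry (TTY = SY) whose string spans five tokens.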

But, cTAKES always resorts to finding the first entry (without 10 mg).

I did, however, notice that in certain cases it finds two annotations. For example:

Aspirin 325 mg two times a day

This comes up with two annotations: Aspirin 325 mg (code 317300) and Aspirin (code 1191).

317300|ENG||||||1481682|1481682|317300||RXNORM|SCDC|317300|Aspirin 325 MG||N|4096|

1191|ENG||||||2596464|2596464|1191||MTHSPL|SU|R16CO5Y76E|Aspirin||N|4096|

Any thoughts as to why there might be a difference in the lookup?
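
One way such a difference could arise is shown in the toy model below. It is only a sketch and not the cTAKES implementation. It assumes an exact, case-insensitive match on contiguous token spans, and the `MAX_SPAN_TOKENS` limit of 3 is a hypothetical value chosen for illustration. The dictionary holds only the four strings quoted in this message.

```python
# Toy dictionary built from the four RXCONSO strings quoted in this thread.
DICTIONARY = {
    "lexapro": "352741",
    "lexapro 10 mg oral tablet": "352272",
    "aspirin 325 mg": "317300",
    "aspirin": "1191",
}

# Hypothetical span limit for illustration; NOT the real cTAKES setting.
MAX_SPAN_TOKENS = 3


def find_mentions(text, max_span=MAX_SPAN_TOKENS):
    """Return (span, code) for every token span that matches the dictionary."""
    tokens = text.lower().split()
    hits = []
    for start in range(len(tokens)):
        for length in range(1, max_span + 1):
            end = start + length
            if end > len(tokens):
                break
            span = " ".join(tokens[start:end])
            code = DICTIONARY.get(span)
            if code is not None:
                hits.append((span, code))
    return hits


print(find_mentions("Lexapro 10 mg oral tablet 3 times a day"))
print(find_mentions("Aspirin 325 mg two times a day"))
# Widening the span limit lets the five-token Lexapro entry match as well.
print(find_mentions("Lexapro 10 mg oral tablet 3 times a day", max_span=5))
```

With the limit at 3, the Lexapro text matches only "lexapro" because the dictionary has no three-token "lexapro 10 mg" entry, while the aspirin text matches both "aspirin" and "aspirin 325 mg". Raising the limit to 5 also picks up code 352272.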

Thanks

--
Best Regards

Kannan Thiagarajan

--
Best Regards
Kannan Thiagarajan
