ctakes-user mailing list archives

From "Savova, Guergana" <Guergana.Sav...@childrens.harvard.edu>
Subject RE: Extracting Symptoms
Date Tue, 06 Aug 2013 17:10:15 GMT
Pruning by the UMLS semantic type is a very good idea. In some of our studies we have found
that the semantic type of Finding is quite noisy and we have discarded it (this should be
very easy to do in cTAKES).

The UMLS semantic types that define UMLS semantic groups such as Disorders are in Table 1
of this manuscript: http://semanticnetwork.nlm.nih.gov/SemGroups/Papers/2003-medinfo-atm.pdf.
You can use that table as a rough guide to which semantic types to include in your study.
In any case, the modification to cTAKES is minimal (you have to specify the semantic types
in an XML file).
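
To illustrate the pruning idea outside of cTAKES, here is a minimal Python sketch. It assumes
you have already exported each extracted mention together with its TUI; the sample mentions,
the TUI attached to each one, and the names ALLOWED_TUIS and prune_by_tui are all made up for
the example and are not part of cTAKES.

```python
# Semantic types (TUIs) to keep. T184 is "Sign or Symptom" in the UMLS.
ALLOWED_TUIS = {"T184"}

# Made-up sample data: (mention text, TUI) pairs standing in for exported
# extraction output. The TUI given for each string is illustrative only.
mentions = [
    ("chest pain", "T184"),
    ("diabetes mellitus", "T047"),
    ("nausea", "T184"),
    ("family history of cancer", "T033"),
]


def prune_by_tui(mentions, allowed_tuis):
    """Keep only the mentions whose TUI is in the allowed set."""
    return [(text, tui) for text, tui in mentions if tui in allowed_tuis]


symptoms = prune_by_tui(mentions, ALLOWED_TUIS)
```

Dropping or adding a semantic type is then a one-line change to the allowed set, which is
roughly what editing the list of semantic types in the cTAKES configuration achieves.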

Hope this helps!

From: Tim Miller [mailto:timothy.miller@childrens.harvard.edu]
Sent: Tuesday, August 06, 2013 12:16 PM
To: user@ctakes.apache.org
Subject: Re: Extracting Symptoms

I don't know of anyone who has done exactly what you're asking, but I think it's a really
interesting idea. My first thought was that you could try the Finding typeID, which would be
one level less granular than the TUIs. But that covers many more TUIs:

that contains T184, but also the noisier T033 and T047, along with many others! So that would
make your problem worse.

Unfortunately, from what you're saying, it sounds like the UMLS doesn't have the granularity
you need to represent only the findings that you're interested in.

Are there any examples of the types of things that come up from T033 and T047 that you aren't
interested in? I'm wondering if there's a pattern you could write rules for, so that you can
over-generate and then filter with those rules. Just throwing out a simple


Do you think if you moved to one level more abstract you would get too much?
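
The over-generate-then-filter idea could look roughly like the Python sketch below. Everything
in it is an assumption for illustration: the sample mentions and their TUIs are made up, and
the two exclusion patterns are placeholders, not rules derived from real T033/T047 output.

```python
import re

# Over-generate: accept the noisier semantic types alongside T184.
OVERGENERATE_TUIS = {"T184", "T033", "T047"}

# Hypothetical exclusion rules. Real rules would come from inspecting the
# unwanted T033/T047 terms in your own data.
EXCLUDE_PATTERNS = [
    re.compile(r"\bhistory of\b", re.IGNORECASE),
    re.compile(r"\bsyndrome\b", re.IGNORECASE),
]

# Made-up sample data: (mention text, TUI) pairs; TUIs are illustrative only.
mentions = [
    ("fatigue", "T184"),
    ("weight loss", "T033"),
    ("family history of cancer", "T033"),
    ("Down syndrome", "T047"),
    ("appendectomy", "T061"),
]


def overgenerate_then_filter(mentions, tuis, exclude_patterns):
    """Keep mentions with an accepted TUI, then drop any that match a rule."""
    candidates = [(text, tui) for text, tui in mentions if tui in tuis]
    return [
        (text, tui)
        for text, tui in candidates
        if not any(p.search(text) for p in exclude_patterns)
    ]


kept = overgenerate_then_filter(mentions, OVERGENERATE_TUIS, EXCLUDE_PATTERNS)
```

Whether this is practical depends on whether the unwanted terms really share patterns that
simple rules can catch.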

On 08/06/2013 11:47 AM, Bohne, Jacqueline R wrote:
We are trying to create a cTAKES process that will extract all symptoms from our documents.
In our first attempt, we used the UMLS dictionary and pulled anything with a TUI of T184
(Sign or Symptom). While this worked, when we compared the results to what our Research
Coordinators manually abstracted as symptoms, there were quite a few differences. When we
looked into these differences, we found that many of the extra terms were classified as either
Finding (T033) or Disease or Syndrome (T047) in UMLS. We would rather not just add these TUIs
to our NLP process because then we would end up with many more terms than just symptoms in our

Has anyone else tried to create a database of symptoms using NLP?  Or are you aware of a better
solution for creating a symptoms database?

Thank you for your time!

Jacquie Bohne
Research Programmer/Analyst
Marshfield Clinic
