ctakes-dev mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: building a *real sample dictionary* without UMLS login
Date Fri, 02 Oct 2015 14:02:49 GMT
Hi,

I would be extremely interested in a sample dictionary that
doesn’t require a UMLS login.

How would I use this?

Thanks,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++





-----Original Message-----
From: "AndyMC@apache.org (forwarding)" <mcmurry.andy@gmail.com>
Reply-To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Date: Friday, October 2, 2015 at 12:43 AM
To: "dev@ctakes.apache.org" <dev@ctakes.apache.org>
Subject: building a *real sample dictionary* without UMLS login

>Greetings ctakes-dev!
>
>I have been polishing MedGen (UMLS) dictionaries for over a year now and
>I am confident in saying "this is solid".
>As a reminder, the medgen-mysql package contains a large subset of the
>UMLS that can be downloaded without UMLS login, greatly simplifying the
>creation of an example dictionary.
>
>QUESTION: 
>Would you like me to integrate this into cTAKES to simplify installations
>for new users, and if so, what would be your preferred method?
>
>Source Vocabularies (SAB)
>+-------------+--------+--------------------------+
>| SourceVocab | cnt    | Description              |
>+-------------+--------+--------------------------+
>| MSH         | 245435 | Medical Subject Headings |
>| SNOMEDCT_US | 156105 | SNOMED Clinical Terms    |
>| NCI         | 136888 | NCI Cancer Terms         |
>| ...         |  ...   |                          |
>+-------------+--------+--------------------------+
>
>Semantic Types (STY)
>+-------------------------------------------+--------+
>| SemanticType                              | cnt    |
>+-------------------------------------------+--------+
>| Pharmacologic Substance                   | 102511 |
>| Finding                                   |  90413 |
>| Organic Chemical                          |  81329 |
>| Disease or Syndrome                       |  47223 |
>| Neoplastic Process                        |  16151 |
>| Amino Acid, Peptide, or Protein           |   9383 |
>| Congenital Abnormality                    |   6536 |
>| Pathologic Function                       |   5655 |
>| Steroid                                   |   3919 |
>| Sign or Symptom                           |   2909 |
>| ...                                       |   ...  |
>+-------------------------------------------+--------+
>
>
>What would you like to see?
>AndyMC@apache.org	
>
>
>On Nov 12, 2014, at 6:14 AM, "Dligach, Dmitriy"
><Dmitriy.Dligach@childrens.harvard.edu> wrote:
>
>> Andy, thank you for this resource!
>> 
>> Do you have an estimate of what percentage of UMLS concepts were left out?
>> 
>> Dima
>> 
>> 
>> 
>> 
>> On Nov 11, 2014, at 16:02, andy mcmurry <mcmurry.andy@gmail.com> wrote:
>> 
>>> Hello!
>>> 
>>> https://bitbucket.org/invitae/medgen-mysql (Apache Licensed ASL2)
>>> 
>>> We just released a new library containing a huge chunk of UMLS concepts
>>> which are available without registering accounts/username/passwords.
>>> LEGALLY. Yes, really!
>>> 
>>> The subset is from NCBI and it contains *thousands of concepts from
>>> SNOMED and other vocabularies*.
>>> 
>>> The code is essentially:
>>> 1. a list of WGET targets to various NCBI FTP site mirrors
>>> 2. a Makefile for building the databases of interest
>>> Our legal team has approved distribution for Open Access work, ASL2
>>> LICENSE.
>>> 
>>> I recommend we use this opportunity to make this the default distribution
>>> for cTAKES UMLS connections, because it obviates the need for so much
>>> painful credentialing and back-and-forth agreements with the US National
>>> Library of Medicine.
>>> 
>>> Cheers!
>>> --Andy
>>> 
>>> 
>>> On Wed, Sep 10, 2014 at 12:13 PM, Masanz, James J.
>>> <Masanz.James@mayo.edu> wrote:
>>> 
>>>> 
>>>> I would love to see the install be as simple as apt-get install to end up
>>>> with some working dictionary that has more than a handful of entries to
>>>> get them started.
>>>> 
>>>> Regards,
>>>> James Masanz
>>>> 
>>>> -----Original Message-----
>>>> From: andy mcmurry [mailto:mcmurry.andy@gmail.com]
>>>> Sent: Tuesday, September 09, 2014 4:32 PM
>>>> To: ctakes-dev@incubator.apache.org
>>>> Subject: Recommendation for ctakes default (UMLS) dictionaries
>>>> 
>>>> Greetings ctakes-dev:
>>>> 
>>>> *UMLS license restrictions have been getting more lax over the years* --
>>>> much of the UMLS can be downloaded directly from the NCBI official FTP
>>>> site.
>>>> 
>>>> In fact, the NIH (and implicitly the NLM) *have already made the standard
>>>> terms public for some medical specialities*.
>>>> 
>>>> For example: Here is the UMLS subset specific to Medical Genetics (MedGen)
>>>> and the Genetic Testing Registry (GTR), complete with SNOMED-CT concept
>>>> CUI(s), names, etc.:
>>>> 
>>>> [  ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html  ]
>>>> 
>>>> My team has developed a JVM based wrapper for MetaMap 2013AB which I
>>>> intend to open source soon (Clojure).  It includes REST support for
>>>> invoking MetaMap with any or all of the command line arguments.
>>>> We do not integrate with UIMA; we are basically a wrapper around the
>>>> binary installation of MetaMap. The emphasis is on publication text,
>>>> not clinical text; still, some services are common (such as LVG).
>>>> 
>>>> Strangely, the NLM still requires UMLS licenses to download MetaMap
>>>> execution binaries. The MetaMap binary install is better, but customizing
>>>> dictionaries (DataFileBuilder) is not as easy as with cTAKES and YTEX:
>>>> 
>>>> [ https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation ]
>>>> 
>>>> *** Hence, there is a real opportunity here to enable Apache cTAKES to
>>>> have a stronger default dictionary. ***
>>>> 
>>>> Imagine if we could
>>>> *$ apt-get install apache-ctakes*
>>>> 
>>>> and instantly have a working package for SOME problem domain.
>>>> In my case (Medical Genetics) the UMLS definitions are already available
>>>> and the UMLS license problem becomes a non-issue, at least for many
>>>> first-time users.
>>>> 
>>>> Your thoughts?
>>>> AndyMC
>>>> 
>> 
>
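
For anyone wanting to reproduce per-vocabulary or per-semantic-type counts like
the ones quoted above, here is a minimal sketch in Python. It assumes the
concepts have been exported to pipe-delimited text with a header row; the
column names (CUI, name, source) and the sample rows are hypothetical and are
not taken from the medgen-mysql schema, so adjust them to the actual files.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_column(lines: Iterable[str], column: str, delimiter: str = "|") -> Counter:
    """Count rows per distinct value of `column` in delimited text with a header row."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    return Counter(row[column] for row in reader)


# Hypothetical sample rows; column names and values are illustrative only.
sample = [
    "CUI|name|source",
    "C0000001|example term a|MSH",
    "C0000002|example term b|SNOMEDCT_US",
    "C0000003|example term c|MSH",
]

counts = count_by_column(sample, "source")
for vocab, cnt in counts.most_common():
    print(f"{vocab:<12} {cnt}")
```

The same function works on an open file handle in place of the sample list, and
passing a different column name would give counts per semantic type instead.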
