ctakes-dev mailing list archives

From "Savova, Guergana" <Guergana.Sav...@childrens.harvard.edu>
Subject RE: Announcement: UMLS MedGen-MySQL dataset now available as open access download
Date Tue, 11 Nov 2014 22:32:49 GMT
This is great!!!! Thank you so much, Andy!!!
I agree that it will make life for many users MUCH easier.
--guergana

-----Original Message-----
From: Jay Vyas [mailto:jayunit100.apache@gmail.com] 
Sent: Tuesday, November 11, 2014 5:31 PM
To: dev@ctakes.apache.org
Subject: Re: Announcement: UMLS MedGen-MySQL dataset now available as open access download

+1000 on this!  Great, let's make a JIRA!!!

> On Nov 11, 2014, at 5:02 PM, andy mcmurry <mcmurry.andy@gmail.com> wrote:
> 
> Hello!
> 
> https://bitbucket.org/invitae/medgen-mysql (Apache Licensed ASL2)
> 
> We just released a new library containing a huge chunk of UMLS 
> concepts that are available without registering for an account, username, or password.
> LEGALLY. Yes, really!
> 
> The subset is from NCBI and it contains *thousands of concepts from 
> SNOMED and other vocabularies*.
> 
> The code is essentially
> 1. a list of WGET targets to various NCBI FTP site mirrors
> 2. a Makefile for building the databases of interest
> 
> Our legal team has approved distribution for Open Access work, ASL2 
> LICENSE.
> 
> I recommend we use this opportunity to make this the default 
> distribution for CTAKES UMLS connections, because it obviates the need 
> for so much painful credentialing and back and forth agreements with 
> the US National Library of Medicine.
> 
> Cheers!
> --Andy
> 
> 
> On Wed, Sep 10, 2014 at 12:13 PM, Masanz, James J. <Masanz.James@mayo.edu> wrote:
> 
>> 
>> I would love to see the install be as simple as apt-get install to 
>> end up with a working dictionary that has more than a handful of 
>> entries to get users started.
>> 
>> Regards,
>> James Masanz
>> 
>> -----Original Message-----
>> From: andy mcmurry [mailto:mcmurry.andy@gmail.com]
>> Sent: Tuesday, September 09, 2014 4:32 PM
>> To: ctakes-dev@incubator.apache.org
>> Subject: Recommendation for ctakes default (UMLS) dictionaries
>> 
>> Greetings ctakes-dev:
>> 
>> *UMLS license restrictions have been getting more lax over the years* 
>> -- much of the UMLS can be downloaded directly from the NCBI 
>> official FTP site.
>> 
>> In fact, the NIH (and implicitly the NLM) *have already made the 
>> standard terms public for some medical specialities*.
>> 
>> For example: Here is the UMLS subset specific to Medical Genetics 
>> (MedGen) and Genetic Testing (GTR) complete with SNOMED-CT concept 
>> CUI(s) and names, etc :
>> 
>> [  ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html  ]
>> 
>> My team has developed a JVM based wrapper for MetaMap 2013AB which I 
>> intend to open source soon (Clojure).  It includes REST support for 
>> invoking MetaMap with any or all of the command line arguments.
>> We do not integrate with UIMA, we are basically a wrapper around the 
>> binary installation of MetaMap. The emphasis is on publication text 
>> not clinical text, still, some services are common (such as LVG).
>> 
>> Strangely, the NLM still requires UMLS licenses to download MetaMap 
>> execution binaries. The MetaMap binary install is better, but 
>> customizing dictionaries (DataFileBuilder) is not as easy as it is 
>> in CTAKES with YTEX.
>> 
>> [  https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation  ]
>> 
>> *Hence, there is a real opportunity here to enable Apache cTAKES 
>> to have a stronger default dictionary.*
>> 
>> Imagine if we could
>> $ apt-get install apache-ctakes
>> 
>> and instantly have a working package for SOME problem domain.
>> In my case (Medical Genetics) the UMLS definitions are already 
>> available and the UMLS license problem becomes a non-issue, at least 
>> for many first-time users.
>> 
>> Your thoughts?
>> AndyMC
>> 
