ctakes-dev mailing list archives

From Jay Vyas <jayunit100.apa...@gmail.com>
Subject Re: Announcement: UMLS MedGen-MySQL dataset now available as open access download
Date Tue, 11 Nov 2014 22:29:19 GMT
+1000 on this! Great, let's make a JIRA!!!

> On Nov 11, 2014, at 5:02 PM, andy mcmurry <mcmurry.andy@gmail.com> wrote:
> 
> Hello!
> 
> https://bitbucket.org/invitae/medgen-mysql (Apache Licensed ASL2)
> 
> We just released a new library containing a huge chunk of UMLS concepts,
> which are available without registering an account, username, or password.
> LEGALLY. Yes, really!
> 
> The subset is from NCBI and it contains *thousands of concepts from SNOMED
> and other vocabularies*.
> 
> The code is essentially:
> 1. a list of wget targets pointing at various NCBI FTP site mirrors
> 2. a Makefile for building the databases of interest
> 
> Our legal team has approved distribution for Open Access work, ASL2
> LICENSE.
> 
> I recommend we use this opportunity to make this the default distribution
> for cTAKES UMLS connections, because it obviates the need for so much
> painful credentialing and back-and-forth agreements with the US National
> Library of Medicine.
> 
> Cheers!
> --Andy
> 
> 
> On Wed, Sep 10, 2014 at 12:13 PM, Masanz, James J. <Masanz.James@mayo.edu>
> wrote:
> 
>> 
>> I would love to see the install be as simple as apt-get install, ending up
>> with a working dictionary that has more than a handful of entries to
>> get users started.
>> 
>> Regards,
>> James Masanz
>> 
>> -----Original Message-----
>> From: andy mcmurry [mailto:mcmurry.andy@gmail.com]
>> Sent: Tuesday, September 09, 2014 4:32 PM
>> To: ctakes-dev@incubator.apache.org
>> Subject: Recommendation for ctakes default (UMLS) dictionaries
>> 
>> Greetings ctakes-dev:
>> 
>> *UMLS license restrictions have been getting more lax over the years*:
>> much of the UMLS can be downloaded directly from the official NCBI FTP
>> site.
>> 
>> In fact, the NIH (and implicitly the NLM) *have already made the standard
>> terms public for some medical specialities*.
>> 
>> For example, here is the UMLS subset specific to Medical Genetics (MedGen)
>> and Genetic Testing (GTR), complete with SNOMED-CT concept CUI(s), names,
>> etc.:
>> 
>> [  ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html  ]
>> 
>> My team has developed a JVM-based wrapper for MetaMap 2013AB which I
>> intend to open source soon (Clojure). It includes REST support for
>> invoking MetaMap with any or all of the command line arguments.
>> We do not integrate with UIMA; we are basically a wrapper around the
>> binary installation of MetaMap. The emphasis is on publication text, not
>> clinical text. Still, some services are common (such as LVG).
>> 
>> Strangely, the NLM still requires UMLS licenses to download MetaMap
>> execution binaries. The MetaMap binary install is better, but customizing
>> dictionaries (DataFileBuilder) is not as easy as in cTAKES with YTEX:
>> 
>> [ https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation ]
>> 
>> *Hence, there is a real opportunity here to enable Apache cTAKES to
>> have a stronger default dictionary.*
>> 
>> Imagine if we could run
>> $ apt-get install apache-ctakes
>> 
>> and instantly have a working package for SOME problem domain.
>> In my case (Medical Genetics), the UMLS definitions are already available
>> and the UMLS license problem becomes a non-issue, at least for many
>> first-time users.
>> 
>> Your thoughts?
>> AndyMC
>> 
