ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: Announcement: UMLS MedGen-MySQL dataset now available as open access download
Date Thu, 13 Nov 2014 19:18:39 GMT
John: I believe that was the thinking.
Andy: just to confirm, is the raw content of this dataset released under ASL 2.0? That is, can
you contribute it as a CSV or similar so that cTAKES can re-tokenize it using the same PTB
rules, format it for cTAKES' dictionary lookup, etc., and then redistribute it under the same
license?
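A minimal sketch of the conversion step described above, under stated assumptions: the contributed CSV is assumed to have hypothetical `CUI`, `TUI` and `STR` columns, and the output is assumed to be a bar-separated `CUI|TUI|text` layout along the lines of the one cTAKES' fast dictionary lookup reads (check the exact format your cTAKES version expects). The tokenizer is a simplified regex approximation of Penn Treebank rules, not the tokenizer cTAKES actually uses, and `csv_to_bsv` / `tokenize` are illustrative names only.

```python
import csv
import io
import re

# Simplified, regex-based approximation of Penn Treebank tokenization:
# keeps hyphenated words together and splits off possessive 's and other
# punctuation. It is NOT the tokenizer cTAKES actually uses.
_TOKEN_RE = re.compile(r"\w+(?:-\w+)*|'s\b|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Lower-case a term and split it into word and punctuation tokens."""
    return _TOKEN_RE.findall(text.lower())


def csv_to_bsv(csv_text: str) -> list[str]:
    """Turn CSV rows with (assumed) CUI, TUI, STR columns into CUI|TUI|text lines."""
    lines: list[str] = []
    seen: set[tuple[str, str, str]] = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        text = " ".join(tokenize(row["STR"]))
        key = (row["CUI"], row["TUI"], text)
        # Skip empty terms, terms that would break the bar-separated layout,
        # and duplicates that differ only in case or spacing.
        if not text or "|" in text or key in seen:
            continue
        seen.add(key)
        lines.append("|".join(key))
    return lines


if __name__ == "__main__":
    # Placeholder CUIs, not real UMLS identifiers.
    sample = (
        "CUI,TUI,STR\n"
        "C0000001,T047,Crohn's disease\n"
        'C0000002,T191,"Carcinoma, Non-Small-Cell Lung"\n'
    )
    for line in csv_to_bsv(sample):
        print(line)
```

Lower-casing and de-duplicating after tokenization keeps the lookup file small; whether the dictionary lookup expects pre-lowercased text is itself an assumption worth verifying.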

> -----Original Message-----
> From: John Green [mailto:john.travis.green@gmail.com]
> Sent: Thursday, November 13, 2014 1:55 PM
> To: dev@ctakes.apache.org
> Cc: dev@ctakes.apache.org
> Subject: Re: Announcement: UMLS MedGen-MySQL dataset now available
> as open access download
> 
> Would the old licensed setup be kept as a packaged option, much as it is
> now, with the unlicensed dictionary going out in place of the current "free"
> dictionary? Am I understanding that right?
> 
> 
> JG
> 
> On Thu, Nov 13, 2014 at 1:40 PM, andy mcmurry <mcmurry.andy@gmail.com> wrote:
> 
> > I'll crunch the numbers. In the meantime, I can tell you that
> > phenotypes vary by semantic type: clinical attributes from SNOMED are
> > abundant, there are many MeSH concepts that are mapped to diseases,
> > and there are tons of "pharmacological substances".
> >
> > On Nov 12, 2014 6:19 AM, "Dligach, Dmitriy" <Dmitriy.Dligach@childrens.harvard.edu> wrote:
> >> Andy, thank you for this resource!
> >>
> >> Do you have an estimate of what percentage of UMLS concepts were left out?
> >>
> >> Dima
> >>
> >> On Nov 11, 2014, at 16:02, andy mcmurry <mcmurry.andy@gmail.com> wrote:
> >>
> >> > Hello!
> >> >
> >> > https://bitbucket.org/invitae/medgen-mysql (Apache Licensed ASL2)
> >> >
> >> > We just released a new library containing a huge chunk of UMLS
> >> > concepts that are available without registering accounts, usernames,
> >> > or passwords. LEGALLY. Yes, really!
> >> >
> >> > The subset is from NCBI, and it contains *thousands of concepts from
> >> > SNOMED and other vocabularies*.
> >> >
> >> > The code is essentially:
> >> > 1. a list of wget targets for various NCBI FTP site mirrors
> >> > 2. a Makefile for building the databases of interest
> >> >
> >> > Our legal team has approved distribution as Open Access work under the
> >> > ASL2 license.
> >> >
> >> > I recommend we use this opportunity to make this the default
> >> > distribution for cTAKES UMLS connections, because it obviates the
> >> > need for so much painful credentialing and back-and-forth
> >> > agreements with the US National Library of Medicine.
> >> >
> >> > Cheers!
> >> > --Andy
> >> >
> >> >
> >> > On Wed, Sep 10, 2014 at 12:13 PM, Masanz, James J. <Masanz.James@mayo.edu> wrote:
> >> >
> >> >>
> >> >> I would love to see the install be as simple as apt-get install, so
> >> >> that users end up with a working dictionary that has more than a
> >> >> handful of entries to get them started.
> >> >>
> >> >> Regards,
> >> >> James Masanz
> >> >>
> >> >> -----Original Message-----
> >> >> From: andy mcmurry [mailto:mcmurry.andy@gmail.com]
> >> >> Sent: Tuesday, September 09, 2014 4:32 PM
> >> >> To: ctakes-dev@incubator.apache.org
> >> >> Subject: Recommendation for ctakes default (UMLS) dictionaries
> >> >>
> >> >> Greetings ctakes-dev:
> >> >>
> >> >> *UMLS license restrictions have been getting more lax over the
> >> >> years*: much of the UMLS can be downloaded directly from the
> >> >> official NCBI FTP site.
> >> >>
> >> >> In fact, the NIH (and implicitly the NLM) *has already made the
> >> >> standard terms public for some medical specialties*.
> >> >>
> >> >> For example, here is the UMLS subset specific to Medical Genetics
> >> >> (MedGen) and the Genetic Testing Registry (GTR), complete with
> >> >> SNOMED-CT concept CUIs, names, etc.:
> >> >>
> >> >> [  ftp://ftp.ncbi.nlm.nih.gov/pub/medgen/README.html  ]
> >> >>
> >> >> My team has developed a JVM-based wrapper for MetaMap 2013AB, written
> >> >> in Clojure, which I intend to open source soon. It includes REST
> >> >> support for invoking MetaMap with any or all of its command-line
> >> >> arguments. We do not integrate with UIMA; we are basically a wrapper
> >> >> around the binary installation of MetaMap. The emphasis is on
> >> >> publication text, not clinical text; still, some services are shared
> >> >> (such as LVG).
> >> >>
> >> >> Strangely, the NLM still requires a UMLS license to download the
> >> >> MetaMap executables. The MetaMap binary install is better, but
> >> >> customizing its dictionaries (DataFileBuilder) is not as easy as it
> >> >> is in cTAKES with YTEX:
> >> >>
> >> >> [ https://cwiki.apache.org/confluence/display/CTAKES/YTEX+Installation ]
> >> >>
> >> >> *** Hence, there is a real opportunity here to give Apache cTAKES a
> >> >> stronger default dictionary. ***
> >> >>
> >> >> Imagine if we could run
> >> >> *$ apt-get install apache-ctakes*
> >> >> and instantly have a working package for SOME problem domain.
> >> >> In my case (Medical Genetics), the UMLS definitions are already
> >> >> available, and the UMLS license problem becomes a non-issue, at
> >> >> least for many first-time users.
> >> >>
> >> >> Your thoughts?
> >> >> AndyMC
> >> >>
> >>
> >>
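Dmitriy's question about what percentage of UMLS concepts were left out, and Andy's note that the content varies by semantic type, could be checked with a tally like the one below. It is a sketch under assumptions: the full-release side is assumed to be (CUI, semantic type) pairs loaded from somewhere such as the UMLS MRSTY.RRF file (which itself requires a UMLS license), the subset side is assumed to be a set of CUIs extracted from the MedGen download, and `percent_left_out` / `coverage_by_semantic_type` are hypothetical names. No figures from the actual datasets are implied.

```python
from collections import Counter
from typing import Iterable


def percent_left_out(full_cuis: Iterable[str], subset_cuis: set[str]) -> float:
    """Share (0-100) of distinct CUIs in the full release missing from the subset."""
    full = set(full_cuis)
    if not full:
        return 0.0
    return 100.0 * len(full - subset_cuis) / len(full)


def coverage_by_semantic_type(
    full_pairs: Iterable[tuple[str, str]], subset_cuis: set[str]
) -> dict[str, tuple[int, int, float]]:
    """Map each semantic type to (covered, total, percent left out).

    full_pairs holds (CUI, semantic type) pairs for the full release; a CUI
    with several semantic types is counted once under each of them.
    """
    total: Counter[str] = Counter()
    covered: Counter[str] = Counter()
    for cui, sty in set(full_pairs):  # de-duplicate repeated pairs
        total[sty] += 1
        if cui in subset_cuis:
            covered[sty] += 1
    return {
        sty: (covered[sty], n, 100.0 * (n - covered[sty]) / n)
        for sty, n in total.items()
    }
```

Reporting per semantic type rather than only a single overall percentage makes it visible where the open subset is thin, for example disease concepts versus pharmacologic substances.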