ctakes-dev mailing list archives

From "Chen, Pei" <Pei.C...@childrens.harvard.edu>
Subject RE: move ytex annotators to ctakes.apache.org?
Date Tue, 22 Oct 2013 21:29:00 GMT
Hi Vj,
Sorry if I misunderstood you earlier about how we ship the non-compatible jars/libs.
Agreed, they'll probably have to reside somewhere else, maybe in their existing locations, and
just be pulled together by the installer/ant script(s) as optional libs, as you suggested?
For the UMLS-derived works, we can mimic the current bundled UMLS dictionaries. They are
downloaded separately, via Maven Central and/or SourceForge, as ctakes-resources.zip.
Open to ideas though...
--Pei

> -----Original Message-----
> From: vijay garla [mailto:vngarla@gmail.com]
> Sent: Monday, October 21, 2013 7:03 PM
> To: dev@ctakes.apache.org
> Subject: Re: move ytex annotators to ctakes.apache.org?
> 
> More questions:
> 
> 
> On Mon, Oct 21, 2013 at 4:26 PM, Pei Chen <chenpei@apache.org> wrote:
> 
> > Hi Vijay,
> > This is awesome.  Some ideas inline below:
> >
> > > I'm not sure how you collect all the dependencies for shipment, but how do
> > > I tell maven not to include these?
> > Take a look at the distribution project [1].  It defines what gets put
> > in and out of the distro.
> >
> > > Is it OK to check weka & jdbc into source control?
> > Please do not commit the jars with non-compatible licenses. We would have to
> > remove them before the code gets distributed anyway, so it's best to avoid it.
> > However, if you would like to attach it to the Jira (or put it in the sandbox)
> > initially to leverage the community's help, I can also take a look and
> > lend a helping hand if needed, and others in the community may be
> > interested in helping out too.
> >
> 
> These libraries can be pulled in via maven (not checked in) and excluded
> when creating the distro; that gets around having non-compatible jars in
> source control or the distro. The main question is how to ship them. As part of
> the resources jar?
> 
> 
> >
> > > * desc vs <project>-res
> > The -res projects were originally designed for the models/resources,
> > so that downstream consumers do not necessarily have to include huge
> > resource files if they only need the code. So I would suggest that any
> > plain-text config source files go directly into the project and its
> > corresponding -res project.
> > [1] https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-distribution/src/main/assembly/
> >
> > > * distribution of umls concept graphs
> > Are the contents of those concept graphs ASL 2.0 compatible? We will probably
> > need to double-check whether they are modified/considered derived works.
> >
> 
> These should be handled identically to the UMLS HSQL dictionary shipped in
> resources: the concept graphs are derived from UMLS level 0 sources +
> SNOMED CT.
> 
> 
> >
> > >* patches to other ctakes projects
> > I think these would be really good! Perhaps we can even open Jiras
> > and commit those in parallel.
> >
> > >* post download setup
> > I think this is actually a good idea: some kind of "installer"
> > that guides the user through all the different download processes.
> > Would it be possible to clone that to see what it would look like for
> > ctakes? I was originally thinking of Groovy or some other scripts,
> > but would be curious to see, especially if ytex already does something like
> > that.
> >
> 
> I am just using plain vanilla ant. I'm open to doing this 'the' ctakes way,
> but it doesn't sound like there is an established mechanism. I am open to
> using any scripting language that is already included in ctakes.
> 
> 
> >
> > >* +1 for the ytex projects for the time being. We can also refactor them
> > > into the existing projects as appropriate (once everyone has a better
> > > understanding of the functionality?)
> >
> > Also, just curious: how big was this code base originally? I'm
> > just thinking about IP clearance here (if it's required).
> >
> 
> ~250 Java source files, ~1 MB of source (I didn't do a line count). The original code
> is under the ASF 2.0 license.
> 
> 
> >
> >
> > On Mon, Oct 21, 2013 at 8:57 AM, vijay garla <vngarla@gmail.com> wrote:
> >
> > > Hello All,
> > >
> > > I've started on the ytex-ctakes port, and have some packaging questions.
> > >
> > > * Hibernate & Weka & JDBC Driver (SQL Server, Oracle) dependencies:
> > > I understand that we will not ship these jars as part of the ctakes
> > > download. Can we bundle the jars and ship them as part of an
> > > additional download, available via sourceforge? Hibernate is
> > > available via maven central; weka and the jdbc drivers are not. I have
> > > added weka & jdbc drivers as system dependencies. I'm not sure how you
> > > collect all the dependencies for shipment, but how do I tell maven not
> > > to include these? Is it OK to check weka & jdbc into source control?
> > >
> > > * desc vs <project>-res
> > > What are the guidelines for what goes where? Configuration files
> > > are found in both places, whereas data/models are in the -res directory.
> > > Ytex has many non-uima config files (hibernate, spring) which should be
> > > user-modifiable, and I would put them in the desc directory.
> > > However, desc is not in the project classpath (but it is in the classpath
> > > for the ctakes distro, e.g. in runctakesCPE.bat). Any reason for this
> > > dissonance? I would add desc as a resources directory in the pom.
> > >
> > > * distribution of umls concept graphs
> > > For semantic similarity and word sense disambiguation, ytex provides
> > > concept graphs derived from the UMLS. We have a download site that
> > > requires a UTS login to get these concept graphs
> > > (http://www.ytex-nlp.org/umls.download/secure/0.7/umls.zip). I take
> > > it I would just create a -res directory and add the concept graphs
> > > there, and they would automagically appear in the ctakes-resources zip?
> > >
> > > * patches to other ctakes projects
> > > ytex has some patches to other ctakes annotators for handling edge
> > > cases where they throw an exception; I will check to see if these
> > > changes have already been made. If not, I will file separate Jira
> > > tickets for these patches. Also, the
> > > CharacterOffsetToLineTokenConverterCtakesImpl needs to be modified to
> > > properly handle cases where newlines are in sentences; I will add a
> > > patch for that as well.
> > >
> > > * post download setup
> > > ytex provides an ant script to simplify the post-download setup
> > > (database schema, setup, configuration file generation). Would it
> > > be possible to ship ant with the ctakes distro, so that users can
> > > execute these scripts? If not, how best to automate setup? I know from
> > > experience with earlier versions of ytex that setting up the database
> > > schema is error-prone, and that this needs to be automated.
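The CharacterOffsetToLineTokenConverterCtakesImpl item above concerns mapping character offsets to line/token positions when a sentence spans a newline. The thread does not show that class's interface, so the following is only a minimal standalone sketch of the idea; the function name `offset_to_line_token` and its 1-based line / 0-based token convention are hypothetical assumptions. Lines are taken from the document's own newlines rather than from sentence boundaries, so a sentence broken across lines still maps each token to the line it physically sits on.

```python
import bisect
import re


def line_starts(text):
    """Return the character offset at which each line of `text` begins."""
    return [0] + [m.end() for m in re.finditer(r"\n", text)]


def offset_to_line_token(text, offset):
    """Map a character offset to (1-based line number, 0-based token index).

    Hypothetical helper, not the cTAKES API: lines are delimited by the
    newlines in the document itself, not by sentence boundaries, so a
    sentence spanning a newline still resolves each token to its physical
    line. Tokens are runs of non-whitespace characters.
    """
    starts = line_starts(text)
    line_idx = bisect.bisect_right(starts, offset) - 1
    line_start = starts[line_idx]
    line_end = text.find("\n", line_start)
    if line_end == -1:
        line_end = len(text)
    relative = offset - line_start
    for token_idx, match in enumerate(re.finditer(r"\S+", text[line_start:line_end])):
        if match.start() <= relative < match.end():
            return line_idx + 1, token_idx
    raise ValueError(f"offset {offset} is not inside a token")


# One sentence broken across two physical lines.
note = "Patient denies\nchest pain."
print(offset_to_line_token(note, note.index("chest")))  # (2, 0)
print(offset_to_line_token(note, note.index("pain")))   # (2, 1)
```

Offsets that fall on whitespace raise an error here; a real converter might instead snap to the nearest token.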
> > >
> > >
> > > I was planning on creating the following projects:
> > > * ctakes-ytex:
> > > Base ytex, includes semantic similarity tools. This has no dependencies
> > > on ctakes, and I would create a separate distribution of just this
> > > package for a semantic similarity distro.
> > > * ctakes-ytex-res
> > > Includes concept graphs for semantic similarity.
> > > * ctakes-ytex-web
> > > Provides User Interface, RESTful, and WebServices interface to
> > > semantic similarity service.  This has no dependencies on ctakes,
> > > and this would
> > be
> > > included in the semantic similarity distro.
> > > * ctakes-ytex-uima
> > > Includes ytex analysis engines
> > > * ctakes-ytex-uima-res
> > > resources for ytex analysis engines
> > >
> > > Alternatively, I can add ctakes-ytex-uima and ctakes-ytex-uima-res
> > > to existing projects (don't know where they would fit).
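The ctakes-ytex project above is described as providing semantic similarity tools over UMLS-derived concept graphs. To illustrate what a path-based measure over such a graph computes, here is a minimal sketch of the standard Wu & Palmer measure on a tiny hand-made is-a hierarchy. The concept names and the `PARENTS` map are hypothetical stand-ins for a real UMLS concept graph, depth is measured as the shortest path to the root (one of several conventions), and this is not the ytex implementation, which the thread does not show.

```python
from collections import deque

# Hypothetical toy is-a hierarchy: child -> list of parents. A real UMLS
# concept graph allows several parents per concept, hence the lists.
PARENTS = {
    "finding": ["root"],
    "disease": ["finding"],
    "infection": ["disease"],
    "pneumonia": ["infection"],
    "bronchitis": ["infection"],
    "fracture": ["disease"],
}


def ancestors(concept):
    """Return the concept itself plus every concept reachable via is-a edges."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def depth(concept):
    """Length of the shortest is-a path to a root, counting the root as depth 1."""
    queue = deque([(concept, 1)])
    seen = {concept}
    while queue:
        node, d = queue.popleft()
        parents = PARENTS.get(node, [])
        if not parents:
            return d  # reached a root
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, d + 1))
    raise ValueError(f"no root reachable from {concept!r}")


def wu_palmer(c1, c2):
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(c1) + depth(c2))."""
    common = ancestors(c1) & ancestors(c2)
    lcs = max(common, key=depth)  # least common subsumer = deepest shared ancestor
    return 2 * depth(lcs) / (depth(c1) + depth(c2))


print(wu_palmer("pneumonia", "bronchitis"))  # 0.8
print(wu_palmer("pneumonia", "fracture"))    # 0.6666666666666666
```

Siblings under "infection" score higher than concepts that only meet at "disease", which is the behaviour a path-based measure is meant to capture.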
> > >
> > > Best,
> > >
> > > Vijay
> > >
> > >
> > >
> > >
> > > On Thu, Oct 3, 2013 at 7:06 PM, vijay garla <vngarla@gmail.com> wrote:
> > >
> > > > Hi Pei,
> > > >
> > > > The WSD annotator relies on the semantic similarity component,
> > > > which is a general-purpose tool not strictly limited to ctakes or
> > > > NLP. I would like to keep the semantic similarity component
> > > > 'standalone', i.e. with no dependencies on ctakes, and make it
> > > > redistributable on its own. If that is possible as part of ctakes,
> > > > I'd love to move it. If not, I'd leave the semantic similarity
> > > > component and the associated WSD annotator on google code.
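To make the dependency described above concrete: knowledge-based WSD can pick, for an ambiguous term, the candidate concept that is most similar to the concepts found around it. The sketch below assumes a deliberately simple scoring rule (sum of path-based similarities, with similarity = 1 / (1 + shortest path length) in an undirected toy graph). The graph, concept names, and scoring rule are hypothetical and are not taken from ytex's WSD annotator.

```python
from collections import defaultdict, deque

# Hypothetical toy concept graph (undirected is-a edges), not real UMLS data.
EDGES = [
    ("common_cold", "viral_infection"),
    ("influenza", "viral_infection"),
    ("viral_infection", "disease"),
    ("hypothermia", "temperature_finding"),
    ("cold_sensation", "temperature_finding"),
    ("disease", "root"),
    ("temperature_finding", "root"),
]


def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def path_length(graph, source, target):
    """Breadth-first search for the number of edges between two concepts."""
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # the two concepts are not connected


def similarity(graph, a, b):
    """Simple path-based similarity: 1 / (1 + path length), 0 if unreachable."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1 + dist)


def disambiguate(graph, candidate_senses, context_concepts):
    """Pick the sense with the highest summed similarity to the context."""
    return max(
        candidate_senses,
        key=lambda sense: sum(similarity(graph, sense, c) for c in context_concepts),
    )


GRAPH = build_graph(EDGES)
# "cold" is ambiguous between the disease and the sensation.
print(disambiguate(GRAPH, ["common_cold", "cold_sensation"], ["influenza"]))    # common_cold
print(disambiguate(GRAPH, ["common_cold", "cold_sensation"], ["hypothermia"]))  # cold_sensation
```

Because the WSD step only calls `similarity`, any other measure could be swapped in, which is one way to read the point above about keeping semantic similarity as a standalone component.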
> > > >
> > > > For those of you who want the back story:
> > > > http://www.biomedcentral.com/1471-2105/13/261
> > > > http://jamia.bmj.com/content/20/5/882.long
> > > >
> > > >
> > > > -vj
> > > >
> > > > On Thu, Oct 3, 2013 at 5:13 PM, Chen, Pei
> > > > <Pei.Chen@childrens.harvard.edu> wrote:
> > > > > vj,
> > > > > Were you thinking of contributing the new ytex Word Sense
> > > > > Disambiguation component as well? I think that would be really cool.
> > > > > --Pei
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: ksarma@gmail.com [mailto:ksarma@gmail.com] On Behalf Of Karthik Sarma
> > > > >> Sent: Thursday, October 03, 2013 1:05 PM
> > > > >> To: dev@ctakes.apache.org
> > > > >> Subject: Re: move ytex annotators to ctakes.apache.org?
> > > > >>
> > > > >> This would be quite valuable; in particular, ytex's annotation
> > > > >> database connection is much easier to use than what ships with
> > > > >> cTAKES. There are a fair number of other advantages, and I think
> > > > >> they'd all be very valuable!
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Karthik Sarma
> > > > >> UCLA Medical Scientist Training Program Class of 20??
> > > > >> Member, UCLA Medical Imaging & Informatics Lab
> > > > >> Member, CA Delegation to the House of Delegates of the American
> > > > >> Medical Association
> > > > >> ksarma@ksarma.com
> > > > >> gchat: ksarma@gmail.com
> > > > >> linkedin: www.linkedin.com/in/ksarma
> > > > >>
> > > > >>
> > > > >> On Thu, Oct 3, 2013 at 5:50 AM, vijay garla <vngarla@gmail.com> wrote:
> > > > >>
> > > > >> > Hello All,
> > > > >> >
> > > > >> > I'd like to contribute ytex to ctakes. YTEX's main feature
> > > > >> > is the ability to store *any* ctakes (or uima) annotation in
> > > > >> > a relational database (in a relational format), and the
> > > > >> > ability to export these annotations to ML packages (weka,
> > > > >> > libsvm, matlab, R). All of this is purely declarative, via
> > > > >> > configuration.
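The central idea above, storing arbitrary annotations in relational tables and exporting them to ML packages, can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `anno` table, its columns, and the bag-of-annotations export are hypothetical; ytex's actual schema and its declarative, configuration-driven export are not described in this thread.

```python
import sqlite3

# Hypothetical schema: one row per annotation, keyed by document.
SCHEMA = """
CREATE TABLE IF NOT EXISTS anno (
    doc_id       INTEGER NOT NULL,
    anno_type    TEXT    NOT NULL,
    span_begin   INTEGER NOT NULL,
    span_end     INTEGER NOT NULL,
    covered_text TEXT    NOT NULL
)
"""


def store_annotations(conn, doc_id, annotations):
    """Insert (type, begin, end, covered_text) tuples for one document."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO anno VALUES (?, ?, ?, ?, ?)",
        [(doc_id, t, b, e, text) for t, b, e, text in annotations],
    )


def export_counts(conn, anno_type):
    """Return {doc_id: {covered_text: count}} for one annotation type,
    i.e. a sparse bag-of-annotations feature vector per document."""
    rows = conn.execute(
        "SELECT doc_id, covered_text, COUNT(*) FROM anno "
        "WHERE anno_type = ? GROUP BY doc_id, covered_text "
        "ORDER BY doc_id, covered_text",
        (anno_type,),
    )
    features = {}
    for doc_id, text, count in rows:
        features.setdefault(doc_id, {})[text] = count
    return features


conn = sqlite3.connect(":memory:")
store_annotations(conn, 1, [("NamedEntity", 0, 9, "pneumonia"),
                            ("NamedEntity", 20, 29, "pneumonia"),
                            ("NamedEntity", 35, 40, "fever")])
store_annotations(conn, 2, [("NamedEntity", 4, 9, "fever")])
print(export_counts(conn, "NamedEntity"))
```

The per-document dictionaries map directly onto a sparse feature matrix, which is the shape most ML packages (weka, libsvm, R) expect as input.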
> > > > >> >
> > > > >> > In addition, Ytex provides the following:
> > > > >> > * Negation Detection with Negex
> > > > >> > * SegmentRegexAnnotator - section detection with regular
> > > > >> >   expressions
> > > > >> > * NamedEntityRegexAnnotator - named entity detection with
> > > > >> >   regular expressions
> > > > >> > * Sentence Splitter - modified ctakes sentence splitter
> > > > >> >   making sentence split patterns configurable (not hardcoded
> > > > >> >   to \n)
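Two of the regex-driven features listed above, section detection and sentence splitting with a configurable boundary pattern, can be sketched roughly as follows. The section IDs, header patterns, and function names are hypothetical illustrations; the real annotators work on UIMA annotations, and how they load their patterns is not covered in this thread.

```python
import re

# Hypothetical section-header patterns keyed by section ID.
SECTION_PATTERNS = {
    "HPI": re.compile(r"^history of present illness:", re.IGNORECASE | re.MULTILINE),
    "MEDS": re.compile(r"^medications:", re.IGNORECASE | re.MULTILINE),
}


def find_sections(text, patterns=SECTION_PATTERNS):
    """Return (section_id, begin, end) spans; each section runs from its
    header to the next header (or to the end of the text)."""
    headers = sorted(
        (match.start(), section_id)
        for section_id, pattern in patterns.items()
        for match in pattern.finditer(text)
    )
    sections = []
    for i, (begin, section_id) in enumerate(headers):
        end = headers[i + 1][0] if i + 1 < len(headers) else len(text)
        sections.append((section_id, begin, end))
    return sections


def split_sentences(text, boundary=r"(?<=[.?!])\s+"):
    """Split on a configurable regex instead of a hard-coded newline."""
    return [s for s in re.split(boundary, text) if s.strip()]


note = "History of present illness: Cough.\nFever for two days.\nMedications: aspirin."
print(find_sections(note))

text = "Patient reports fever\nfor two days. No cough."
print(split_sentences(text))                  # punctuation-based boundary
print(split_sentences(text, boundary=r"\n"))  # newline-based boundary
```

With the default pattern, a line break in the middle of a sentence no longer ends it; passing `boundary=r"\n"` reproduces newline-only splitting for documents where that is the right behaviour.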
> > > > >> >
> > > > >> > YTEX currently works with ctakes 2.5; I would like to upgrade
> > > > >> > it to the latest ctakes and, if the community is interested,
> > > > >> > contribute it to ctakes.apache.org.
> > > > >> >
> > > > >> > A licensing question: YTEX uses Spring (apache 2.0 license),
> > > > >> > Hibernate (lgpl 2.1), & weka (gpl). Are there any issues with
> > > > >> > including these?
> > > > >> >
> > > > >> > Cheers
> > > > >> >
> > > > >> > vj
> > > > >> >
> > > >
> > >
> >
